CONTENTS
Acknowledgements
List of figures
Abstract
1 Introduction
2 State of the Art and Background
  2.1 Microphone array
  2.2 Array Geometry
  2.3 Directivity Pattern
  2.4 Operating principle
  2.5 Spatial Aliasing
  2.6 Beamforming
  2.7 Delay and Sum Beamforming
    2.7.1 Time Domain DAS
    2.7.2 Frequency Domain DAS
LIST OF FIGURES
Figure 1 The array shapes of interest, each consisting of 30 microphones: (a) a cross type, (b) a circular type, (c) a modified spiral type array. Image taken from [9].
Figure 2 Microphone array parameters at 1 kHz. Image taken from [6]
Figure 3 A linear microphone array in the far field. Image taken from [5]
Figure 4 DAS algorithm operating principle. Image taken from [5]
Figure 5 Mini DSP UMA 16
Figure 6 Mini DSP Control Panel
Figure 7 UMA 16 Frequency Response
Figure 8 Mini DSP datasheet
Figure 9 CMOS OV5640 USB Camera
Figure 10 CMOS OV5640 Datasheet
Figure 11 Microphone positions
Figure 12 Array Impulse Response
Figure 13 Array Beam Pattern at f = 500 Hz
Figure 14 Array Beam Pattern at f = 1000 Hz
Figure 15 Array Beam Pattern at f = 1500 Hz
Figure 16 Array Beam Pattern at f = 2000 Hz
Figure 17 Measurement of AoV. Image taken from [7]
Figure 18 Acoustic Camera App
Figure 19 Designed Flowchart
Figure 20 Power map of 1 kHz tone coming from [30 0] of 1 s
Figure 21 Power map of 1 kHz tone coming from [30 0] of 4 s
Figure 22 Power map of 2 kHz tone coming from [30 0] of 1 s
Figure 23 Power map of 2 kHz tone coming from [30 0] of 4 s
Figure 24 Power map of 2 kHz tone coming from [-30 0] located at 1.5 m
Figure 25 Power map of 2 kHz tone coming from [-30 0] located at 2.1 m
Figure 26 Power map of 1 kHz tone coming from [-30 0] of 0.2 resolution
Figure 27 Power map of 1 kHz tone coming from [-30 0] of 0.5 resolution
Figure 28 Power map of 2 kHz tone coming from [30 0] of 1 s
Figure 29 Power map of 2 kHz tone coming from [30 0] of 4 s
Figure 30 Power map of 2 kHz tone coming from [-30 0] located at 1.5 m
Figure 31 Power map of 2 kHz tone coming from [-30 0] located at 2.1 m
Figure 32 Power map of 2 kHz tone coming from [60 30] of 0.2 resolution
Figure 33 Power map of 2 kHz tone coming from [60 30] of 0.5 resolution
Figure 34 Power map of 2 kHz tone coming from [0 -30] generated by DaS
Figure 35 Power map of 2 kHz tone coming from [0 -30] generated by MUSIC
Figure 36 Acoustic Camera image related to Hypnocampus 2C at f = 2000 Hz
Figure 37 Acoustic Camera image related to Hypnocampus 2C at f = 800 Hz
Figure 38 T60 Room with Yamaha HS7
Figure 39 Acoustic Camera image related to Yamaha HS7 at f = 400 Hz
Figure 40 Acoustic Camera image related to Yamaha HS7 at f = 2000 Hz
Figure 41 T60 Room with Tannoy T125
Figure 42 Acoustic Camera image related to Tannoy T-125 at f = 200 Hz
Figure 43 Acoustic Camera image related to Sony SS TS-20 at f = 800 Hz
Figure 44 Acoustic Camera image related to a real noise
ABSTRACT
In the last few years, viewing an audio source has become a widely used tool not only in teleconferencing, speech enhancement and recognition, video games, etc., but also in the acoustic field, especially regarding noise localization and sound insulation.
Despite its potential, an acoustic image is difficult to achieve in environments with a large amount of noise and reverberation. An effective approach to obtaining a clean recording of the desired acoustic signal comes from beamforming theory coupled with a microphone array. The latter, together with a camera, is usually referred to as an acoustic camera, a device used to locate sound sources and to characterize them.
In this thesis we have designed an acoustic camera using a mini-DSP UMA 16 microphone array and MATLAB software in order to determine its sound power estimation performance and sound source separation ability. We have implemented it in two steps: a static setup, in which the audio signals were acquired by the microphone array and processed at a later time, and a second step in which we extended it to a real-time application.
After an investigation of the techniques utilized for acoustic camera applications, we have explored the uses of the array and the beamformer in order to obtain a sound intensity map and reconstruct the acoustic scene. The results presented in this thesis show that reliable application of beamforming techniques can generate a sound intensity map with fair accuracy both in the static setup and in real-time mode.
1 INTRODUCTION
The term acoustic camera was widely used during the 20th century to designate various types of acoustic devices, such as underwater localization systems or active systems used in medicine. Nowadays it designates any transducer array coupled with a camera that can localize a sound source and visualize it [11][17].
The ability to "see" a sound and characterize it has stimulated many companies to produce their own acoustic camera. But the required signal processing is very intensive and needs powerful hardware, so cameras that perform signal processing in real time tend to be large and very expensive.
Starting from several studies in this field [25][26][27][28], this thesis investigates different approaches to designing a low-cost acoustic camera using micro electro-mechanical (MEMS) microphones collected into a planar array [18]. MEMS microphones are micro-scale devices that provide high-fidelity acoustic sensing and are small enough to be included in a tightly integrated electronic product [23]. They are used to measure the propagating wavefield and to transfer the field energy to electrical energy. Since the wavefield is assumed to carry information about the signal source, it is sampled into a data set to be processed in order to extract as much information from it as possible. This process is a branch of array signal processing, and it involves several algorithms known as beamforming techniques. The aim of these techniques is to detect and estimate properties and parameters of the signal such as the power level and the direction of arrival (DOA) [32].
The estimation problem is a key research area in array signal processing and in many engineering applications [15], such as wireless communications, radar, and sonar, which need to be supported by direction of arrival estimation. For example, the job of radar systems is to detect and locate objects [20]. A signal emitted at a pre-specified frequency hits the objects; one of the reflected rays comes back in the direction of the radar and is detected by a receiving antenna, which delivers to the receiver information about the distance between the radar and the object location.
The aim of this thesis is to develop an acoustic camera that is cheaper than commercial ones. With this in mind, we have divided the work into two steps: a static acquisition and a real-time implementation. For the static acquisition, the idea was to explore different beamforming techniques and evaluate, in terms of accuracy, resolution, and computational time, which technique could be more suitable for our system in a real-time application. In particular, we have implemented a classical beamforming method, Delay and Sum (DAS), and a subspace method called Multiple Signal Classification (MUSIC). Regarding the former, we have noticed that, depending on the position of the sound source, it can be influenced by reflected sound, which the system may see as another source. Its accuracy varies in both azimuth and elevation angles from 4 to 20 degrees, while the resolution depends on the number of scanning points of the steering vector. Obviously, the greater the number of scanning points, the greater the computational time. Unlike DAS, the MUSIC algorithm does not suffer from the reflected-sound problem, and it also shows an improvement in accuracy, with the error decreasing from 20 to 5 degrees for both azimuth and elevation, while the resolution again depends on the scan angles.
Due to its smaller localization error and faster time response, we have decided to implement the MUSIC method in the real-time section. For ease of use, we have developed an application using the App Designer tool of MATLAB in order to create a simple graphical user interface (GUI).
As a result, we have designed and implemented an acoustic camera with a frequency range between 200 and 4000 Hz that is also able to detect and locate sound sources in reverberating environments.
2 STATE OF THE ART AND BACKGROUND
The goal of the acoustic camera is to localize a sound source, drawing at its position a power map that corresponds to its sound intensity. Today all commercial acoustic cameras use a microphone array with a camera at its center in order to collect the incoming signal and visualize the sound scene. This signal is analyzed through a beamforming technique and, after localization, the system displays on the source position a power map that represents the sound intensity. This section explains the basic principles of microphone arrays and beamforming techniques. We will review some previous studies about the suitable number of microphones in the array and its shape, and we will analyze some widely used beamforming techniques. Finally, we will present the device and software used to implement our low-cost acoustic camera.
Figure 1 The array shapes of interest, each consisting of 30 microphones: (a) a cross type, (b) a circular type, (c) a modified spiral type array. Image taken from [9].
2.3 Directivity Pattern
The directivity pattern of an array is usually characterized by the following performance parameters:
• 3 dB beamwidth
• Relative side-lobe level
• Peak-to-zero distance
All the performance parameters are functions not only of the array geometry, but also of the number of microphones and of the frequency of the impinging signal.
The 3 dB beamwidth, also called the half-power beamwidth, is the region where the main lobe has not decreased by more than 3 dB. The relative side-lobe level expresses the relative sensitivity of the first side lobe compared with the main lobe. The peak-to-zero distance measures the angle from the main lobe maximum to the first minimum.
In previous research [3], it was shown that increasing the number of microphones produces a higher and narrower main lobe, which means a higher gain of the signal in the desired direction. Furthermore, it was noticed that increasing the spacing between adjacent microphones results in an increase of the side-lobe level.
2.4 Operating principle
The wave front of a sound source is curved, and this introduces much more complexity in the calculation of the actual position using the TDOA. However, the wave front can be approximated as flat, although this approximation introduces an error in the calculated position. The relative size of this error compared to other error sources depends on the distance r between the sound source and the microphone array, and on the size of the array. If the distance between the sound source and the array is large compared to the dimensions of the array, the approximation does not introduce much additional error in the calculated position; this is referred to as the far-field case. If the distance between the sound source and the array is small compared to the dimensions of the array, this approximation cannot be made, and it is referred to as the near-field case.
In most beamforming applications the assumption of the far-field case simplifies the analysis. It means that the signal source is located far enough away from the array that the wave fronts impinging on it can be seen as plane waves. The condition which verifies the far-field assumption is

r ≫ 2λ    (2.1)

In the case of a linear array, the microphones are positioned equidistantly, i.e., d_i = d. For such a linear array it is possible to define the time delay τ_i for each microphone m_i. That time delay represents the TDOA for a uniform linear array in the far field, and it can be calculated as
τ = (d · sin θ) / c    (2.2)

where d is the spacing between adjacent microphones, θ is the direction of arrival, and c is the speed of sound.
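As an illustration of equation (2.2), the following minimal MATLAB sketch (ours, not taken from the thesis; the 42 mm spacing is an assumed value) computes the far-field TDOA of each microphone of a uniform linear array with respect to the first one:

    % Far-field TDOA of a uniform linear array, eq. (2.2) (illustrative values)
    c     = 343;           % speed of sound [m/s]
    d     = 0.042;         % spacing between adjacent microphones [m] (assumed)
    M     = 16;            % number of microphones
    theta = 30;            % direction of arrival [deg]

    k   = 0:M-1;                     % microphone indices
    tau = k * d * sind(theta) / c;   % TDOA of each microphone vs. the reference
    % tau(1) is 0: microphone 1 is taken as the reference sensor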
2.5 Spatial Aliasing
Using the same notation, the STFT of the microphone signals can be written as

y_k(t) = H_k(ω_c) e^(−jω_c τ_k) s(t) + e_k(t)

where H_k(ω_c) is the frequency response of the k-th microphone evaluated at ω = ω_c.
Adopting a vector representation, it is possible to derive the complete array model for a single source:

y(t) = a(θ) s(t) + e(t)

where
• y(t) = [y_1(t), y_2(t), …, y_M(t)]^T is the array vector;
• a(θ) = [H_1(ω_c) e^(−jω_c τ_1), H_2(ω_c) e^(−jω_c τ_2), …, H_M(ω_c) e^(−jω_c τ_M)]^T is the propagation vector or steering vector;
• e(t) = [e_1(t), e_2(t), …, e_M(t)]^T is the noise vector.
Now the explicit dependence of the time delay on the DOA θ can be derived as

τ_k = (k − 1) · (d sin θ) / c,   for θ ∈ [−90°, 90°]    (2.6)
Assuming identical omnidirectional microphones, the steering vector of a uniform linear array becomes

a(θ) = [1, e^(−jω_c d sinθ/c), …, e^(−j(M−1)ω_c d sinθ/c)]^T    (2.8)

whose k-th element is

e^(−jkω_c d sinθ/c),   k = 0, 1, …, M − 1    (2.9)

Defining the spatial frequency

ω_s ≜ ω_c (d sinθ) / c    (2.10)

the steering vector can be rewritten as

a(θ) = [1, e^(−jω_s), …, e^(−j(M−1)ω_s)]^T    (2.11)

The spatial sampling theorem then requires

|ω_s| ≤ π    (2.12)

where:
• Wavelength: λ = c / f_c = 2πc / ω_c
• Spatial frequency: ω_s = ω_c (d sinθ) / c = 2π (d sinθ) / λ
While increasing the number of microphones increases the gain and improves the separation of sources, the distance between adjacent microphones can generate spatial aliasing, and consequently the wave arrival direction can become ambiguous.
In addition, the anti-aliasing condition, which requires the spacing to satisfy d < λ/2, introduces an important weakness of the array. Knowing that

λ = c / f    (2.14)

the spacing must satisfy

d < c / (2f)    (2.15)
or, equivalently, for a given spacing the maximum frequency that can be processed without spatial aliasing is

f < c / (2d)    (2.16)
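A quick numeric sketch of the bounds (2.15)-(2.16), assuming the 42 mm spacing of the UMA-16 (a value not stated in this chapter):

    % Anti-aliasing bounds for an assumed spacing of 42 mm
    c = 343;                         % speed of sound [m/s]
    d = 0.042;                       % inter-microphone spacing [m] (assumed)
    fmax = c / (2*d);                % highest alias-free frequency, eq. (2.16)
    fprintf('f < %.0f Hz for d = %.0f mm\n', fmax, d*1e3);
    % Conversely, for a target frequency f the spacing must satisfy eq. (2.15):
    f = 4000;                        % target upper frequency [Hz]
    dmax = c / (2*f);
    fprintf('d < %.1f mm for f = %d Hz\n', dmax*1e3, f);

With d = 42 mm this gives roughly f < 4083 Hz, consistent with the 4000 Hz upper limit mentioned later in the thesis.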
2.6 BEAMFORMING
The Delay and Sum beamforming algorithm can be performed in the time domain or in the frequency domain.
The functionality of delay and sum beamforming in the time domain can be summarized as follows: the sound from the source reaches the microphones along different paths. The signals captured by the microphones are similar in waveform but show different delays and phases, both proportional to the distances covered. The delays can be determined from the speed of sound and the distances between the microphones and the sound source [29].
Then the delays of the signals can be calculated with respect to a reference microphone to ensure that the desired-signal component has the
same phase on all channels. From the signal model [5], the delayed signal of the i-th microphone is x_i(t) + e_i(t), and the beamformer output is obtained by averaging over the M channels:

y(t) = (1/M) Σ_{i=1}^{M} [x_i(t) + e_i(t)]    (2.18)
In other words, the DAS algorithm in the time domain separates each delay into an integer multiple of the sampling period and a non-integer part. The integer part is obtained by delaying the signal by a certain number of samples, whereas a Finite Impulse Response (FIR) filter is used to add the non-integer part of the delay. The algorithm cannot process negative delay values, and therefore all the delays need to be additionally shifted so that the delay with the smallest, i.e. most negative, value becomes the reference value.
Collecting the channel weights in a vector

h = [h_1, h_2, …, h_M]^T    (2.19)

the beamformer output can be written compactly as

y_b(t) = Σ_{k=1}^{M} h_k · y_k(t) = h^T y(t)    (2.20)
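The following minimal MATLAB sketch (our illustration, not the thesis code) implements the time-domain scheme just described: the steering delays are shifted to be non-negative and split into an integer number of samples plus a fractional part, realized here with a simple two-tap linear-interpolation FIR filter:

    % Minimal time-domain DAS sketch; a 2-tap linear-interpolation FIR is
    % used as the fractional-delay filter for simplicity.
    function y = das_time(x, tau, fs)
    % x: [N x M] microphone signals, tau: [1 x M] steering delays [s], fs: [Hz]
        [N, M] = size(x);
        tau  = tau - min(tau);        % shift so the most negative delay is 0
        dSmp = tau * fs;              % delays in (possibly fractional) samples
        nInt = floor(dSmp);           % integer part of each delay
        frac = dSmp - nInt;           % fractional part, 0 <= frac < 1
        y = zeros(N, 1);
        for m = 1:M
            h  = [1 - frac(m), frac(m)];                 % fractional-delay FIR
            xm = filter(h, 1, x(:, m));                  % non-integer delay
            xm = [zeros(nInt(m), 1); xm(1:N - nInt(m))]; % integer delay
            y  = y + xm;
        end
        y = y / M;                    % average over all channels, eq. (2.18)
    end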
This means that the DOA of the incoming signal can be estimated by evaluating equation (2.21), known as the pseudo-spectrum function.
Since the covariance matrix is related only to the incoming signal, the problem reduces to designing a spatial band-pass filter h(θ̄) such that it passes the signal with the target DOA θ̄ undistorted and attenuates all DOAs different from θ̄ as much as possible.
The first condition is met when the spatial response at θ̄ is unitary, i.e. when

h^H(θ̄) a(θ̄) = 1    (2.23)

Then, assuming that the signal is spatially white, minimizing the power from all directions but θ̄ leads to the solution

h(θ̄) = a(θ̄) / (a^H(θ̄) a(θ̄)) = a(θ̄) / M    (2.24)

and the output power of the beamformer steered at θ becomes

p(θ) = E{|y_b(t)|²} = h^H(θ) R h(θ) = a^H(θ) R a(θ) / M²    (2.25)
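A hedged, self-contained MATLAB sketch of the DAS pseudo-spectrum scan of equation (2.25) for a uniform linear array; the simulated scenario and variable names are ours and purely illustrative:

    % DAS pseudo-spectrum scan, eq. (2.25), on simulated ULA data
    c = 343; fc = 2000; d = 0.042; M = 16;
    theta0 = 30;                                          % true DOA [deg]
    a0 = exp(-1j*2*pi*fc*(0:M-1)'*d*sind(theta0)/c);      % true steering vector
    Nsnap = 200;
    S = exp(1j*2*pi*rand(1, Nsnap));                      % random-phase tone
    X = a0*S + 0.1*(randn(M,Nsnap) + 1j*randn(M,Nsnap))/sqrt(2);
    R = (X*X') / Nsnap;                                   % sample covariance

    theta = -90:0.5:90;                                   % scan angles
    p = zeros(size(theta));
    for i = 1:numel(theta)
        a = exp(-1j*2*pi*fc*(0:M-1)'*d*sind(theta(i))/c); % steering vector
        p(i) = real(a' * R * a) / M^2;                    % DAS output power
    end
    plot(theta, 10*log10(p/max(p)));
    xlabel('\theta [deg]'); ylabel('p(\theta) [dB]');

The peak of p(θ) falls at the true DOA; the scan-grid step (here 0.5 degrees) is exactly the "resolution" discussed later in chapter 4.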
2.8 Multiple Signal Classification (MUSIC)
Let us consider the complete data matrix X that contains the signals at each microphone in the case of multiple signal sources:

X = A S + V    (2.26)

The MUSIC algorithm relies on the following assumptions:
• The number of sources N is smaller than the number of microphones:

N < M    (2.27)

• The source signals S are such that their covariance matrix is non-singular.
Under these assumptions, the covariance matrix of the data can be eigendecomposed, and its eigenvalues can be collected in the diagonal matrix

Λ = diag{λ_1, λ_2, …, λ_M}    (2.30)
R = U_s Λ_s U_s^H + U_n Λ_n U_n^H    (2.31)

where U_s and U_n are the signal and noise subspace unitary matrices.
The key point in estimating the direction of arrival is observing that all the noise eigenvectors are orthogonal to A: the columns of U_s span the range space of A, and the columns of U_n span its orthogonal complement, which is the nullspace of A^H. By definition, the projection operators onto the signal and noise subspaces are

P_s = U_s U_s^H    (2.32)

and

P_n = U_n U_n^H    (2.33)

Since the signals are linearly independent, A R_s A^H is full rank, and since the eigenvectors in U_n are orthogonal to A, it follows that

U_n^H a(θ) = 0,   θ ∈ {θ_1, …, θ_N}    (2.34)
Finally, the DOAs can be retrieved from the N highest peaks of the MUSIC spatial pseudo-spectrum function, defined as

P_MUSIC(θ) = 1 / (a^H(θ) P_n a(θ))    (2.35)
Basically, the MUSIC algorithm measures the distance between the steering vector and the noise subspace. In a direction where a signal is present, the steering vector is orthogonal to the noise subspace, so the denominator a^H(θ) P_n a(θ) is zero or near zero and the pseudo-spectrum exhibits a sharp peak. Conversely, if no signal is present in a particular direction, the steering vector is not orthogonal to the noise subspace and the pseudo-spectrum value is small.
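As a sketch of these steps (a self-contained simulation with our own variable names, not the thesis implementation), the pseudo-spectrum of (2.35) can be computed as follows:

    % MUSIC pseudo-spectrum, eqs. (2.30)-(2.35), on simulated ULA data
    c = 343; fc = 2000; d = 0.042; M = 16; N = 1;     % N = number of sources
    steer = @(th) exp(-1j*2*pi*fc*(0:M-1)'*d*sind(th)/c);
    Nsnap = 200;
    X = steer(30)*exp(1j*2*pi*rand(1,Nsnap)) ...
        + 0.1*(randn(M,Nsnap)+1j*randn(M,Nsnap))/sqrt(2);
    R = (X*X')/Nsnap;                        % covariance matrix

    [U, L] = eig(R, 'vector');
    [~, idx] = sort(L, 'descend');           % sort eigenvalues
    Un = U(:, idx(N+1:end));                 % noise subspace (M-N smallest)
    Pn = Un*Un';                             % noise projector, eq. (2.33)

    theta = -90:0.5:90;
    Pmusic = zeros(size(theta));
    for i = 1:numel(theta)
        a = steer(theta(i));
        Pmusic(i) = 1 / real(a'*Pn*a);       % pseudo-spectrum, eq. (2.35)
    end
    plot(theta, 10*log10(Pmusic/max(Pmusic)));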
2.9 Mini DSP UMA 16
Due to its embedded ADC converter, the mini-DSP UMA-16 guarantees a high data transfer rate. In addition, its processing power allows for high-quality PDM to PCM conversion and presents all 16 channels of raw audio to the ASIO USB audio driver.
Moreover, the UMA-16 is equipped with a control panel that allows the user to set and control both the volume of each microphone and the master volume, i.e. all of them together.
2.10 CMOS OV5640 USB CAMERA
The CMOS OV5640 is a USB camera module that enables image capture along with 720p and 1080p video streaming capability. Its 5 MP sensor, together with the several supported resolutions, allows the acoustic scene to be reconstructed with a high-quality image.
Since the CMOS OV5640 fits the UMA-16 perfectly, and given its extremely low cost with respect to the image quality it offers, the choice fell on this camera. The device datasheet is shown in the following figure.
3 SYSTEM DESIGN
Starting from the position of each sensor in the array, we have arranged the microphone array on the X-Y plane and associated a number from 1 to 16 with each microphone.
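A possible way to describe this layout in MATLAB is sketched below; the 4×4 grid and the 42 mm spacing are our assumptions about the UMA-16, not values taken from this chapter:

    % Illustrative 4x4 planar grid on the X-Y plane (assumed UMA-16 layout)
    d = 0.042;                                % inter-microphone spacing [m]
    [xg, yg] = meshgrid((0:3)*d, (0:3)*d);    % 4x4 grid of sensor positions
    pos = [xg(:)'; yg(:)'; zeros(1, 16)];     % [3 x 16] positions, z = 0
    pos = pos - mean(pos, 2);                 % center the array at the origin
    micNum = 1:16;                            % microphone numbering
    scatter(pos(1,:), pos(2,:), 'filled');
    text(pos(1,:), pos(2,:), string(micNum)); % label each microphone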
3.2 Array Impulse Response
The impulse response measured by each sensor is influenced by the reflected sound, i.e. by the room itself [14]. Since beamforming techniques are based on the deconvolution of the captured signals, a fundamental requirement for the beamforming device to be fully functional is that all the sensors of the microphone array have the same impulse response. However, the signal captured by each microphone is affected by the environment and, even if the microphone positions are known accurately, there may be some phase differences between microphones, which give the same effect as microphone position errors [19]. Figure 12 shows the impulse response captured by the sixteen channels of the microphone array in the room used for the experiments.
3.3 Array Beam Pattern
Using the Phased Array System Toolbox in MATLAB, we have simulated the beam pattern of our array at different frequencies.
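One possible way to reproduce this simulation is sketched below; the element spacing is assumed, and the pattern options may differ from the ones actually used in the thesis:

    % Beam pattern simulation with the Phased Array System Toolbox (sketch)
    c = 343; d = 0.042;                           % spacing assumed
    array = phased.URA('Size', [4 4], 'ElementSpacing', d);   % 4x4 planar array
    array.Element = phased.OmnidirectionalMicrophoneElement;  % ideal sensors
    for fc = [500 1000 1500 2000]                 % frequencies of figures 13-16
        figure;
        pattern(array, fc, -90:90, 0, ...         % azimuth cut at 0 elevation
            'PropagationSpeed', c, 'Type', 'powerdb');
    end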
As shown in figures 13-16, the microphone array has the same behavior at different frequencies because it is made of identical omnidirectional microphones. But due to this omnidirectionality, when the array is placed in the X-Y plane it presents the same beam pattern for signals coming from both the +Z and −Z directions. As a result, the system can capture signals coming from the back, which in theory should only be reflected sound. However, due to the high stability of the beamforming algorithm, the system is not strongly affected by the reflected signal coming from the back when the incoming signal is in front of the device. This will be analyzed in depth in chapter 4 about the tests and the experimental results.
3.4 Beamforming Implementation
The data was acquired using the MATLAB Audio Toolbox, which includes a function that allows audio to be captured directly from the Mini DSP UMA 16 and all sixteen audio signals coming from the microphone array to be stored in an audio file.
Once the data was collected into a matrix, the main processing was done by the beamforming algorithm. We first implemented both algorithms step by step as described in sections 2.6 and 2.7. Regarding the MUSIC method, after some tests we preferred to employ the phased.MUSICEstimator2D object of the Phased Array System Toolbox, since its computational time is lower than that of our own MUSIC implementation.
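A hedged sketch of this acquisition and estimation chain is shown below; the device name, scan grid, and operating frequency are assumptions, not the thesis settings:

    % Acquisition + 2-D MUSIC estimation chain (illustrative settings)
    fs = 11025; c = 343;
    reader = audioDeviceReader('SampleRate', fs, 'NumChannels', 16, ...
        'Device', 'miniDSP ASIO Driver');          % device name is illustrative
    array = phased.URA('Size', [4 4], 'ElementSpacing', 0.042, ...
        'Element', phased.OmnidirectionalMicrophoneElement);
    estimator = phased.MUSICEstimator2D('SensorArray', array, ...
        'OperatingFrequency', 2000, 'PropagationSpeed', c, ...
        'NumSignalsSource', 'Property', 'NumSignals', 1, ...
        'DOAOutputPort', true, ...
        'AzimuthScanAngles', -90:0.5:90, 'ElevationScanAngles', -90:0.5:90);

    x = reader();                  % one frame of 16-channel audio
    [~, doa] = estimator(x);       % doa = [azimuth; elevation] in degrees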
The core of the project is the implementation of the acoustic camera in real-time mode. Unlike the static setup, the real-time application is based on the continuous streaming of input data without recording it. Since the UMA 16 has an integrated DSP, the latency of the input signal is almost negligible. But before locating the sound source and creating the corresponding acoustic map, the algorithm needs a certain amount of time, which can generate a delay. In order to overcome this problem, we decided to implement the MUSIC algorithm which, in the static setup, proved to be the faster algorithm in terms of computational time, together with a better accuracy.
Obviously, MUSIC does not guarantee zero time delay, because it has to analyze sixteen audio channels coming from the microphone array. Since the length of each channel in terms of samples plays a fundamental role, the idea was to fill a buffer of the minimum length that guarantees a correct localization.
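One way such a buffer could be realized is with dsp.AsyncBuffer from the DSP System Toolbox (a sketch; the frame and buffer lengths are illustrative, not the values used in the thesis):

    % Audio buffering sketch for the real-time loop (illustrative sizes)
    buf      = dsp.AsyncBuffer(4096);     % capacity in samples per channel
    frameLen = 2048;                      % assumed minimum length for MUSIC
    x = randn(256, 16);                   % stand-in for one 16-channel frame
    write(buf, x);                        % fill the buffer with new samples
    if buf.NumUnreadSamples >= frameLen
        frame = read(buf, frameLen);      % [frameLen x 16] block for MUSIC
    end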
Particular attention was paid to the Angle of View (AoV) of the webcam in the system, because it specifies the steering angle range of the beamformer. Denoting the AoV by α, it was calculated using a triangle relation, as shown in figure 17.
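A minimal numeric sketch of such a triangle relation (our assumption of what figure 17 depicts: an object of known width placed at a known distance fills the frame horizontally; the values are illustrative):

    % Hypothetical AoV estimate from a triangle relation (assumed setup)
    W = 1.10;                     % visible width at the test distance [m] (assumed)
    D = 1.00;                     % camera-to-plane distance [m] (assumed)
    aov = 2 * atand((W/2) / D);   % full horizontal angle of view [deg]
    fprintf('Estimated AoV: %.1f deg\n', aov);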
The code view of App Designer allows a callback function to be assigned to the components defined in the design view. The result is shown in figure 18.
In practice, the audio signals coming from the microphone array fill the audio buffer, while the video frame streamed by the camera is sent directly to the app. At the same time, the MUSIC algorithm takes an audio frame from the buffer, analyzes it, and generates a power map which is overlaid on the video frame. The key process is shown in figure 19.
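A sketch of the overlay step (names and data are illustrative; 'peppers.png' is a MATLAB demo image standing in for a webcam frame): the power map is resized to the frame and blended with transparency.

    % Overlay a power map on a video frame (illustrative sketch)
    frame = im2double(imread('peppers.png'));     % stand-in for a webcam frame
    P = mat2gray(rand(181, 181));                 % stand-in for the power map
    P = imresize(P, [size(frame,1) size(frame,2)]);
    imshow(frame); hold on;
    h = imagesc(P);                               % colored map over the frame
    set(h, 'AlphaData', 0.4 * P);                 % more opaque where power is high
    colormap hot; hold off;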
4 TEST AND RESULTS
In this section, incoming signals of different durations have been used. The goal was to find the minimal length of the analyzed signal which guarantees the localization of the source. We then proceeded by changing the distance between the sound source and the device, and by changing the position of the sound source.
The performance of the acoustic camera was measured in terms of accuracy, resolution, and computational time. The accuracy represents how close the estimated DoA is to the real one, and it was measured in both azimuth and elevation angles. It was also measured in terms of the root mean square error (RMSE) between the true source position and the estimated one, where the former was measured by putting the speaker on a rotating system with a reference angle.
All the experiments were carried out in a reverberant environment with T60 = 1.2 seconds, and the test tones were generated by a 5'' Hypnocampus 2C speaker. We will refer to the direction of arrival in azimuth and elevation as [Az El] in degrees [°].
Starting from the static acquisition, a first step was to evaluate the Delay and Sum algorithm using two tone signals of different durations coming from several directions. The signals captured by the microphones have a signal-to-noise ratio between 30 and 54 dB. The results are shown in table 1.
4.1 Static setup test
A first test was performed using a 1 kHz tone and changing its duration. Figures 20-21 show the response of the algorithm to incoming signals of duration 1 second and 4 seconds, respectively. Since the generated power maps present the same accuracy, we have noticed that DaS is not affected by the length of the signal to be analyzed. The same happens with a 2 kHz tone signal.
Although the duration of the signal is not relevant, we have noticed that when the number of samples of the incoming signal increases, the system takes longer to plot the power map. However, the length of the signal depends not only on the duration of the acquisition but also on the sampling frequency Fs of the system. The test tones in figure 20 were recorded at different sampling frequencies, starting from 44.1 kHz down to 11.025 kHz. As noted in chapter 3.1, the distance between the microphones in the array sets the upper frequency limit at about 4000 Hz. Since all the selected values of Fs guarantee the anti-aliasing condition (Fs > 2Fmax), we have decided to set the sampling frequency to 11025 Hz in order to process the minimum number of samples.
Figure 24 Power map of 2 kHz tone coming from [-30 0] located at 1.5 m
Figure 25 Power map of 2 kHz tone coming from [-30 0] located at 2.1 m
Comparing the two signals, we have noticed that they are symmetric with respect to the X axis, and they also present a similar RMSE, equal to 3 for the signal positioned at a distance of 1.5 meters and 3.6 for the signal at 2.1 meters. This means that they present almost the same localization error. A considerable difference is in the sound pressure level of the two signals, which is 0.35 dB for the former and 2.8 dB for the latter. In order to relate performance to the distance between the microphone array and the sound source, we have tested the DaS algorithm considering signals with different SPL. The result was that the algorithm can localize sound sources up to a maximum distance of between 2 and 3 meters, with a power of at least 0.2-0.5 dB.
The quality of the power map image depends on the resolution, which is related to the number of scanning points of the steering vector (equation 2.9). Figures 26-27 show the power map of a 1 kHz tone signal coming from [-30 0], computed with scan angles between −90° and 90° in both azimuth and elevation, with a resolution of 0.2 and 0.5 respectively. As shown, the resolution does not affect the accuracy. A higher resolution corresponds to a higher quality power map but, on the other hand, the price to pay is a longer computational time, in terms of the number of points the algorithm must draw: 361×361 for a resolution of 0.5 and 901×901 for a resolution of 0.2.
Figure 26 Power map of 1 kHz tone coming from [-30 0] of 0.2 resolution
Figure 27 Power map of 1 kHz tone coming from [-30 0] of 0.5 resolution
Analyzing the obtained results, we have noticed that the accuracy of the DaS algorithm is strongly frequency dependent. Although it provides a fair localization of the sound source, the performance of the method is influenced by the reflected sound that reaches the device. As a result, the corresponding power map also shows a sound intensity region that is not related to the source and that might suggest the presence of another sound source.
Under the same conditions and using the same test signals, we have evaluated the performance of the MUSIC algorithm. The results are shown in table 2.
In order to verify that the accuracy depends on the sound source SPL rather than on its position with respect to the microphone array, we have performed a test using a 2 kHz tone. We put the sound source at 1.5 meters from the device, measured the SPL, and evaluated the MUSIC response. Then we moved the sound source back to 2.1 meters and increased the sound pressure level until it was equal to the SPL measured at 1.5 meters. The result is shown in figures 30-31.
Figure 30 Power map of 2 kHz tone coming from [-30 0] located at 1.5 m
Figure 31 Power map of 2 kHz tone coming from [-30 0] located at 2.1 m
The power maps present the same result for signals coming from both positions. In addition, we performed a test in the presence of two sound sources located at the same distance from the device, generating two signals with equal SPL. In this condition the system does not locate both sources, but generates a power map that focuses on the region with higher energy. As expected, as the SPL of one source increases, the energy map follows the signal generated by that source.
Furthermore, as with DaS, the resolution does not affect the accuracy. Although the number of drawn points (361×361 for a resolution of 0.5 or 901×901 for a resolution of 0.2, both for scan angles between −90 and 90 degrees in the azimuth and elevation directions) is the same for the MUSIC and DaS spectra, MUSIC performs faster than DaS due to the nature of the two algorithms.
Figure 32 Power map of 2 kHz tone coming from [60 30] of 0.2 resolution
Figure 33 Power map of 2 kHz tone coming from [60 30] of 0.5 resolution
In addition, the MUSIC algorithm is not affected by the reflected sound, and this allows the sound source to be uniquely identified. Furthermore, we have noticed an improvement in accuracy, which drops from 6.8 to 4.1 for the maximum value and from 3 to 1.2 for the minimum value in terms of root mean square error.
Figure 34 Power map of 2 kHz tone coming from [0 -30] generated by DaS
Figure 35 Power map of 2 kHz tone coming from [0 -30] generated by MUSIC
4.2 Real Time mode experiment
A first test was done in the same environment where we performed the static acquisition and tests. The reverberation time was unchanged and equal to 1.2 seconds, and the speaker was the same as well.
In this condition, figure 36 shows the result for a pink noise incoming signal. As can be noticed, the system does not correctly locate the sound source with a search frequency of 2000 Hz. Tuning the search frequency to 800 Hz, the system locates the sound source correctly.
As in the first test, we have noticed that our system can locate the sound source given a proper search frequency.
In figure 39, the test signal was pink noise. In this case the search frequency of 400 Hz corresponds to the emitting woofer surface. The same behavior was observed using white noise, as shown in figure 40, where the sound also comes from a different position and the search frequency corresponds to the tweeter area.
In the last experiment, we carried out the test using two different speakers separately: a 10'' woofer, the Tannoy T-125, and a 2.7'' wideband speaker, the Sony SS-TS20. It was performed in a third environment with a reverberation time equal to 0.6 seconds.
Figures 42-43 show the acoustic scene created for the two different sound sources, both generating a pink noise signal. As expected, we have noticed that the accuracy of the device depends on the search frequency in the case of wideband signals.
Table 3 collects all the obtained results. It relates the search frequencies that allow the sound source to be correctly located to the corresponding signal and the speaker from which it was generated.
CONCLUSION AND FUTURE WORKS
The goal of this thesis was to design and develop a low-cost acoustic camera. Based on the MUSIC algorithm, and using a mini-DSP UMA 16 with a CMOS camera, we have implemented a GUI that makes the system very intuitive to use. The operating frequency range is not very wide, due to the physical limitations of the array, but, depending on the input signal and on the emitting area of the sound source, the system can detect and locate signals which have energy content from 200 Hz to 4000 Hz. The reproduced acoustic scene depends both on the resolution of the power map and on the resolution of the camera. A good compromise between image quality and streaming time was found by setting the algorithm resolution to 0.2 and the image resolution to 800x600. Under these conditions the system reacts rapidly to an incoming sound source. In addition, the acoustic camera presents a fair accuracy that, in the best case, drops to 1.2 in terms of RMSE. Unfortunately, the device fails when the incoming signal is a low-frequency pure tone, since the array is then surrounded by signals whose wavelength is greater than its size. Moreover, the acoustic camera is not able to detect and locate more than one sound source; it only detects the source with the higher SPL.
REFERENCES
[1] S. Grubesa, J. Stamac, M. Suhanek, "Acoustic Camera Design with Different Types of MEMS Microphone Arrays," Department of Electroacoustics, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia.
[2] J. L. Flanagan, J. D. Johnston, R. Zahn, G. W. Elko, "Computer-steered microphone arrays for sound transduction in large rooms," The Journal of the Acoustical Society of America, 78(5), 1508-1518, 1985.
[3] J. Stamac, S. Grubesa, A. Petosic, "Designing the Acoustic Camera using MATLAB with respect to different types of microphone arrays," Second International Colloquium on Smart Grid Metrology (SMAGRIMET), 2019.
[4] A. Sarti, "Microphone Array and Spatial Method," Politecnico di Milano.
[5] B. D. Van Veen, K. M. Buckley, "Beamforming: A Versatile Approach to Spatial Filtering."
[6] S. Grubesa, J. Stamac, I. Krizanic, A. Petosic, "The Development and Analysis of Beamforming Algorithms Used for Designing an Acoustic Camera."
[7] K. Tontiwattanakul, "Design and build of a planar acoustic camera using digital microphones," https://www.researchgate.net/publication/335363705.
[10] L. McCormack, S. Delikaris-Manias, A. Farina, D. Pinardi, V. Pulkki, "Real-time conversion of sensor array signals into spherical harmonic signals with applications to spatially localised sub-band sound-field analysis."
[12] L. McCormack, S. Delikaris-Manias, V. Pulkki, "Parametric Acoustic Camera for Real-time Sound Capture, Analysis and Tracking."
[13] A. O'Donovan, R. Duraiswami, D. Zotkin, "Imaging concert hall acoustics using visual and audio cameras," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008, pp. 5284-5287.
[14] L. Bianchi, M. Verdi, F. Antonacci, A. Sarti, S. Tubaro, "High resolution imaging of acoustic reflections with spherical microphone arrays," in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2015.
[16] B. Wang, P. Liu, J. Huang, B. Wu, G. Mu, "A frequency estimation algorithm based on carrier date detection scheme and MUSIC algorithm for DS/BPSK signals."
[18] Y. Maeda, M. Sugimoto, H. Hashizume, "A robust doppler ultrasonic 3D imaging system with MEMS microphone array and configurable processor," in IEEE International Ultrasonics Symposium (IUS), 2011, pp. 1968-1971.
[21] P. N. Keating, T. Sawatari, G. Zilinskas, "Signal processing in acoustic imaging," Proceedings of the IEEE, 67(4), 496-510, 1979.
[22] J. Benesty, J. Chen, Y. Huang, "Microphone Array Signal Processing" (Vol. 1), Springer Science & Business Media, 2008.
[23] https://www.coventor.com/blog/explanation-new-mems-microphone-technology-design
[24] P. Aarabi, "The fusion of distributed microphone arrays for sound localization," EURASIP Journal on Applied Signal Processing, 2003, 338-347.
[25] M. Goseki, M. Ding, H. Takemura, H. Mizoguchi, "Combination of microphone array processing and camera image processing for visualizing sound pressure distribution," in IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2011, pp. 139-143.
[26] M. Goseki, H. Takemura, H. Mizoguchi, "Visualizing sound pressure distribution by kinect and microphone array," in IEEE International Conference on Robotics and Biomimetics (ROBIO), 2011, pp. 1243-1248.
[27] L. Pei, L. Chen, R. Guinness, J. Liu, H. Kuusniemi, Y. Chen, ..., S. Soderholm, "Sound positioning using a small-scale linear microphone array," in International Conference on Indoor Positioning and Indoor Navigation (IPIN), 2013, pp. 1-7.
[28] Z. Jing, L. Bo, D. Lu, C. Errui, "An acoustic imaging simulation based on microphone array," in Cross Strait Quad-Regional Radio Science and Wireless Technology Conference (CSQRWC), 2011, Vol. 2, pp. 1398-1401.
[30] B. G. Bardsley, D. A. Christensen, "Beam patterns from pulsed ultrasonic transducers using linear systems theory," The Journal of the Acoustical Society of America, 69(1), 1981.
[31] R. Miyake, K. Hayashida, M. Nakayama, T. Nishiura, "A study on acoustic imaging based on beamformer to range spectra in the phase interference method," in Proceedings of Meetings on Acoustics, Vol. 19, No. 1, p. 055041, Acoustical Society of America, 2013.
[32] J. Scott, B. Dragovic, "Audio location: Accurate Low-Cost Location Sensing," Intel Research Cambridge, 2005.
[34] W. M. Humphreys Jr., T. F. Brooks, W. W. Hunter Jr., K. R. Meadows, "Design and use of microphone directional arrays for aeroacoustic measurements," in 36th Aerospace Sciences Meeting & Exhibit, Reno, NV, Jan 1998, AIAA Paper No. 98-0471.
[36] J. G. Proakis, D. G. Manolakis, "Digital Signal Processing - Principles, Algorithms and Applications," 3rd edition, Prentice Hall International Editions, 1996.
[37] B. Sharma, G. Singh, I. Sarkar, "Study of DOA Estimation Using Music Algorithm," International Journal of Scientific & Engineering Research, Volume 6, Issue 7, July 2015.