A Java-DSP Interface For Analysis of The MP3 Algorithm

This paper presents a Java-DSP interface for analyzing the MP3 audio compression algorithm. The interface incorporates an open source MP3 library and allows users to visualize and examine different modules of the MP3 encoding and decoding process, including the filter bank, polyphase filters, and windowing. The interface is intended for educational use to introduce concepts like filter banks and windowing to students.

A Java-DSP interface for analysis of the MP3 algorithm

Conference Paper · January 2011


DOI: 10.1109/DSP-SPE.2011.5739206

All content following this page was uploaded by C. S. Pattichis on 25 June 2014.

A JAVA-DSP INTERFACE FOR ANALYSIS OF THE MP3 ALGORITHM

Chih-Wei Huang¹, Jayaraman J. Thiagarajan¹, Andreas Spanias¹ and Constantinos Pattichis²

¹SenSIP Center, School of ECEE, Arizona State University, Tempe, AZ 85287-5706, USA.
²Department of Computer Science, University of Cyprus, Nicosia, Cyprus.

ABSTRACT

Java-DSP is a freely accessible web-based software, primarily used in signal processing education and research. In this paper, we present Java-DSP modules that have been developed for the study and analysis of the MPEG-1 Layer III algorithm. We have embedded JLayer1.0, an open source MP3 library, into Java-DSP and developed an intuitive interface to expose undergraduate and graduate students to the several modules in the encoding/decoding process. The Java-DSP MP3 decoder block is an interactive function which can be used to examine the characteristics and visualize outputs of different modules in the algorithm. Some of the important functions incorporated in the proposed interface include the analysis of the hybrid filter bank, polyphase filters and the window switching based on perceptual criteria. The MP3 algorithm represents a compelling framework for teaching certain aspects in DSP. We are using this module to introduce filter banks and windowing to undergraduate students.

Index Terms: MPEG-1 standard, compression, Java-DSP, software, audio.

Figure 1. The proposed MP3 decoder interface in Java-DSP.

1. INTRODUCTION

Java-Digital Signal Processing (J-DSP) is a web-based, platform-independent, visual programming environment that enables users to perform online signal processing calculations and simulations [1]. J-DSP was built from the ground up in Java to provide free and universal access to an array of signal processing functions that can be used for research and education [2]. All signal manipulation functions appear in J-DSP as "blocks" that are brought into the simulation environment by a drag-n-drop process. Signal and data flow is established by linking the blocks. The basic J-DSP environment targeted engineering algorithms for signal processing. J-DSP contains several basic as well as advanced DSP functions. Basic J-DSP functions include sampling, convolution, Fast Fourier Transform (FFT), digital filter design and arithmetic operations [3-5]. Advanced functions include statistical DSP algorithms, speech processing functions, multirate signal processing functions and spectral analysis functions [6-7]. In addition, it is supported by toolboxes for image processing [8], control systems [9], time-frequency analysis [10], analog/digital communications [11] and earth system signal processing [12].

Digital compression of audio data is highly important due to bandwidth and storage limitations. High fidelity and a low bit rate are the most desirable properties of these algorithms. MPEG/audio is a generic audio compression standard. The MPEG-1 Layer-3 algorithm, widely referred to as MP3, is an efficient algorithm for perceptual audio coding [13-14]. Perceptual audio coding is a lossy coding scheme that tries to remove parts of the signal that the human ear cannot perceive. Unlike the vocal-tract-model coders specially tuned for speech signals [15], the MPEG coder achieves its compression without making any assumptions about the nature of the audio source. This algorithm employs the psychoacoustic model-II to distribute the unavoidable distortion in such a manner that it is hidden from the human auditory perception system. Although one obvious approach would be to simply remove the signals hidden under the stronger signals, the MPEG-1 algorithm includes the weaker signal, but with a higher degree of distortion.

1.1. The MP3 Algorithm

The MP3 compression standard appeals to students because it is associated, along with the AAC (Advanced Audio Coding) standard, with perhaps the most successful product of the last decade, i.e., the Apple iPod™. The MP3 algorithm is efficient and can reduce the size of an audio file dramatically, while the difference in quality between the original audio and the compressed audio is often not perceptible. Furthermore, Huffman coding [13] yields a variable-length code, i.e., the length of each code word varies with the data statistics. In addition, the MP3 algorithm can change its time/frequency resolution through the use of different window functions that allow better transient sound control. These features make the MP3 algorithm attractive in a variety of applications.
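
The remark that Huffman code lengths follow the data statistics can be illustrated with a small Java sketch. It is not taken from JLayer or Java-DSP, and it does not reproduce the pre-defined Huffman tables that the MP3 standard actually uses. It builds a generic Huffman code from a set of hypothetical symbol counts and reports the resulting code lengths. The class name `HuffmanSketch`, its methods, and the counts are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Illustrative sketch only: generic Huffman code-length construction.
// MP3 itself selects among fixed tables defined in the standard.
final class HuffmanSketch {

    // Tree node: leaves carry a symbol index, internal nodes carry -1.
    private static final class Node {
        final long weight;
        final int symbol;
        final Node left;
        final Node right;

        Node(long weight, int symbol, Node left, Node right) {
            this.weight = weight;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
    }

    /** Returns the code length in bits assigned to each symbol (0 if unused). */
    static int[] codeLengths(long[] counts) {
        PriorityQueue<Node> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.weight, b.weight));
        for (int s = 0; s < counts.length; s++) {
            if (counts[s] > 0) {
                heap.add(new Node(counts[s], s, null, null));
            }
        }
        int[] lengths = new int[counts.length];
        if (heap.size() == 1) {            // a lone symbol still needs one bit
            lengths[heap.peek().symbol] = 1;
            return lengths;
        }
        while (heap.size() > 1) {          // merge the two rarest subtrees
            Node a = heap.poll();
            Node b = heap.poll();
            heap.add(new Node(a.weight + b.weight, -1, a, b));
        }
        if (!heap.isEmpty()) {
            assign(heap.poll(), 0, lengths);
        }
        return lengths;
    }

    // Leaf depth in the tree equals the code length of that symbol.
    private static void assign(Node n, int depth, int[] lengths) {
        if (n.left == null) {
            lengths[n.symbol] = depth;
            return;
        }
        assign(n.left, depth + 1, lengths);
        assign(n.right, depth + 1, lengths);
    }

    /** Total number of bits needed to code all counted symbols. */
    static long totalBits(long[] counts, int[] lengths) {
        long bits = 0;
        for (int s = 0; s < counts.length; s++) {
            bits += counts[s] * lengths[s];
        }
        return bits;
    }

    public static void main(String[] args) {
        long[] counts = {45, 13, 12, 16, 9, 5};   // hypothetical statistics
        int[] lengths = codeLengths(counts);
        System.out.println(Arrays.toString(lengths));    // [1, 3, 3, 3, 4, 4]
        System.out.println(totalBits(counts, lengths));  // 224
    }
}
```

With these counts the most frequent symbol receives a 1-bit code and the rarest symbols receive 4-bit codes, so the 100 symbols take 224 bits instead of the 300 bits a fixed 3-bit code would need.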
The MP3 encoder [17] consists of the following modules: (a) polyphase filter analysis, (b) Modified Discrete Cosine Transform (MDCT), (c) psychoacoustic model, (d) Huffman coding and (e) bit-allocation. The MPEG-1 audio codec uses a time-frequency analysis block to extract parameters from the time-domain input signal that are suitable for quantization and encoding based on perceptual criteria. The input audio stream is passed through a filter bank that divides the audio input into 32 uniformly spaced frequency subbands. The psychoacoustic module analyzes the signal, and computes the noise and signal masking as a function of frequency. In order to achieve varying temporal and frequency resolutions, based on the signal statistics, the outputs of the filter bank are processed using the MDCT. Finally, the bit allocation process determines the number of bits allocated to each of the subbands and the data is compressed to lower bit rates without loss in perception, and all of this information is added into the bitstream [16-17].

It is important to note that the decoding of the MPEG-1 bitstream is simple enough to allow single-chip, real-time decoder implementations. In this paper, we present an interactive interface developed in the Java-DSP software for study and analysis of the MP3 decoder. Note that in [18], Java-DSP modules for analysis of the MPEG-1 psychoacoustic model were presented. Figure 1 illustrates the proposed interface that displays several outputs/parameters pertinent to the algorithm. In this paper, we present several exercises developed for the undergraduate DSP class. The exercises expose students to the polyphase filter banks and the use of different windows for adapting the time and frequency resolution.

2. ORGANIZATION OF THE JAVA MODULES

In this section, we describe the features of the MP3 decoder interface as implemented in Java-DSP. We use the Java-DSP environment because it can be easily used with several graphical functions which allow students to experiment with the algorithm. The plots are used to illustrate the details behind the encoding and decoding processes. The proposed interface allows frame-by-frame synthesis and provides access to the parameters evaluated at the encoder.

This MP3 decoder implementation is based on the open source software application, JLayer1.0 [19]. We have converted the library into an applet that is compatible with Java-DSP and embedded it into the software. The Java-DSP MP3 decoder includes a number of features which can display the characteristics of the different modules involved in the encoding and decoding schemes. Several visualization tools are incorporated to help students understand the working of each of these modules and the parameters used. The decoder interface reads and displays the characteristics in the header of the input MP3 file, and includes the facility to perform frame-by-frame processing.

Figure 2. The block diagram in the MP3 decoding processes.

2.1. The MP3 Decoder Interface

Figure 2 shows the window dialog of the MP3 decoder interface. The interface presents the user with a drop-down menu, which has options to display the block diagram of the encoder, the decoder or the synthesis plots. The decoding can be executed in either of two modes: (a) Auto running, (b) User preference. In the "Auto running" mode, the frames automatically move forward slowly with a pre-defined delay. Instructors can take advantage of the pause to address the characteristics of each frame using the six plots. In the "User preference" mode, users can choose a specific frame number and the interface loads 10 frames in addition to the current frame (5 previous and 5 later frames). This helps in analyzing the characteristics of neighboring frames.

2.2. The Impulse Response of the Polyphase Filter

Efficient coding performance of a perceptual coder depends on the match between the properties of the designed filter bank and the characteristics of the input signal [20]. The filter bank used in the MP3 encoder is critically sampled or maximally decimated. The impulse response of the prototype low pass filter is shown in Figure 3. Each of the filters in the polyphase filter bank has a modulated
impulse response. The filters provide good time resolution with a reasonable frequency resolution. A cosine modulated pseudo-QMF M-band filter bank has nearly perfect reconstruction and has uniform, linear phase channel responses. The prototype impulse response contains 512 samples. The modulation is done by multiplying the impulse response with a cosine of different frequency to provide a frequency shift.

Figure 3. Impulse response of the low pass filter.

2.3. Frequency Responses of the Polyphase Filters

Figure 4 illustrates the frequency responses of the filters. The response of each subband is a modulated version of the prototype response, with a cosine term to shift the lowpass response to the appropriate frequency band. The center frequencies of the subbands are the odd multiples of π/64 and the bandwidth of each band is π/32. Though the analysis subband filter bank has very good characteristics, the lack of sharp cutoff frequencies in the subband filters can cause the spectral component of a tone to leak into an adjacent subband. Furthermore, the equal widths of the subbands do not reflect the human auditory system's frequency-dependent behavior.

Figure 4. Frequency responses of the filters.

2.4. Tonality

The masking characteristics of tonal and non-tonal components are different and hence it becomes essential to separate them. The tonality index is hence computed as a function of frequency, which indicates the tonal nature of the spectral component. The tonality index in each partition is calculated based on whether the signal is predictable from the spectral lines in the two previous frames. Tonal components are highly predictable and hence have a relatively higher index. The psychoacoustic model estimates the JND (Just Noticeable Distortion) and removes most of the high frequency components. Most often, these high frequencies are irrelevant to human perception. Therefore, the frequency components in the middle to high frequency subbands are mostly zeros. Furthermore, it computes a likelihood measure known as the tonality index which determines if the component is more tone-like or noise-like. Figure 5 represents the tonality in the 23rd subband.

Figure 5. Tonality of the components in the 23rd subband.

Figure 6. Frequency components obtained using a long window.

Figure 7. Frequency components obtained using a short window.

2.5. Window Switching

The MP3 algorithm needs to use several windows in order to accommodate non-stationarities. The psychoacoustic model detects the pre-echo condition and chooses short windows for better time resolution and long windows for improved frequency resolution. The type of window determines the number of coefficients in the MDCT. Figures 6 and 7 demonstrate cases (different subbands) where long windows and short windows were applied respectively. The number of frequency components also determines the frequency resolution. When long windows are used for sharper
transients, the signal is coarsely quantized. In the time domain, the quantization noise spreads over the entire block and creates perceptible noise at the beginning of the audio. Pre-echo can be controlled by detecting such transients and making a decision to switch to short windows. To maintain perfect reconstruction of the MDCT, the short and long windows are not switched instantaneously. For this purpose, two transition windows, start and stop, are provided. The windows used in MDCT window switching are illustrated in Figure 8 and Figure 9. It can be observed that the size of a short block is 12 samples, whereas that of a long block is 36 samples. In the short block mode, three short blocks with 50% overlap are used. Switching from a long to a short window includes a transition start window, while a stop window is placed in the transition from a short to a long window. Each block of data processed by the MDCT contains 36 samples.

Figure 8. The four types of windows.

Figure 9. Three successive short windows.

In the Java-DSP MP3 decoder interface, users can see six windows presented in the window switching plot. One frame (1152 samples) of a signal is divided into 32 subbands across frequency in the polyphase filtering process, which generates 36 time-domain samples in each subband. Two granules are formed by taking 18 samples of each subband, giving 576 time-frequency samples per granule. The windowing process is applied to each granule. Therefore, each frame has two windows. We take into account switching across three frames to demonstrate window switching between granules in each frame. The current frame and two previous frames are stored and six windows are illustrated in Figure 10.

Figure 10. Window switching and transition.

2.6. Output waveform plot

Once the decoding process is complete, the PCM audio wave file is reconstructed. Figure 11 shows the final output waveform. We can use the WaveRead block in Java-DSP (which we do not describe in this paper) to see the original wave signal and compare the original signal with the reconstructed signal. In the case of stereo, the output waveform changes according to which channel users select.

2.7. Frequency Domain Output

This plot shows students the output waveform in the frequency domain. Users can visualize the regions with the most energy in the frequency domain. Figure 12 illustrates an example where more frequency components lie in the low frequency region. Only a few frequency components are located in the high frequency region. The signal displayed changes based on the channel selected.

3. UTILITY IN EDUCATION AND ASSESSMENT

A variety of exercises are grouped into four laboratories for use in an undergraduate signal processing course. These laboratories include analysis of the polyphase filter bank,
MDCT, window switching and tonality. They can complement the teaching material, to introduce fundamentals of the MP3 algorithm, before the students actually study the complete MPEG-1 encoding/decoding system. Furthermore, the paradigm of audio coders can serve as an interesting application of the signal processing concepts taught in the course. Though the proposed decoder interface can be used as a standalone tool, the J-DSP environment allows students to perform further analysis by including other signal processing functions. Figure 13 illustrates a sample J-DSP simulation that performs shaping of the white noise spectrum using a subband filter from the MP3 decoder interface. We will have students from the undergraduate DSP class at Arizona State University perform these exercises and evaluate the proposed interface.

Figure 11. Output waveform plot.

Figure 12. Frequency Domain Output in dB.

Figure 13. An example Java-DSP simulation that performs shaping of the white noise spectrum with the subband filter response used in the MP3 algorithm.

3.1. Exercise-1: Polyphase filter analysis

This exercise describes the design of the analysis filter bank used in the MP3 encoder. A low pass prototype filter is first developed; when shifted appropriately in the frequency domain, band pass filters and high pass filters can be generated. In addition, students will use the function blocks in Java-DSP to build a polyphase filterbank from scratch.

3.2. Exercise-2: MDCT Windows

In this exercise the idea of applying different windows for use in the MDCT process is demonstrated. It requires students to measure the length of the different windows and understand the reason for using short windows. We use this exercise to explain the pre-echo occurrence and the way short windows are used to mitigate pre-echo.

3.3. Exercise-3: MDCT Frequency Components

This exercise illustrates the relationship between windows and frequency components. It requires students to observe the number of frequency components based on different windows, different frames, and different sound files. In addition, students can select different subbands and observe that most of the significant frequency components are present in low to mid-frequency subbands.

3.4. Exercise-4: Window Switching

This exercise describes the transition of window switching in neighboring frames. This experiment gives students an understanding of the order in which windows are switched. Different orders of windows result in different frequency components.

4. CONCLUSIONS

This paper presented tools for the analysis of the MP3 encoding and decoding process by using a sophisticated Java-DSP MP3 decoder interface. A variety of modules in the encoder and the decoder were explained through several plots embedded in the interface. A few laboratory exercises have been developed specifically for use in the EEE 407 DSP class at ASU. These lab exercises show students the main ideas of each of the modules and their role in the
compression algorithm. Also, a pre-quiz and post-quiz will be given to examine what students have learned about MP3 by using the Java-DSP MP3 simulation tool.

5. ACKNOWLEDGEMENTS

This work has been supported in part by the NSF Phase 3 grant award 0817596 and the ASU SenSIP consortium.

6. REFERENCES

[1] The Java-DSP web-page [on-line]; MIDL LAB, Arizona State University: http://jdsp.asu.edu.

[2] A. Spanias, Digital Signal Processing; An Interactive Approach, ISBN: 978-1-4243-2524-5, Publisher LuLu.com, September 2007.

[3] A. Clausen, A. Spanias, and A. Xavier, "A Java signal analysis tool for signal processing experiments," in Proc. of 1998 IEEE International Conference on Acoustics Speech and Signal Processing, pp. 1849-1852, vol. 3, May 1998, Seattle.

[4] A. Spanias et al., "Development of a web-based signal and speech processing laboratory for distance learning," ASEE Computers in Education Journal, pp. 21-26, vol. X, no. 2, April-June 2000.

[5] A. Spanias and V. Atti, "Interactive on-line undergraduate laboratories using Java-DSP," in IEEE Trans. on Education Special Issue on Web-based Instruction, pp. 735-749, vol. 48, no. 4, Nov. 2005.

[6] V. Atti and A. Spanias, "On-line simulation modules for teaching speech and audio compression," in Proc. of IEEE Frontiers in Education (FIE-2003), pp. T4E-17 - T4E-22, vol. 1, Nov 5-8, 2003, Boulder.

[7] V. Atti, A. Spanias, C. Panayiotou, and Y. Song, "Teaching digital filter design techniques used in high-fidelity audio applications," in Proc. of ASEE-2004 Conference, June 20-23, 2004, Salt Lake City, Utah.

[8] M. Yasin, L. J. Karam, and A. Spanias, "On-line laboratories for image and two-dimensional signal processing," in Proceedings of IEEE Frontiers in Education (FIE-2003), Nov. 2003, Boulder.

[9] T. Thrasyvoulou, K. Tsakalis and A. Spanias, "J-DSP-C, A Control Systems Simulation Environment For Distance Learning: Labs And Assessment," in Proceedings of 33rd ASEE/IEEE Frontiers in Education Conference, Boulder, CO, November 5-8, 2003.

[10] M. Zaman, A. Papandreou-Suppappola and A. Spanias, "Advanced concepts in time-frequency signal processing made simple," in Frontiers in Education Conference, November 2003.

[11] Y. Ko, T. Duman, and A. Spanias, "J-DSP for communications," 33rd ASEE/IEEE FIE Conference, Boulder, November 2003.

[12] K. Ramamurthy, A. Spanias, L. Hinnov, and J. Ogg, "On the use of J-DSP in Earth systems," Proceedings of ASEE Annual Conference and Exposition, Pittsburgh, PA, June 2008.

[13] A. Spanias, T. Painter, and V. Atti, Audio Signal Processing and Coding, Wiley, 2007.

[14] J. J. Thiagarajan and A. Spanias, Analysis of the MPEG-1 Layer-III Algorithm using MATLAB Software (Book in final preparation).

[15] K. N. Ramamurthy and A. Spanias, MATLAB Software for the Code Excited Linear Prediction Algorithm: The Federal Standard-1016, Morgan & Claypool publishers, 2010.

[16] ISO/IEC JTC1/SC29/WG11, "Information Technology – Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/sec – IS 11172-3: Audio," 1992.

[17] K. Salomonsen, S. Sogaard, E. P. Larsen, "Design and Implementation of an MPEG/Audio," Dept. Comm. Tech., Inst. Elect. System, 1997.

[18] Y. Song, A. Spanias, V. Atti, and V. Berisha, "Interactive Java modules for the MPEG-1 psychoacoustic model," in Proc. of 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 581-584, vol. 5, March 18-23, 2005.

[19] [J]Zoom web-page: http://www.javazoom.net

[20] R. Rangachar, A. Spanias, "A simulation tool for introducing MPEG-audio (MP3) concepts in a DSP course," in Proc. of 2002 IEEE International Conference on Acoustics Speech and Signal Processing, pp. 4116-4119, vol. 4, May 2002, Florida.
