Guckert Audio Compression SVD MDCT MP3 PDF
Audio Compression
John (Jake) Guckert
Math 2270
Spring 2012
Overview
● Overview of MP3 codec
● MP3 Encoding Algorithm
● What is the Fast Fourier Transform?
● How is the FFT implemented in MP3 encoding?
● What is the Modified Discrete Cosine
Transform?
● How is the MDCT implemented in MP3
encoding?
● Summary
Overview of MP3 Codec
● MP3 stands for MPEG-1 Audio Layer III.
● Created in 1994 by the Moving Picture Experts
Group (MPEG).
● It is a lossy compression format, meaning that not all
of the original data is preserved after the
compression algorithm is finished.
● MP3 is also based on ideas from the field of
psychoacoustics. The idea is that the
human ear can only discern sounds from 20 Hz
to 20 kHz, so any data outside of this range
can be discarded to make the file smaller.
MP3 Encoding Algorithm
● The overall algorithm is broken up into 4 main
parts.
● Part 1 divides the audio signal into smaller
pieces called frames. An MDCT filter
is then applied to the output.
● Part 2 passes the sample into a 1024-point
FFT, and then the psychoacoustic model is
applied. Another MDCT filter is performed on
the output.
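The framing step in Part 1 can be sketched as follows. This is only an illustration: the function name `split_into_frames` is hypothetical, and the default frame size of 1152 samples is an assumption that is not stated on the slides.

```python
def split_into_frames(samples, frame_size=1152):
    """Split a sequence of samples into consecutive, equally sized frames.

    frame_size=1152 is an assumed default (not given in the slides).
    The last frame is zero-padded so that every frame has the same length.
    """
    frames = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        # Pad the final, possibly short, frame with zeros.
        frame.extend([0] * (frame_size - len(frame)))
        frames.append(frame)
    return frames


# Small demonstration with a tiny frame size so the result is easy to read.
demo_frames = split_into_frames(list(range(10)), frame_size=4)
```

Each frame would then be handed to the later stages (filtering, psychoacoustic analysis, quantization) one at a time.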
MP3 Encoding Algorithm Cont.
● Part 3 quantizes and encodes each sample.
This is also known as noise allocation. The
noise allocation adjusts itself in order to meet
the bit rate and sound masking requirements.
● Part 4 formats the bitstream into units called audio
frames. An audio frame is made up of 4 parts:
the header, error check, audio data, and
ancillary data.
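To show where the loss in Part 3 comes from, here is a minimal sketch of quantization. It uses a plain uniform quantizer, which is a simplification chosen for illustration and not the quantizer an MP3 encoder actually uses; the names `quantize`, `dequantize`, and `step` are hypothetical.

```python
def quantize(values, step):
    """Map each value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]


def dequantize(codes, step):
    """Reconstruct approximate values from the integer codes."""
    return [c * step for c in codes]


values = [0.12, -0.57, 0.98]
step = 0.25                      # a larger step means fewer bits and more noise
codes = quantize(values, step)   # [0, -2, 4]
restored = dequantize(codes, step)  # [0.0, -0.5, 1.0]

# The difference between the original and restored values is the
# quantization noise; for a uniform quantizer it is at most step / 2.
noise = [v - r for v, r in zip(values, restored)]
```

In this simplified picture, choosing a larger `step` lowers the bit rate but raises the noise, which is the trade-off that noise allocation has to balance against the masking requirements.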
The Fast Fourier Transform
● The FFT is an algorithm that computes the
Discrete Fourier Transform and its inverse.
● The FFT produces the same result as
evaluating the DFT directly (up to floating-point
rounding), but the FFT produces an answer much faster.
● In general the DFT is found by using the
equation:
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi k n / N}, \qquad k = 0, \dots, N-1$$
where $x_0, \dots, x_{N-1}$ are complex numbers.
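The relationship between the two can be checked with a short sketch. The direct `dft` below evaluates the sum above term by term, while `fft` is a textbook recursive radix-2 Cooley-Tukey variant. Both function names are illustrative, and the radix-2 version assumes the input length is a power of two.

```python
import cmath


def dft(x):
    """Direct O(N^2) evaluation of X_k = sum_n x_n * exp(-i*2*pi*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def fft(x):
    """Recursive radix-2 FFT, O(N log N). len(x) must be a power of two."""
    n_pts = len(x)
    if n_pts == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    half = n_pts // 2
    out = [0j] * n_pts
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n_pts) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


signal = [1, 2, 3, 4, 0, -1, -2, -3]
direct = dft(signal)
fast = fft(signal)
# The two results agree up to floating-point rounding.
max_difference = max(abs(a - b) for a, b in zip(direct, fast))
```

The direct version does about N² multiplications, while the recursive version does on the order of N log N, which is where the speedup comes from.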
The FFT Applied to MP3 Encoding
● The FFT is used as a filter bank on an audio
sample. It is used to filter out unwanted or
unneeded data from the sample.
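● First, incoming audio samples, s(n), are
normalized to give x(n). The power spectrum P(k)
is then computed from x(n):
$$P(k) = PN + 10\log_{10}\left|\sum_{n=0}^{N-1} h(n)\,x(n)\exp\left(-j\frac{2\pi k n}{N}\right)\right|^{2}, \qquad 0 \le k \le N-1$$
● h(n) is a Hann window given by:
$$h(n) = 0.5\left[1 - \cos\left(\frac{2\pi n}{N-1}\right)\right], \qquad 0 \le n \le N-1$$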
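A minimal sketch of the windowed power spectrum, written directly from the two formulas above. The function names `hann` and `power_spectrum_db` are illustrative. The constant PN is left as a parameter `pn` because its value is not given here, and the default of 0.0 is only a placeholder. The sum is evaluated directly for clarity; an encoder would use an FFT instead.

```python
import cmath
import math


def hann(n_pts):
    """Hann window h(n) = 0.5 * (1 - cos(2*pi*n / (N - 1))), n = 0..N-1."""
    return [0.5 * (1 - math.cos(2 * math.pi * n / (n_pts - 1))) for n in range(n_pts)]


def power_spectrum_db(x, pn=0.0):
    """P(k) = pn + 10*log10(|sum_n h(n) x(n) exp(-j*2*pi*k*n/N)|^2)."""
    n_pts = len(x)
    h = hann(n_pts)
    spectrum = []
    for k in range(n_pts):
        acc = sum(
            h[n] * x[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
            for n in range(n_pts)
        )
        # Floor the power at a tiny value so log10 never sees zero.
        power = max(abs(acc) ** 2, 1e-30)
        spectrum.append(pn + 10 * math.log10(power))
    return spectrum


# A sine wave that completes 4 cycles in 32 samples should peak at bin k = 4.
n_pts = 32
tone = [math.sin(2 * math.pi * 4 * n / n_pts) for n in range(n_pts)]
spec = power_spectrum_db(tone)
peak_bin = max(range(n_pts // 2 + 1), key=lambda k: spec[k])
```

The window tapers each block to zero at its edges, which reduces the spectral leakage that a sharp cut at the block boundaries would otherwise cause.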
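The Modified Discrete Cosine Transform
$$X_k = \sum_{n=0}^{2N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right]$$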
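The sum above is the MDCT: it takes a block of 2N input samples and produces N coefficients. A direct evaluation can be sketched as follows, where the function name `mdct` is illustrative and the code assumes an even block length with no window applied.

```python
import math


def mdct(block):
    """Direct MDCT of a block of 2N samples, returning N coefficients.

    X_k = sum_{n=0}^{2N-1} x_n * cos[(pi/N) * (n + 1/2 + N/2) * (k + 1/2)]
    """
    two_n = len(block)
    n_half = two_n // 2  # this is N in the formula
    return [
        sum(
            block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n)
        )
        for k in range(n_half)
    ]


coeffs = mdct([1, 2, 3, 4, 5, 6, 7, 8])  # 8 samples in, 4 coefficients out
```

Because only N values come out for 2N values in, a single block cannot be inverted on its own. In practice consecutive blocks overlap by half, and the overlapping parts are added together when decoding.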
The MDCT in MP3 Encoding
● The MDCT limits the sources of output
distortion at the quantization stage.
● It is also used as an analysis filter given by:
$$h_k(n) = w(n)\sqrt{\frac{2}{M}}\cos\left[\frac{(2n+M+1)(2k+1)\pi}{4M}\right]$$
● This is a block function and it extends across
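The analysis filter h_k(n) given above can be sketched in code as follows. The window w(n) is not specified here, so this sketch assumes a sine window of length 2M. That choice, and the names `sine_window` and `analysis_filter`, are assumptions made for illustration only.

```python
import math


def sine_window(m):
    """Assumed window: w(n) = sin(pi * (n + 1/2) / (2M)), n = 0..2M-1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]


def analysis_filter(k, m, window=None):
    """h_k(n) = w(n) * sqrt(2/M) * cos[(2n + M + 1)(2k + 1) * pi / (4M)]."""
    w = window if window is not None else sine_window(m)
    scale = math.sqrt(2.0 / m)
    return [
        w[n] * scale * math.cos((2 * n + m + 1) * (2 * k + 1) * math.pi / (4 * m))
        for n in range(2 * m)
    ]


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


m = 2
h0 = analysis_filter(0, m)
h1 = analysis_filter(1, m)
# With the assumed sine window, each filter has unit energy and the
# two filters are orthogonal to each other.
energy_h0 = dot(h0, h0)
cross_h0_h1 = dot(h0, h1)
```

Each of the M filters has length 2M, so every output coefficient depends on twice as many input samples as there are coefficients per block.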