


The Use of FFT and MDCT in MP3 Audio Compression
John (Jake) Guckert
Math 2270
Spring 2012
Overview
● Overview of MP3 codec
● MP3 Encoding Algorithm
● What is the Fast Fourier Transform?
● How is the FFT implemented in MP3 encoding?
● What is the Modified Discrete Cosine
Transform?
● How is the MDCT implemented in MP3
encoding?
● Summary
Overview of MP3 Codec
● MP3 stands for MPEG-1 Audio Layer III.
● Created in 1994 by the Moving Picture Experts
Group (MPEG).
● It is a lossy compression, meaning that not all
of the original data is preserved after the
compression algorithm is finished.
● MP3 is also based on ideas from the field of
psychoacoustics. The idea is that the human ear
can only discern sounds from 20 Hz to 20 kHz,
so any data outside this range can be discarded
to make the file smaller.
MP3 Encoding Algorithm
● The overall algorithm is broken up into 4 main
parts.
● Part 1 divides the audio signal into smaller
pieces called frames. An MDCT filter is then
applied to the output.
● Part 2 passes the sample into a 1024-point
FFT, and then the psychoacoustic model is
applied. Another MDCT filter is applied to
the output.
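The framing step in Part 1 can be sketched as follows. This is a minimal illustration, not the encoder's actual code: the function name `split_into_frames` is hypothetical, the default of 1152 samples is the per-channel frame length used by MPEG-1 Layer III, and zero-padding the last frame is an assumption made here for simplicity.

```python
def split_into_frames(samples, frame_size=1152):
    """Split a sequence of PCM samples into fixed-size frames.

    The final frame is zero-padded (an assumption for this sketch).
    """
    frames = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        frame.extend([0] * (frame_size - len(frame)))  # pad the last frame
        frames.append(frame)
    return frames


pcm = list(range(3000))          # hypothetical sample values
frames = split_into_frames(pcm)  # 3000 samples need three frames of 1152
```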
MP3 Encoding Algorithm Cont.
● Part 3 quantizes and encodes each sample.
This is also known as noise allocation. The
noise allocation adjusts itself in order to meet
the bit rate and sound masking requirements.
● Part 4 formats the bitstream into units called
audio frames. An audio frame is made up of 4
parts: the header, error check, audio data, and
ancillary data.
The Fast Fourier Transform
● The FFT is an algorithm that computes the
Discrete Fourier Transform and its inverse.
● The FFT produces the exact same result as
evaluating the DFT directly, but the FFT
produces an answer much faster.
● In general the DFT is found by using the
equation:

$$X_k = \sum_{n=0}^{N-1} x_n e^{-i 2\pi k n / N}, \qquad k = 0, \ldots, N-1$$

where $x_0, \ldots, x_{N-1}$ are complex numbers.
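To illustrate the claim that the FFT gives the same result as the direct DFT, here is a small sketch using only the Python standard library. The recursive radix-2 Cooley-Tukey scheme shown is one common FFT variant, chosen here for brevity; it is not necessarily the one an MP3 encoder uses, and the input values are arbitrary.

```python
import cmath


def dft(x):
    """Direct O(N^2) evaluation of X_k = sum_n x_n * exp(-i*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    if N % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out


signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]
max_diff = max(abs(a - b) for a, b in zip(dft(signal), fft(signal)))
```

Here `max_diff` should be on the order of floating-point rounding error, while the FFT needs roughly N log N operations instead of N squared.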
The FFT Applied to MP3 Encoding
● The FFT is used as a filter bank on an audio
sample. It is used to filter out unwanted or
unneeded data from the sample.
● First, incoming audio samples, s(n), are
normalized according to the following equation:

$$x(n) = \frac{s(n)}{N \, 2^{b-1}}$$

where N is the FFT length and b is the number of
bits per sample.
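The normalization above can be written in a few lines. This is only a sketch: the function name `normalize` and the 16-bit sample values are made up for illustration, and the FFT length N is taken to be the length of the input list.

```python
def normalize(samples, bits):
    """Apply x(n) = s(n) / (N * 2**(bits - 1)), with N = len(samples)."""
    N = len(samples)
    scale = N * 2 ** (bits - 1)
    return [s / scale for s in samples]


pcm = [0, 16384, -32768, 32767]  # hypothetical 16-bit PCM values
x = normalize(pcm, bits=16)      # every |x(n)| is at most 1/N
```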
The FFT continued
● Second, the masking threshold of the sample is
found by using an estimate of the power density
spectrum, P(k). P(k) is computed by using a
1024-point FFT.

$$P(k) = PN + 10 \log_{10} \left| \sum_{n=0}^{N-1} h(n)\, x(n) \exp\left(-j \frac{2\pi k n}{N}\right) \right|^2, \qquad 0 \leq k \leq N-1$$

● h(n) is a Hann window, given by:

$$h(n) = 0.5 \left[ 1 - \cos\left( \frac{2\pi n}{N-1} \right) \right], \qquad 0 \leq n \leq N-1$$

● PN is the power normalization term; it is
usually around 96 decibels.
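The power density spectrum estimate can be sketched as below. Several things here are assumptions for illustration: PN is fixed at 96 dB following the slide, the test tone and its amplitude are made up, the transform is evaluated as a direct sum rather than a real FFT, and N is 64 instead of 1024 so the pure-Python loop stays fast.

```python
import cmath
import math

PN = 96.0  # power normalization term in dB (value assumed from the slide)


def hann(N):
    """Hann window h(n) = 0.5 * (1 - cos(2*pi*n / (N - 1)))."""
    return [0.5 * (1 - math.cos(2 * math.pi * n / (N - 1))) for n in range(N)]


def power_spectrum_db(x, pn=PN):
    """P(k) = PN + 10*log10(|sum_n h(n) x(n) exp(-j*2*pi*k*n/N)|^2)."""
    N = len(x)
    h = hann(N)
    windowed = [h[n] * x[n] for n in range(N)]
    P = []
    for k in range(N):
        acc = sum(windowed[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                  for n in range(N))
        power = abs(acc) ** 2
        # Guard against log10(0) for bins with exactly zero energy.
        P.append(pn + 10 * math.log10(power) if power > 0 else float("-inf"))
    return P


N = 64
# Hypothetical normalized test tone centred on bin 8.
tone = [0.01 * math.cos(2 * math.pi * 8 * n / N) for n in range(N)]
P = power_spectrum_db(tone)
peak_bin = max(range(N // 2), key=lambda k: P[k])
```

For this tone the largest value of P(k) in the lower half of the spectrum falls at bin 8, the tone's frequency.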
What is the Modified Discrete Cosine
Transform?
● The MDCT is a Fourier-related transform based
on the type-IV DCT. It has the additional
property of being “lapped.”
● It was designed to be performed on consecutive
blocks of a larger dataset, where each block
overlaps its neighbors.
● The MDCT is a linear function that has half as
many outputs as inputs.
MDCT cont.

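● This linear function transforms 2N real numbers
to N real numbers according to the equation:

$$F : \mathbb{R}^{2N} \to \mathbb{R}^{N}$$

$$X_k = \sum_{n=0}^{2N-1} x_n \cos\left[ \frac{\pi}{N} \left( n + \frac{1}{2} + \frac{N}{2} \right) \left( k + \frac{1}{2} \right) \right], \qquad k = 0, \ldots, N-1$$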
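A direct evaluation of this formula shows the two properties mentioned: 2N inputs produce N outputs, and the map is linear. This is an unoptimized sketch with arbitrary input values; real encoders compute the MDCT with fast algorithms.

```python
import math


def mdct(x):
    """Map 2N real inputs to N real outputs by direct summation."""
    if len(x) % 2:
        raise ValueError("input length must be even (2N samples)")
    N = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [0.5, -1.0, 0.0, 2.0, 1.0, -3.0, 0.25, 4.0]
coeffs = mdct(a)  # 8 inputs give 4 outputs

# Linearity: the MDCT of a sum equals the sum of the MDCTs.
lhs = mdct([p + q for p, q in zip(a, b)])
rhs = [p + q for p, q in zip(mdct(a), mdct(b))]
linearity_error = max(abs(p - q) for p, q in zip(lhs, rhs))
```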
The MDCT in MP3 Encoding
● The MDCT limits the sources of output
distortion at the quantization stage.
● It is also used as an analysis filter given by:

$$h_k(n) = w(n) \sqrt{\frac{2}{M}} \cos\left[ \frac{(2n + M + 1)(2k + 1)\pi}{4M} \right]$$

This is a block function and it extends across
two input blocks at a time.


The MDCT in MP3 Encoding Cont.
● The MDCT performs a series of inner products
between the input data x(n), and the analysis
filter hk(n).
● This eliminates the blocking artifacts that would
cause a problem during the reconstruction of
the sample.
● The inverse MDCT reconstructs the samples
without the blocking artifacts.
$$x(n) = \sum_{k=0}^{M-1} \left[ X(k)\, h_k(n) + X^P(k)\, h_k(n+M) \right]$$
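The following sketch checks the reconstruction formula numerically on two overlapping blocks. It rests on assumptions not stated in the slides: the window w(n) is taken to be a sine window (one choice that satisfies the overlap condition w(n)^2 + w(n+M)^2 = 1), X^P(k) is read as the coefficients of the previous block, and the block size M = 4 and the sample values are arbitrary.

```python
import math


def sine_window(M):
    """Sine window of length 2M (assumed; the slides do not name w(n))."""
    return [math.sin((n + 0.5) * math.pi / (2 * M)) for n in range(2 * M)]


def analysis_filters(M):
    """h_k(n) = w(n) * sqrt(2/M) * cos((2n + M + 1)(2k + 1) * pi / (4M))."""
    w = sine_window(M)
    c = math.sqrt(2.0 / M)
    return [[w[n] * c * math.cos((2 * n + M + 1) * (2 * k + 1) * math.pi / (4 * M))
             for n in range(2 * M)]
            for k in range(M)]


def forward(block, h):
    """Inner products of a 2M-sample block with each analysis filter."""
    return [sum(xn * hn for xn, hn in zip(block, hk)) for hk in h]


def reconstruct(X, XP, h):
    """x(n) = sum_k [X(k) h_k(n) + XP(k) h_k(n + M)] for n = 0..M-1."""
    M = len(X)
    return [sum(X[k] * h[k][n] + XP[k] * h[k][n + M] for k in range(M))
            for n in range(M)]


M = 4
h = analysis_filters(M)
signal = [0.3, -1.0, 0.7, 0.2, 0.9, -0.4, 0.1, 0.6, -0.8, 0.5, 0.0, 1.0]
prev = forward(signal[0:2 * M], h)   # block covering samples 0..7
cur = forward(signal[M:3 * M], h)    # block covering samples 4..11
middle = reconstruct(cur, prev, h)   # estimate of samples 4..7
error = max(abs(a - b) for a, b in zip(middle, signal[M:2 * M]))
```

Each block on its own contains time-domain aliasing; the aliasing terms cancel only when the two overlapping blocks are combined, which is why the overlapped samples are recovered to within rounding error.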
Summary
● The MP3 encoding algorithm has numerous
complex parts.
● The FFT, DFT, and MDCT play a key role in
encoding audio samples.
● These three transformations also play a role in
other media compression formats, not just MP3.
