DSP Java
Joren Six
The main goal of this text is to bridge the gap between a mathematical description of a digital signal process and a working implementation. The text is a work in progress: it starts with calculating sound buffers by hand, then proceeds to illustrate audio output and explains the connection between the two. Along the way the format of WAV-files is explained. Then it proceeds with operations on sound. This text is meant to accompany releases of the TarsosDSP audio processing library (http://tarsos.0110.be/tag/TarsosDSP): it should clarify the concepts used in the source code.
In short, this text should at least get you started with audio DSP using Java.
Contents

1 Sampled Sound Using Java
  1.1 Audio buffers in Java
  1.2 And Then There Was ... Sound

2 Operations on Sound
  2.1 Sound Detection
  2.2 Echo Effect
  2.3 Pitch Detection
  2.4 Time-Scale Modification in Time Domain
  2.5 Percussion Detection
  2.6 Filtering

3 Utility functions
1 Sampled Sound Using Java
To process sound digitally, some kind of conversion is needed from an analog to a digital sound signal. This conversion is done by an ADC: an analog-to-digital converter. An ADC has many intricate properties that make sure no information is lost during the conversion. For the principles of audio processing, the most important ones are the sampling rate and the bit depth [6].
The sampling rate measures the number of samples per second; it is defined in Hertz (Hz). The Nyquist-Shannon sampling theorem [7] states that you need to sample at twice the maximum frequency of the information you want to convey. If lower sampling rates are used, part of the information is lost. Speech is contained in the frequency range from 30Hz to 3000Hz; applying Nyquist-Shannon, a sampling rate of 6kHz should be enough. Some telephone systems use 8kHz.
The human ear is capable of detecting sounds between about 20Hz and 20kHz, varying from person to person. Sampling musical signals at about twice the maximum hearing frequency therefore makes sense: 44100Hz and 48000Hz are common sampling rates for musical information (20kHz × 2 < 44.1kHz).
The bit depth is the number of bits used to represent the value of a sample. Using
signed integers of 16 bits is common practice. The following example shows how these
concepts translate to the Java programming language.
f(x) = sin(2π · 440 · x)    (1)
To use the sine wave for signal processing it needs to be sampled. In Figure 2 the sampling rate defines the horizontal granularity and the bit depth defines the vertical granularity.

[Figure 2: a sampled sine wave]

How to create an array containing a sampled sine wave using Java can be seen in Listing 1.
After executing the code in Listing 1 the buffer contains a two-second-long pure tone of 440Hz, sampled at 44.1kHz. Each sample is calculated using the Math.sin function and is converted to a float following the advice found on http://java.sun.com/docs/books/tutorial/java/nutsandbolts/datatypes.html: use a float instead of a double to save memory in large arrays of floating point numbers.
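Listing 1 itself is not reproduced in this extract. A minimal sketch of what such a buffer calculation could look like (the class and method names here are illustrative, not taken from TarsosDSP):

```java
// Sketch of a sampled sine wave buffer; names are illustrative.
public class SineBuffer {
    static float[] createSineBuffer(double frequency, double seconds, double sampleRate) {
        float[] buffer = new float[(int) (seconds * sampleRate)];
        for (int sample = 0; sample < buffer.length; sample++) {
            // The time, in seconds, at which this sample is taken.
            double time = sample / sampleRate;
            // Math.sin returns a double; the explicit cast narrows it to a float.
            buffer[sample] = (float) Math.sin(2 * Math.PI * frequency * time);
        }
        return buffer;
    }

    public static void main(String[] args) {
        // Two seconds of a pure 440Hz tone sampled at 44.1kHz.
        float[] buffer = createSineBuffer(440.0, 2.0, 44100.0);
        System.out.println(buffer.length); // 88200 samples
    }
}
```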
Pure tones are not commonly found in the wild. A more realistic sound can be
generated by using equation 2. This sound consists of a base frequency and a harmonic
at 6 times the base frequency.
f(x) = 0.8 · sin(2π · 440 · x) + 0.2 · sin(2π · 6 · 440 · x)    (2)
double f0 = 440.0;              // base frequency
double amplitude0 = 0.8;
double twoPiF0 = 2 * Math.PI * f0;
double f1 = 6 * f0;             // harmonic at six times the base frequency
double amplitude1 = 0.2;
double twoPiF1 = 2 * Math.PI * f1;
Listing 3: Converting floats to bytes
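The body of Listing 3 is missing from this extract. A common way to perform the conversion, sketched here as an assumption rather than as the original listing, scales each float to the signed 16-bit range and emits the bytes in little-endian order:

```java
// Sketch: convert float samples in [-1, 1] to 16 bit little-endian
// PCM bytes. This is an assumption, not necessarily the original Listing 3.
public class FloatsToBytes {
    static byte[] toBytes(float[] buffer) {
        byte[] byteBuffer = new byte[buffer.length * 2];
        for (int i = 0; i < buffer.length; i++) {
            int x = (int) (buffer[i] * 32767.0); // scale to the signed 16 bit range
            byteBuffer[2 * i] = (byte) x;             // low byte first (little-endian)
            byteBuffer[2 * i + 1] = (byte) (x >>> 8); // then the high byte
        }
        return byteBuffer;
    }

    public static void main(String[] args) {
        byte[] out = toBytes(new float[]{0.0f, 1.0f, -1.0f});
        System.out.println(out.length); // 6 bytes for 3 samples
    }
}
```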
To make the sound audible it can be written to a WAVE file. A WAVE file consists of a header followed by the sound data. The sound data is nothing more or less than the PCM data we calculated. The WAVE file header format stems from the time when Microsoft and IBM were still best friends; it is defined in a joint specification [3]. Writing headers is a bit boring; luckily there are a few utility classes available in the standard Java library, in the javax.sound.sampled package, which make this task effortless:
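The original listing is not reproduced in this extract; a sketch of how this can be done with the standard javax.sound.sampled classes (the file name and the buffer contents are placeholders) could look like:

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class WriteWav {
    public static void main(String[] args) throws Exception {
        // Placeholder data: one second of 16 bit mono silence.
        byte[] byteBuffer = new byte[44100 * 2];
        // 44.1kHz, 16 bit, mono, signed, little-endian PCM.
        AudioFormat format = new AudioFormat(44100, 16, 1, true, false);
        ByteArrayInputStream in = new ByteArrayInputStream(byteBuffer);
        // Frame length is the number of samples, not the number of bytes.
        AudioInputStream stream =
                new AudioInputStream(in, format, byteBuffer.length / format.getFrameSize());
        // AudioSystem takes care of the WAVE header.
        AudioSystem.write(stream, AudioFileFormat.Type.WAVE, new File("out.wav"));
    }
}
```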
Once the WAVE file is stored to disk you can listen to it using just about any media player. This is a bit of a drag, so another option is to send the sound to the speakers directly. To get this working you need a default sound card that is correctly configured for the Java sound subsystem.
SourceDataLine line;
DataLine.Info info;
info = new DataLine.Info(SourceDataLine.class, format);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(format);
line.start();
line.write(byteBuffer, 0, byteBuffer.length);
line.close();

1 More information can be found at http://www-mmsp.ece.mcgill.ca/documents/audioformats/wave/wave.html
2 By default the Java Runtime provided by Oracle or Sun does not play nicely with PulseAudio on Linux. To alleviate this problem see the tutorial at http://tarsos.0110.be/artikels/lees/PulseAudio_Support_for_Sun_Java_6_on_Ubuntu
2 Operations on Sound
Operations on sound are commonly done in blocks; operations on individual samples are most of the time neither efficient nor practical. For example, to estimate the frequency of an audio signal you need at least one complete period of the signal, certainly more than one sample.
Chaining operations is also common. An architecture that allows a chain of arbitrary operations on audio results in a flexible processing pipeline. These concepts are implemented in the TarsosDSP audio library. The following examples are excerpts from that library and illustrate some of those basic ideas.
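As an illustration of the idea of chained block processing, not of TarsosDSP's actual API, a processing chain could be modelled like this (all names here are hypothetical):

```java
// Hypothetical block-processing chain; TarsosDSP's real interface differs.
public class BlockPipeline {
    interface BlockProcessor {
        void process(float[] block);
    }

    // Every block of samples flows through every processor in order.
    static void runChain(float[] block, BlockProcessor... chain) {
        for (BlockProcessor processor : chain) {
            processor.process(block);
        }
    }

    public static void main(String[] args) {
        BlockProcessor halve = block -> { // halve the amplitude
            for (int i = 0; i < block.length; i++) block[i] *= 0.5f;
        };
        BlockProcessor invert = block -> { // flip the phase
            for (int i = 0; i < block.length; i++) block[i] = -block[i];
        };
        float[] block = {1f, -1f, 0.5f};
        runChain(block, halve, invert);
        System.out.println(java.util.Arrays.toString(block)); // [-0.5, 0.5, -0.25]
    }
}
```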
2.4 Time-Scale Modification in Time Domain
TarsosDSP contains an implementation of the time stretching algorithm described in [8]; it can play back audio faster or slower without changing the pitch. Slow playback is, for example, very practical when transcribing the melody of a song.
2.6 Filtering
In the be.hogent.tarsos.dsp.filters package several frequency filters can be found. With a high pass filter, audio with frequencies above a certain threshold is kept. A low pass filter does the reverse: audio with frequencies below a threshold is kept. Together they can form a band pass filter, which can for example be constructed to focus on the melodic range of a song and ignore the rest.
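As a rough illustration of the principle, not of the filters in that package, a one-pole low pass filter can be sketched as follows (the names and the smoothing constant are illustrative):

```java
// A one-pole low pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
// Illustrative only; TarsosDSP's filters are more elaborate.
public class OnePoleLowPass {
    static float[] lowPass(float[] input, float alpha) {
        float[] output = new float[input.length];
        float previous = 0;
        for (int i = 0; i < input.length; i++) {
            // Each output sample moves a fraction alpha towards the input.
            previous = previous + alpha * (input[i] - previous);
            output[i] = previous;
        }
        return output;
    }

    public static void main(String[] args) {
        // A constant (0Hz) signal passes through almost unchanged...
        float[] dc = lowPass(new float[]{1, 1, 1, 1, 1, 1, 1, 1}, 0.5f);
        // ...while a rapidly alternating signal is attenuated.
        float[] ac = lowPass(new float[]{1, -1, 1, -1, 1, -1, 1, -1}, 0.5f);
        System.out.println(dc[7] + " " + ac[7]);
    }
}
```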
3 Utility functions
3.1 Write a WAV-file
3.2 Audio Playback
3.3 Interrupt a loop
3.4 Fourier Analysis
The FFT implementation used within TarsosDSP is by Piotr Wendykier and is included in his JTransforms library. JTransforms is the first open source multithreaded FFT library written in pure Java.
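To illustrate what such a library computes, the magnitude of a single bin of a discrete Fourier transform can be written directly from the definition. This is a naive O(N)-per-bin sketch, not JTransforms' FFT:

```java
// Naive DFT of one bin, for illustration only; an FFT library such as
// JTransforms computes all bins far more efficiently.
public class NaiveDft {
    static double magnitude(float[] signal, int k) {
        double re = 0, im = 0;
        for (int n = 0; n < signal.length; n++) {
            double angle = 2 * Math.PI * k * n / signal.length;
            re += signal[n] * Math.cos(angle); // correlate with a cosine at bin k
            im -= signal[n] * Math.sin(angle); // and with a sine at bin k
        }
        return Math.sqrt(re * re + im * im);
    }

    public static void main(String[] args) {
        int N = 64;
        float[] signal = new float[N];
        for (int n = 0; n < N; n++) {
            signal[n] = (float) Math.sin(2 * Math.PI * 4 * n / N); // 4 cycles per window
        }
        System.out.println(magnitude(signal, 4)); // peaks at bin 4, close to N/2 = 32
        System.out.println(magnitude(signal, 5)); // near zero elsewhere
    }
}
```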
This document is a work in progress; for more information see the source code on https://github.com/JorenSix/TarsosDSP ;).
References
[1] Dan Barry, Derry Fitzgerald, Eugene Coyle, and Bob Lawlor. Drum source separation using percussive feature detection and spectral modulation. In Proceedings of the Irish Signals and Systems Conference (ISSC), 2005.
[2] Alain de Cheveigné and Hideki Kawahara. YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4):1917–1930, 2002.
[3] IBM and Microsoft. Multimedia Programming Interface and Data Specifications 1.0.
Microsoft Press, 1991.
[4] Eric Larson and Ross Maddox. Real-Time Time-Domain Pitch Tracking Using
Wavelets. 2005.
[5] Philip McLeod. Fast, accurate pitch detection tools for music analysis. PhD thesis,
University of Otago. Department of Computer Science, 2009.
[6] Ken C. Pohlmann. Principles of Digital Audio. Sams, Indianapolis, 2nd edition, 1989.
[8] Werner Verhelst and Marc Roelands. An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech. In Proceedings of ICASSP-93, pages 554–557, 1993.