On Room Correction
Mathias Johansson, Dirac Research AB
In this note I discuss some issues in filter design for equalization of sound systems. The
emphasis is on rationale, not on experiments, and I will focus on a few common
misunderstandings. I will briefly describe the basic concepts used in sound equalization, such as
FIR and IIR filters and minimum and linear phase, present basic mathematical facts, and
give a background to the philosophy behind the Dirac Live approach. To limit the length of the
text, I assume some basic familiarity with the topics covered. I will however refrain from the
popular trend in some engineering periodicals of hiding bad ideas behind complex-looking
equations. As already mentioned, the emphasis is on the logic of different approaches to
equalization rather than experiments. Logic can be checked by the reader, whereas experiments
carried out by others always leave room for doubt concerning experiment conditions.
Furthermore, as probability theory teaches us, one lucky experiment shows nothing about the
underlying rationale, but the underlying rationale will be much more indicative of future
experimental outcomes.
Note: Parts of this material were presented by the author at the 123rd AES Convention in NYC, in the
workshop "FIR or IIR? That is the question!"
Dirac Research AB
equations turn ugly. It is a non-linear problem, and it becomes mathematically messy to find the
right response. However, just because it is difficult doesn't mean it is impossible, so this is more
of the lazy mathematician's excuse for staying away from IIR filters. (But this excuse may still be
relevant when designing a filter in a processor of limited capacity.)
So, why is it that most simple audio equalizers still only use IIR filters? One reason is that the
filters are typically set manually, and there are closed-form formulas for constructing several
simple filter types such as peak/notch filter, lowpass/highpass, shelving, etc. You can easily
adjust the Q in real time and listen to the difference. IIR filters have another strength: fewer
coefficients are needed for a given slope of, e.g., a lowpass filter as compared to an FIR. The
reason is the recursion. It is like getting several filters working in succession for the price of one.
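To make the closed-form recipes and the recursion concrete, here is a minimal Python sketch of a peaking (bell) biquad. The coefficient formulas are the widely used ones from Robert Bristow-Johnson's "Audio EQ Cookbook"; choosing that recipe is an assumption here, and the function names are hypothetical. The point is that three feedforward and two feedback coefficients give an infinitely long impulse response.

```python
import cmath
import math


def peaking_biquad(f0, gain_db, q, fs):
    """Peaking-EQ biquad (Audio EQ Cookbook formulas), normalised so a[0] == 1."""
    a_lin = 10 ** (gain_db / 40)          # square root of the linear peak gain
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [1 + alpha * a_lin, -2 * cos_w0, 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * cos_w0, 1 - alpha / a_lin]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def biquad_filter(b, a, x):
    """Direct form I: each output also depends on past outputs (the recursion)."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def magnitude_db(b, a, f, fs):
    """|H(e^{jw})| in dB at frequency f (Hz)."""
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z^{-1} evaluated on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return 20 * math.log10(abs(num / den))


b, a = peaking_biquad(f0=1000.0, gain_db=6.0, q=1.0, fs=48000.0)
h = biquad_filter(b, a, [1.0] + [0.0] * 199)   # first 200 samples of the impulse response
```

With these formulas the gain is exactly `gain_db` at `f0` and 0 dB at DC, and `h` keeps ringing long after the single input sample because each output is fed back into the next.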
FIR filters are easier to implement (and can run on fixed-point architectures), are more flexible
to work with mathematically, can have arbitrary phase responses, but require more coefficients
for a given steepness of the filter. This last point is important when it comes to resolution in the
low-frequency area. Intuitively, the length of an FIR filter limits the impact it can have at a given
wavelength of the signal: the filter must span a time comparable to the period it is meant to shape. With a
sufficiently long FIR filter, however, much speaks in favor of using it instead of its numerically more cumbersome IIR cousin.
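The length argument can be put into rough numbers. The sketch below, in Python, shows a direct-form FIR (no feedback, hence always stable) and a back-of-the-envelope helper that assumes an N-tap filter resolves features about fs/N Hz wide. That rule of thumb is an assumption for illustration, not a design rule taken from this note, and both function names are hypothetical.

```python
import math


def fir_filter(h, x):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k]. No feedback, so always stable."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k < 0:
                break
            acc += hk * x[n - k]
        y.append(acc)
    return y


def taps_for_resolution(resolution_hz, fs):
    """Rough rule of thumb (assumption): an N-tap FIR spans N / fs seconds and
    therefore resolves spectral features roughly fs / N Hz wide."""
    return math.ceil(fs / resolution_hz)


# A 20 Hz feature at 48 kHz needs a filter spanning at least one 50 ms period.
n_taps = taps_for_resolution(20.0, 48000.0)
```

Here `n_taps` comes out at 2400, which is why useful low-frequency FIR correction quickly calls for thousands of coefficients.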
In summary, to a very good approximation, we can do the same thing with IIR filters as with
FIR filters. The choice of either structure is more of an implementation issue and also to some
extent one of application. It turns out that in order to do a proper impulse response correction of
a set of acoustic transfer functions, a rather long filter is needed and little seems to be gained by
using IIR filters. On the other hand, if our processor budget only allows a low-order filter (maybe
10 biquads or 20-40 FIR taps) then impulse response correction is not possible, and then it is
wiser to focus on the magnitude response and use minimum-phase biquad filters.
Finally, it should be noted that impulse response correction requires a non-causal part, which
cannot be modeled well by an IIR model. An IIR filter can, however, model the causal part of a
well-designed equalizer filter, which means that one of the most efficient mixed-phase equalizer
implementations consists of a combination of IIR and FIR filters.
BACK TO PHASE
A system with a linear phase response has a phase response that is a straight line when drawn as
a function of frequency (on a linear frequency scale). This also means that the system introduces
a constant delay at all frequencies. However, as mentioned above, a linear-phase filter always
has a symmetrical (or anti-symmetrical) impulse response. This means that if such a filter has a
non-flat magnitude response it will also introduce pre- and post-ringing in the time domain. The
danger of this is sometimes exaggerated among workers in the field. In practice, if the magnitude
response of a linear-phase filter is smooth (which an equalization filter should be), the pre-ringing will be negligible.
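A small numerical check of the statement that a symmetric impulse response means a constant delay: multiplying the frequency response of a symmetric FIR by the inverse of a pure delay of (N-1)/2 samples leaves a purely real number at every frequency. The toy taps below are an arbitrary example; the taps before the centre are the pre-ringing, those after it the post-ringing.

```python
import cmath


def is_symmetric(h, tol=1e-12):
    """Linear-phase FIR of the symmetric kind: h[n] == h[N-1-n]."""
    return all(abs(h[i] - h[-1 - i]) <= tol for i in range(len(h)))


def freq_response(h, w):
    """H(e^{jw}) = sum_n h[n] e^{-jwn} for a finite impulse response."""
    return sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))


# Toy symmetric FIR with a non-flat magnitude response, centred at index 2.
h = [0.1, -0.3, 1.0, -0.3, 0.1]
delay = (len(h) - 1) / 2   # the same delay, in samples, at every frequency

# Undoing the pure delay leaves a real (zero-phase) response, so the phase is a
# straight line in w apart from jumps of pi where the real amplitude changes sign.
residual_imag = [
    abs((freq_response(h, w) * cmath.exp(1j * w * delay)).imag)
    for w in (0.1, 0.7, 1.9, 3.0)
]
```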
At this time we must also discuss the concept of a minimum phase system. This is a very peculiar
system that has the least energy delay of all systems with a given magnitude response. (In the
complex plane a minimum-phase system has all zeros and poles inside the unit circle.) It is very
important to emphasize that this does not in any way mean that the impulse response is the
shortest of all systems with that magnitude response. It means that it is earliest; the difference is
profound. That it is early is a useful feature in applications where latency needs to be kept to a
minimum. For sound quality, there is no evidence that a minimum-phase equalizer sounds
better than, say, a linear-phase one. In fact, a linear-phase filter typically has a shorter impulse
response than a minimum-phase one. The term "minimum phase" is misleading, and much would
be gained if the audio community instead adopted the term "minimum energy delay". The
exact definition is as follows: Consider a minimum-phase impulse response and an arbitrary
other linear system with identical magnitude response. Compare the accumulated energy at the
outputs of the two systems at any time t. The accumulated energy from time zero until t will
always be at least as large for the minimum-phase system as for the other system. This implies that the
response must start early, for otherwise we could always construct another system that at a time t
close to zero has transferred more energy than the minimum-phase filter. Note that a minimum-phase system does not imply zero latency, only lower latency than other filters.
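The energy definition can be checked on the smallest possible example: two 2-tap filters with identical magnitude responses, one with its zero inside the unit circle and one with it reflected outside. This toy pair is an illustration chosen here, not data from the note.

```python
import cmath
import math


def partial_energy(h):
    """E(n) = sum_{k <= n} |h[k]|^2: the energy delivered up to sample n."""
    total, out = 0.0, []
    for hk in h:
        total += abs(hk) ** 2
        out.append(total)
    return out


def magnitude(h, w):
    """|H(e^{jw})| for a finite impulse response h."""
    return abs(sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h)))


h_min = [1.0, 0.5]   # zero at z = -0.5, inside the unit circle: minimum phase
h_max = [0.5, 1.0]   # zero at z = -2, outside: same magnitude, maximum phase

same_magnitude = all(
    math.isclose(magnitude(h_min, w), magnitude(h_max, w))
    for w in (0.0, 0.5, 1.0, 2.0, math.pi)
)
e_min, e_max = partial_energy(h_min), partial_energy(h_max)
```

Both filters are two taps long, so the minimum-phase one is not shorter, only earlier: `e_min` is `[1.0, 1.25]` while `e_max` is `[0.25, 1.25]`.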
A well-known theorem from complex analysis states that any linear time-invariant system can
be factored into two systems: a minimum-phase system and an all-pass system. If we design an
equalizer that perfectly inverts the minimum-phase system, we have thus made the magnitude
response flat (this might not be our ultimate goal, but let's assume so for a moment; it makes no
difference in this context), but what has happened to the impulse response? On
occasion, I have heard competent people say that we have at least not made it worse, since we
took care of the minimum-phase part (the inverse of a minimum-phase system is itself a
minimum-phase system and since there is only one for any magnitude response we have
automatically taken care of it). The fallacy of this statement is obvious: the all-pass system may
have a much worse impulse response than the complete system. If you take away the minimum-phase factor, you may have taken away the features that counteracted severe ringing in the all-pass
factor. The factorization we chose is completely arbitrary and has no physical interpretation at
all. We could just as well have chosen a different factorization.
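For an FIR response with known zeros, the factorization can be sketched directly: reflect every zero outside the unit circle to 1/conj(z) and rescale the gain by |z|. What remains, H divided by H_min, is the all-pass factor. This is the textbook construction with toy zeros chosen here; it is not presented as any product's method, and, as argued above, nothing in it says the all-pass factor has a benign impulse response.

```python
import cmath


def poly_from_zeros(zeros, gain=1.0):
    """Taps of gain * prod_k (1 - z_k z^{-1}), i.e. an FIR with the given zeros."""
    taps = [complex(gain)]
    for zk in zeros:
        # Multiply the current polynomial by (1 - zk z^{-1}).
        taps = [c - zk * prev for c, prev in zip(taps + [0], [0] + taps)]
    return taps


def min_phase_factor(zeros, gain=1.0):
    """Reflect each zero outside the unit circle to 1/conj(z) and scale the gain
    by |z|, which leaves the magnitude response unchanged."""
    new_zeros = []
    for zk in zeros:
        if abs(zk) > 1:
            new_zeros.append(1 / zk.conjugate())
            gain *= abs(zk)
        else:
            new_zeros.append(zk)
    return new_zeros, gain


def response(taps, w):
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(taps))


def partial_energy(taps):
    total, out = 0.0, []
    for c in taps:
        total += abs(c) ** 2
        out.append(total)
    return out


# Toy FIR: one zero inside (0.5) and three outside (-1.8 and 0.9 +/- 0.6j).
zeros = [0.5, -1.8, 0.9 + 0.6j, 0.9 - 0.6j]
h = poly_from_zeros(zeros)
mp_zeros, mp_gain = min_phase_factor(zeros)
h_min = poly_from_zeros(mp_zeros, mp_gain)

# The all-pass factor H / H_min has unit magnitude at every frequency ...
allpass_gain = [abs(response(h, w) / response(h_min, w)) for w in (0.1, 1.0, 2.0, 3.0)]
# ... and H_min delivers its energy no later than H does.
e_h, e_min = partial_energy(h), partial_energy(h_min)
```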
What we need to do is look at the total system response. If we factor it, we'd better make sure to
use a factorization that is meaningful also from a physical or psycho-acoustical standpoint and
evaluate the final result in terms of the full response. A commonly employed model that has a
well-founded physical motivation is to model transfer functions at different listening positions
within a room as having a set of common poles and a position-variable transfer function
consisting only of zeros. So far so good. The next step is to say: OK, we want to make a robust
correction and therefore we will only invert the poles. Here lies the error. Even though the poles are
indeed common to all positions, it makes no sense to just look at them in isolation from the
position-dependent zeros! If you remove poles in the low-frequency region close to the unit
circle, it does not mean that you have improved the over-all response. First, the poles affect not
just the region of the unit circle which they happened to be closest to. They also help to keep
down the gain at frequencies far away from that region! Similarly, the position-dependent zeros
close to the high-frequency region of the unit circle will now boost the energy in the low-frequency region.
The lesson from both these examples is that even though a certain factorization makes sense
physically or for ease of mathematical consideration, it is dangerous folly to look at each factor
as an isolated system and believe that if you improve that factor the whole system will fare
better. However, as elementary logic teaches us: a false premise implies any proposition, false or
true. So you can produce examples which will sound great on occasion even with a faulty
rationale. But that does not mean that with a better rationale you will not be able to do even
better.
boost any dip, because they are typically due to reflections. So Mr A removes his equalizer filter
and lets Mr B listen again. Mr B, however, is still not happy. It is better, but it's not good. There
is something hollow about the sound. At this point Mrs D enters the conversation. She's been
listening, sitting quietly in a corner of the room, and says: "Mr A was wrong because he forgot
about the time domain. Looking only at the magnitude of the Fourier transform and interpreting
it as strongly related to our concept of frequencies, he thought that he could boost that region
and obtain better sound. The problem is that he uses minimum-phase filters and consequently
adds energy at that frequency early in time. But if we only look at the direct wave, there is no
hole to be filled in the frequency response. The hole never exists if we look at a short window at
any time." Mr B frowns: "So Mr C was right to say that we cannot do anything about it. But if
that's the case, why do I still hear a strange-sounding oboe on my recording?" Mrs D looks
sternly at him: "Mr C was wrong too. The problem is due to the time-domain properties; the
reflection causes the problem, and it can only be corrected by a time-domain approach. If we
design a filter that reduces the reflection, you will end up with the interesting result that the hole
will be gone and the oboe will sound more natural." "But," Mrs D adds, "don't take this example
as evidence that you can always correct dips this way! In this case it was possible because all
positions experienced the same problem."
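A toy illustration of Mrs D's point, under loudly stated assumptions: a single reflection of relative level a, arriving d samples after the direct sound, identical at every listening position (her own caveat). A filter aimed at the reflection in the time domain fills the comb notch as a by-product, without anyone ever "boosting the dip". Note that this toy response, 1 + a z^{-d} with |a| < 1, happens to be minimum-phase, so it says nothing about the minimum- versus mixed-phase question; real room responses generally are not minimum-phase.

```python
import cmath
import math


def reflection_response(a, d):
    """Toy response: direct sound plus one reflection of level a, d samples later."""
    h = [0.0] * (d + 1)
    h[0] = 1.0
    h[d] = a
    return h


def reflection_canceller(a, d, terms):
    """Truncated series 1 / (1 + a z^{-d}) = sum_k (-a)^k z^{-kd}; requires |a| < 1."""
    g = [0.0] * ((terms - 1) * d + 1)
    for k in range(terms):
        g[k * d] = (-a) ** k
    return g


def convolve(x, y):
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def magnitude(h, w):
    return abs(sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(h)))


a, d = 0.6, 24
h = reflection_response(a, d)
g = reflection_canceller(a, d, terms=20)
corrected = convolve(h, g)

notch = math.pi / d                             # first comb notch, in rad/sample
before, after = magnitude(h, notch), magnitude(corrected, notch)
residual = max(abs(v) for v in corrected[1:])   # what is left of the reflection
```

The notch depth goes from 0.4 (about -8 dB) to essentially 1, and the only trace of the reflection left in `corrected` is a single sample of size 0.6^20, far down in the tail.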
The fallacy of uncritically viewing the magnitude of the Fourier transform as an accurate description
of what we perceive (and thereby ignoring time) is evident in debates on what a good acoustical
magnitude response should be like. The answer depends on the time-domain character of the
room that is being measured. A magnitude response that sounds good in a large hall may sound
terrible inside a car, as the reverberation ratios are completely different.
The basic problem with boosting dips is that it is typically done using a minimum-phase
equalizer. Thus, it injects energy at the wrong time. The same is obviously also true for a linear-phase
filter or any other filter that does not consider what the impulse response of the total
system becomes. The lesson is: don't mix up Fourier transforms with perceived frequency
responses. Perceived frequency responses are time-dependent. A joint time and frequency
analysis is required in order to design a good equalizer. In addition to this, the problem of
spatial variations must be considered carefully. A zero at one measurement point may be just
inside the unit circle but when moving the microphone it may end up on the outside. In such a
scenario it would be disastrous to look at the mean behavior (a zero on the unit circle) and
optimize that response. A correction which is good for the mean response is very different from
a correction which on average is good for any one of the measured responses. Zeros that are
located outside the unit circle but at approximately the same spot regardless of microphone
position are robustly invertible, others are not. (And again, this does not mean that we consider
one zero isolated from the rest of the response.) The following zoom-in on a zero plot shows
these properties. Measurements were taken at different positions in a good listening room using
good (Genelec) speakers. The zeros have been drawn using different sizes for different
measurement positions, in order to see how they move with respect to spatial movements.
Just above 200 Hz we see a zero moving from minimum-phase to mixed-phase behavior. The
average lies just inside the unit circle, so a careless minimum-phase inversion could
cause serious time-domain ringing here. Between 150 and 200 Hz we see a zero outside
the unit circle that moves a lot with position. As it is close to the unit circle and moves, a robust
correction would not try to alter this behavior. The zero outside the unit circle between 100 and
150 Hz is however possible to get rid of without causing any problem at any position (the
distance to the unit circle is rather large and the variation with position is small).
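The reasoning about the three zeros can be phrased as a screening rule for a zero tracked across listening positions. The sketch below is hypothetical: the thresholds are illustrative assumptions, the toy tracks only loosely mirror the three cases in the plot (they are not the measured data), and, as stressed above, a real design would still judge the zero together with the rest of the response rather than in isolation.

```python
def robustly_invertible(zero_track, min_distance=0.05, max_spread=0.02):
    """Hypothetical screening rule for one zero tracked across listening positions.

    zero_track: the zero's z-plane location (complex) at each measured position.
    Accept it for inversion only if it stays clearly outside the unit circle and
    moves little with position. Both thresholds are illustrative assumptions.
    """
    centre = sum(zero_track) / len(zero_track)
    spread = max(abs(z - centre) for z in zero_track)
    clearly_outside = min(abs(z) for z in zero_track) > 1 + min_distance
    return clearly_outside and spread < max_spread


# Toy tracks loosely mirroring the three cases discussed (not measured data).
stable_far = [1.10 + 0.02j, 1.11 + 0.021j, 1.105 + 0.019j]   # like 100 to 150 Hz
wandering = [1.02 + 0.10j, 1.04 + 0.22j]                      # like 150 to 200 Hz
crossing = [0.99 + 0.10j, 1.01 + 0.10j]                       # like just above 200 Hz
```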
Two different equalizers were designed based on these measurements. The following figure
shows how much of the total accumulated energy has been transferred at any time for the
impulse response of the original system, the system compensated with a minimum-phase
correction, and the system compensated with a mixed-phase correction. The plot was generated
taking new measurements at different listening positions than the original measurements used
for designing the filters. It should be emphasized that the minimum-phase correction uses
basically the same robust methodology as the mixed-phase inverse except that it does not
consider the zeros outside the unit circle. The upper plot shows the full bandwidth, the lower
one the responses up to 300 Hz. These are basically energy step responses; a better system will
thus have a faster rise time and ideally touch 1 immediately. It is clear that even in a good
listening room with good speakers a substantial improvement is possible using a careful mixed-phase design. The minimum-phase filter is clearly doing a good job as well; although not nearly as good as
the mixed-phase design, it nevertheless improves the time-domain behavior. In a large well-designed listening room the impulse response would preferably be nearly minimum-phase, and
that is also the case in this room. Therefore we can achieve improvements by just using plain
minimum-phase filters. This unfortunately does not carry over to trickier environments such as
car cabins. Finally, it should be mentioned that the pre-ringing of the mixed-phase filter was, at
its highest, 60 dB below the peak in the impulse response (again, at other listening
positions, using real measurements). We can therefore safely conclude that mixed-phase
inversion is useful and can improve sound system performance without loss of robustness.
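The "energy step response" used in the figure is simply the normalized accumulated energy of an impulse response. A minimal sketch, with a simple rise-time measure added for comparison; the function names and the two toy responses are hypothetical, not the measurements discussed above.

```python
def energy_step_response(h):
    """Fraction of the total energy delivered up to each sample; ends at 1."""
    total = sum(v * v for v in h)
    acc, out = 0.0, []
    for v in h:
        acc += v * v
        out.append(acc / total)
    return out


def time_to_fraction(h, fraction, fs):
    """Seconds until `fraction` of the energy has arrived (a simple rise-time measure)."""
    for n, e in enumerate(energy_step_response(h)):
        if e >= fraction:
            return n / fs
    return len(h) / fs


fast = [1.0, 0.3, 0.1]   # toy response: most energy in the first sample
slow = [0.5] * 4         # toy response: energy smeared over four samples
```

A better-corrected system reaches a given fraction sooner: at 1 kHz sampling, `fast` reaches 90 % at 0 ms and `slow` only at 3 ms.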
We have shown that mixed-phase equalizers can improve an already good situation while
increasing robustness as compared to minimum-phase filters. Even if the perceived difference in
this specific case might not be huge for an average listener, the importance of this example lies
in the fact that the equalizer is more robust than the minimum-phase one (as the minimum-phase filter does not consider the full impulse response) while at the same time it shows that
decay times of impulse responses can be reduced by turning away from minimum-phase
responses. The only disadvantage is non-zero latency, but if this is no problem, I would
recommend the mixed-phase inversion as it is simply a safer method. (As a side remark, in the
method used in the examples above, a parameter controls the amount of pre-ringing vis-à-vis the
amount of post-ringing, and thus we can reach an even better system response if listening tests
reveal that people are insensitive to pre-ringing of up to, say, -30 dB.)
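Figures such as "60 dB below the peak" or a tolerated "-30 dB" refer to the largest pre-ringing sample relative to the main peak. A minimal way to measure that, assuming the main peak is simply the largest-magnitude sample (an assumption; the note does not define it further):

```python
import math


def pre_ringing_db(h):
    """Largest absolute sample before the main peak, in dB relative to the peak."""
    peak_index = max(range(len(h)), key=lambda n: abs(h[n]))
    pre = [abs(v) for v in h[:peak_index]]
    if not pre or max(pre) == 0.0:
        return -math.inf          # no pre-ringing at all
    return 20 * math.log10(max(pre) / abs(h[peak_index]))


level = pre_ringing_db([0.001, -0.01, 1.0, 0.5])   # toy response
```

For this toy response `level` is -40 dB, set by the -0.01 sample just before the peak.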
yield two identical recordings (assuming the room is well-damped). Both sound awful. The
second source interferes with the first one. Taking the magnitude of the Fourier transform of
both recordings reveals that they are identical and have the shape of a comb; deep notches
permeate the spectrum. Again, we find that the Fourier transform is a poor representation of our
hearing sensation. This time the main reason is that a basic Fourier transform does not
distinguish between different angles of incidence.
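Why a comb appears: two equal-level arrivals whose path lengths differ by τ seconds cancel wherever the delayed copy is in antiphase, i.e. at f = (2k + 1)/(2τ). The sketch assumes ideal point sources and equal levels; the 1 ms path difference is an arbitrary example. Nothing in the computation depends on where the two sounds come from, which is exactly the information the plain Fourier magnitude discards.

```python
import cmath
import math


def comb_magnitude(f, tau):
    """|1 + e^{-j 2 pi f tau}|: direct sound plus an equal-level copy tau seconds later."""
    return abs(1 + cmath.exp(-2j * math.pi * f * tau))


def notch_frequencies(tau, f_max):
    """Antiphase frequencies f = (2k + 1) / (2 tau) up to f_max."""
    out, k = [], 0
    while (2 * k + 1) / (2 * tau) <= f_max:
        out.append((2 * k + 1) / (2 * tau))
        k += 1
    return out


notches = notch_frequencies(tau=0.001, f_max=4000.0)   # 1 ms path difference
```

With a 1 ms difference the notches fall at 500, 1500, 2500 and 3500 Hz, while the response doubles (+6 dB) at 1 kHz.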
Let us now consider a traditional stereo loudspeaker set-up in a listening room. If the two
sources have different transfer functions then they will change the correlation properties of the
Left and Right signals. The signals at the ears are no longer correlated in the same manner as
they were on the recording. If the difference is big, the L and R signals may have been completely
decorrelated. In that case, even when L=R at the sources we hear two distinct sources instead of
the expected phantom in the middle. The sound stage becomes diffuse. If we now do a proper
impulse response correction of the individual transfer functions from the left speaker and the
right speaker then we can reconstruct the original correlation properties from the recording and
the sound stage will become distinct and coherent again. Unfortunately, the reverse situation is
sometimes true in alleged room correction systems employing high-order filters. Instead of
improving the correlation of the L and R signals, they unwittingly decorrelate the sound field.
The reason is that a high-order filter has a long impulse response that needs to be properly
controlled. If we build an inverse filter based on a simplistic average model of the original
transfer function measured in different listening positions, we might end up correcting for issues
late in the tail of that model that are not in accord with perception. Consider the following plot
taken from a rather poor, but large, listening room, using a very good speaker, measured in 12
different positions corresponding to two different seats in a sofa. The plot is a Fourier magnitude
response based on a simple delay-and-average response, i.e. the impulse responses were aligned
in time and then averaged (a simple form of beamforming) before taking the Fourier transform.
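A minimal sketch of the delay-and-average step, i.e. the naive average the following paragraphs criticize. Aligning each response on its largest-magnitude sample and weighting all positions equally are assumptions; the note only says the responses were aligned in time and averaged.

```python
def delay_and_average(responses):
    """Align each impulse response on its largest-magnitude sample, then average."""
    peaks = [max(range(len(h)), key=lambda n: abs(h[n])) for h in responses]
    lead = min(peaks)                       # keep the earliest peak position
    aligned = [h[p - lead:] for h, p in zip(responses, peaks)]
    length = min(len(h) for h in aligned)   # truncate to the common length
    return [sum(h[n] for h in aligned) / len(aligned) for n in range(length)]


r1 = [0.0, 1.0, 0.5, 0.2, 0.0]   # toy response, peak at sample 1
r2 = [0.0, 0.0, 1.0, 0.3, 0.2]   # same kind of response, arriving one sample later
avg = delay_and_average([r1, r2])
```

The direct sound adds coherently after alignment, but every late reflection that differs between positions is averaged into the tail, and that tail is what produces the fine ripple in the Fourier magnitude.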
Note the amount of ripple, in particular in the high-frequency region. (If we plot this data using
fractional octave smoothing, we will not see the ripple, but it is still there and will affect the final
filter unless we are careful.) The problem with this ripple is that it is not a property of the
frequency response that we perceive, only of the Fourier frequency response, which takes neither
spatial angle nor time into account. The ripple is inaudible, because it is only a property of
the late reflections in the room (what can be regarded as late is of course frequency-dependent).
However, let us now assume we build an equalizer that inverts this Fourier spectrum.
Regardless of whether it is minimum-phase or not, it will still have similar variability in the high-frequency region, which manifests itself in the time domain as additional high-frequency garbage
in the impulse response (spread out over time). If we now use different equalizers on the left
and on the right speaker, we end up with two widely different transfer functions that can have
any arbitrary spatial effect, typically in the form of a weak decorrelation. The naïve averaging led
us to compensate for issues that are (a) nonexistent in any single listening position and (b)
psycho-acoustically irrelevant. It is well worth noting that correcting for something inaudible
may lead to something audible.
Based on the same twelve measurements we used to produce the plot above, we can form
estimates of more psycho-acoustically relevant representations by using different types of time-frequency analysis. The following two plots show two alternative representations, both in the
frequency domain, but psycho-acoustically relevant for different purposes. The first one has
been subjected to a processing which tries to resemble the human perception of transients in a
reverberant room. As we can see, the high-frequency ripple is gone but still some rather
powerful dips are present below 1 kHz. In somewhat simplified terms, we have removed late
reflections that are not from the same direction as the direct wave.
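The note does not specify the processing behind these plots. As a hedged stand-in, one common way to approximate what the ear integrates for transients is frequency-dependent windowing: each frequency is analysed over a window a fixed number of periods long, so late reflections fall outside the window at high frequencies but inside it at low frequencies. The six-cycle window and the toy response are arbitrary assumptions.

```python
import cmath
import math


def windowed_magnitude(h, f, fs, cycles=6.0):
    """|DFT| of h at frequency f using a rectangular window `cycles` periods long.
    Hypothetical stand-in for a perceptual time-frequency analysis."""
    n_win = max(1, min(len(h), round(cycles * fs / f)))
    return abs(sum(h[n] * cmath.exp(-2j * math.pi * f * n / fs) for n in range(n_win)))


fs = 48000.0
h = [0.0] * 1000
h[0] = 1.0      # direct sound
h[480] = 0.7    # a reflection 10 ms later (toy values)
```

At 8 kHz the 36-sample window excludes the reflection and the magnitude is 1.0, so there is no ripple; at 100 Hz and 50 Hz the window includes it, giving 1.7 and 0.3 respectively, so low-frequency dips survive, much as in the first plot.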
However, this representation does not give an adequate picture of what we perceive when listening, for
instance to a string quartet. The late reflections will still add coloration that affects our listening
experience on more stationary harmonic signals. For that reason, a plot which gives a reasonable
estimate of our appreciation of stationary frequency-domain coloration looks something like the
following.
My claim is that both these plots give a meaningful picture of what we actually perceive in the
listening room: the former for how we perceive transients, the latter for how we perceive
stationary harmonic signals. That is to say, we should not base a magnitude response correction
on the former, and we should not base an impulse response correction on the latter.
The key improvements that we obtain when using a mixed-phase correction lie in stereo and
multi-channel reproduction. Spatial information gets distorted when the left and right transfer
functions are different. Since room responses are non-minimum-phase, we need a mixed-phase
inverse in order to re-establish the correct spatial image. The difference between a proper mixed-phase inverse and a minimum-phase one becomes bigger in a poor listening space, such as in a
car, in particular where the left and right transfer functions differ appreciably. It should be noted
that the difference in sound quality is typically not subtle. Although there are certainly cases
where differences are small, namely when the major room response variations can actually be
corrected for by a minimum-phase filter, the typical difference is substantial. However, as
previously warned, it is difficult to do a proper impulse response correction, since one very easily
ends up with a decorrelation filter (and/or pre-echoes). This can, however, be detected in two ways. In
listening tests, the spatial image becomes diffuse (as opposed to distinct), and the sound may
even be perceived as coming from outside the region between the two speakers, which is a clear sign
of phase correction gone bad. When the frequency response of the filter is plotted in high resolution,
it should not show any fast variations in the high-frequency region; such variations are a sure sign of
trying to correct for properties of the late tail of the impulse response. Fast variations in the
magnitude response are indicative of similar variations in the phase response, and it is highly
implausible that phase correction of room responses should be useful at high frequencies (since
what would constitute a correction in one physical position is sure to cause a degradation just a
few centimeters away, according to the laws of wave propagation).
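The second check, looking for fast high-frequency variations in the filter's own magnitude response, can be automated in a crude way. This sketch evaluates an FIR filter's magnitude on a fine linear grid and reports the largest jump between neighbouring grid points above a cutoff. The 2 kHz cutoff, the grid density and any pass/fail threshold are hypothetical choices, not values from this note.

```python
import cmath
import math


def magnitude_db_grid(h, fs, n_points):
    """(frequency, dB) pairs for FIR filter h on a linear grid from 0 to fs/2."""
    out = []
    for i in range(n_points):
        w = math.pi * i / (n_points - 1)
        mag = abs(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h)))
        out.append((fs * i / (2 * (n_points - 1)), 20 * math.log10(max(mag, 1e-12))))
    return out


def max_hf_step_db(h, fs, f_min=2000.0, n_points=2049):
    """Largest dB change between neighbouring grid points above f_min.
    A large value hints that the filter is chasing late-tail ripple."""
    grid = [db for f, db in magnitude_db_grid(h, fs, n_points) if f >= f_min]
    return max(abs(y - x) for x, y in zip(grid, grid[1:]))


smooth = [1.0]                          # a trivially smooth filter
rippled = [1.0] + [0.0] * 199 + [0.5]   # toy filter with a late "echo" tap
```

The late tap in `rippled` produces a ripple with a 240 Hz period at 48 kHz, which this measure flags, whereas `smooth` scores zero.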
The end advice is that mixed-phase correction is absolutely useful and is in my mind the correct
route forward to improved sound reproduction, but that it is much more difficult than a reading
of the last twenty years of research on impulse response correction would lead one to think.
If your DSP budget allows you to use high-order filters, my recommendation is to use a careful
mixed-phase design. A well-designed mixed-phase filter gives a faster system response than can
be achieved by minimum-phase or linear-phase filters. But again, the word "careful" is important.
Some algorithms for designing high-order filters (be it minimum-phase or mixed-phase) will
have a hard time staying away from adding spurious high-frequency ripple in the tail of the
impulse response. This affects stereo perception in a negative way (spreading of virtual
sources). Pre-echoes can also be introduced if the algorithm tries to do too much. In short,
mixed-phase designs have a better potential both performance-wise and robustness-wise, but
that potential is considerably more difficult to achieve. (For a subwoofer channel, however,
minimum-phase inversion will typically be sufficient as the combined room and subwoofer
transfer function actually is well modeled by a minimum-phase system at such low frequencies.)
A mixed-phase design can be realized as an FIR filter or as an IIR filter, but typically the most
efficient implementation is a mix of the two.
As described above, a proper mixed-phase filter design is similar to removing reflecting surfaces
near the loudspeaker. In addition, it minimizes linear imperfections in the loudspeaker design
itself. Digital mixed-phase filters therefore make a powerful and cost-effective complement to
ordinary electroacoustic considerations in the design of loudspeaker systems and rooms.
ACKNOWLEDGEMENTS
This note represents the views of the author, but is really an accumulation of insights that have
appeared over many years in internal discussions at Dirac Research where several people have
contributed. Lars-Johan Brännmark has provided the experimental data used in this note. My
appreciation of the subject has been highly influenced and enriched by discussions with him and
others at Dirac.