Optimally Rotation-Equivariant Directional
1 Introduction
A wide variety of algorithms in multi-dimensional signal processing are based on
the computation of directional derivatives. For example, gradient measurements
are used in computer vision as a first stage of many edge detection,
depth-from-stereo, and optical flow algorithms. The motivation for such
decompositions usually stems from a desire to locally characterize signals using
Taylor series expansions (e.g., [7]).
Derivatives of discretely sampled signals are often computed as differences
between neighboring sample values. This type of differentiation arises naturally
from the definition of continuous derivatives, and is reasonable when the spacing
between samples is well below the Nyquist limit. For example, it is used
throughout the numerical analysis literature, where one typically has control
over the sample spacing. But such differences are poor approximations to
derivatives when the distance between samples is large and cannot be adjusted.
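To make the last point concrete, the following sketch compares the frequency response of the two-point central difference with that of an ideal differentiator. The choice of the central difference and the helper names are illustrative assumptions, not something prescribed by this paper. The central difference (f[n+1] - f[n-1])/2 has response j sin(ω), which tracks the ideal jω only when ω is small, that is, when the signal is heavily oversampled.

```python
import numpy as np

def central_difference_response(omega):
    """Magnitude response of (f[n+1] - f[n-1]) / 2, which equals |sin(omega)|."""
    return np.abs(np.sin(omega))

def relative_error(omega):
    """Relative deviation from the ideal differentiator magnitude |omega|."""
    ideal = np.abs(omega)
    return np.abs(central_difference_response(omega) - ideal) / ideal

if __name__ == "__main__":
    # Frequencies in radians per sample; pi is the Nyquist frequency.
    for omega in (0.1, np.pi / 4, np.pi / 2):
        print(f"omega = {omega:.3f}, relative error = {relative_error(omega):.3f}")
```

At a quarter of the sampling rate (ω = π/2) the central difference underestimates the derivative magnitude by a factor of 2/π, while at ω = 0.1 the error is below 0.2 percent.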
In the digital signal processing community, there has been a fair amount of
work on the design of discrete differentiators (see e.g., [9]). This work is
usually based on approximating the derivative of a continuous sinc function. The
difficulty with this approach is that the resulting kernels typically need to be
quite large in order to be accurate.
In addition to the difficulties described above, these two primary methods of
differentiation are not well suited for multi-dimensional differentiation. In
particular, one often relies on the linear-algebraic properties of
multi-dimensional derivatives (gradients) that allow differentiation in arbitrary
directions via linear combinations of separable axis derivatives (see footnote 3).
In the computer vision literature, many authors have used sampled Gaussian
derivatives, which approximate these algebraic properties better than simple
differences and are less computationally expensive than sinc functions.
Danielsson [3] compared a number of derivative kernels and concluded that the
Sobel operators exhibited the most rotation-equivariant behavior. Freeman and
Adelson [8] characterized the complete class of rotation-equivariant kernels and
termed these "steerable" filters.

* This work was supported by ARO Grant DAAH04-96-1-0007, DARPA Grant
N00014-92-J-1647, NSF Grant SBR89-20230, and NSF CAREER Grant MIP-9796040 to EPS.

Footnote 3: For example, the derivative operator in the direction of unit vector
$\hat{u}$ is $u_x \frac{\partial}{\partial x} + u_y \frac{\partial}{\partial y}$.
We are interested in the optimal design of small separable kernels for efficient
discrete differentiation. In previous work [10, 1], we described design
techniques for matched pairs of one-dimensional kernels (a lowpass kernel and a
differentiator) suitable for multi-dimensional differentiation. Axis derivatives
were computed by applying the differentiator along the axis of choice and the
lowpass kernel along all remaining axes. The error functional was a weighted
least-squares error in the Fourier domain between the differentiator and the
derivative of the lowpass kernel. In this paper, we generalize these notions to
form a two-dimensional error functional that expresses the desired property of
derivative kernels discussed above. This error functional is then minimized to
produce a set of optimally rotation-equivariant derivative kernels.
2 Differentiation of Discrete Signals
Differentiation is an operation defined on continuous functions. The computation
of derivatives on a discretely sampled function thus requires (at least implicitly)
an intermediate interpolation step. The derivative of this interpolated continuous
function is then re-sampled at the points of the original sampling lattice.
2.1 Example: Ideal Interpolation
To make this more precise, consider the classical situation in which the sampled
function is assumed to have been formed by uniformly sampling a continuous
function at the Nyquist rate. In this case, the correct interpolation of the
discrete function $f[\cdot]$ is:

$$f(x, y) = \sum_{k,l} f[k, l]\, c(x - kT,\, y - lT), \tag{1}$$

where $T$ is the sample spacing (assumed to be identical along both the $x$ and
$y$ axes), $f(x, y)$ is the interpolated continuous function, and the
interpolation function $c(x, y)$ is a separable product of ideal lowpass ("sinc")
functions,

$$c(x, y) = s_T(x)\, s_T(y) = \frac{\sin(\pi x/T)}{\pi x/T}\, \frac{\sin(\pi y/T)}{\pi y/T}.$$

Assuming that the sum in Equation (1) converges uniformly, we can differentiate
both sides of the equation. Without loss of generality, consider the partial
derivative with respect to $x$:

$$D_x\{f\}(x, y) = \sum_{k,l} f[k, l]\, D_x\{c\}(x - kT,\, y - lT), \tag{2}$$

where $D_x\{\cdot\}$ indicates a functional that computes the partial derivative
of its argument in the $x$ direction. Note that the derivative operator is only
being applied to continuous functions, $f$ and $c$.
One arrives at a definition of the derivative of the discrete signal by sampling
both sides of the above equation on the original sampling lattice:

$$
\begin{aligned}
D_x\{f\}(x, y)\big|_{x=nT,\,y=mT}
  &= \sum_{k,l} f[k, l]\, D_x\{c\}\big((n - k)T,\, (m - l)T\big) \\
  &= \sum_{k,l} f[k, l]\, D\{s_T\}\big((n - k)T\big)\, s_T\big((m - l)T\big) \\
  &= \sum_{k,l} f[k, l]\, d_T[n - k]\, \delta[m - l],
\end{aligned}
\tag{3}
$$

where $d_T[\cdot]$ is the $T$-sampled sinc derivative, and $\delta[\cdot]$ is the
$T$-sampled sinc (i.e., a Kronecker delta function). Note that the right side of
this expression is a convolution of the discretely sampled function,
$f[\cdot]$, with the separable kernel $d_T[n - k]\,\delta[m - l]$. The continuous
interpolation need never be performed.
If the original function was sampled at the Nyquist rate, then convolution
with the sampled derivative of the sinc function will return an exact sampled
derivative. In practice, however, the coefficients of this kernel decay very
slowly and accurate implementation requires very large kernels. In addition, the
sinc derivative operator has a large response at high frequencies, making it
fragile in the presence of noise.
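The slow decay can be seen directly from the sampled sinc derivative. For unit sample spacing (T = 1, an assumption made here for simplicity), differentiating sin(πx)/(πx) and evaluating at the integers gives taps of (-1)^n / n for n ≠ 0 and zero at n = 0. The sketch below, with a hypothetical helper name, tabulates these taps.

```python
import numpy as np

def sinc_derivative_taps(half_width):
    """Samples of d/dx [sin(pi x) / (pi x)] at n = -half_width..half_width (T = 1).

    For integer n != 0 the derivative equals cos(pi n) / n = (-1)**n / n,
    and it is zero at n = 0.
    """
    n = np.arange(-half_width, half_width + 1)
    taps = np.zeros(n.shape, dtype=float)
    nonzero = n != 0
    taps[nonzero] = (-1.0) ** n[nonzero] / n[nonzero]
    return n, taps

if __name__ == "__main__":
    n, taps = sinc_derivative_taps(50)
    # The outermost taps still have magnitude 1/50: the decay is only 1/|n|.
    print(abs(taps[0]), abs(taps[-1]))
```

Because the magnitude falls off only as 1/|n|, truncating the kernel to a handful of taps discards a substantial part of it, which is the practical difficulty noted above.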
2.2 Alternative Interpolation Functions
The limitations of the sinc function lead us to consider alternative interpolation
functions. Of course, if an interpolator other than the sinc function is used, the
resulting derivative may not be that of the original continuous function. However,
for many applications this is not a fundamental concern. Consider, for example,
the problem of determining the local orientation of an edge. This can be achieved
by measuring the gradient vector, which is perpendicular to the edge. If we use
an interpolation kernel which preserves the structure of the edge, the gradient
direction will still provide the desired information.
Since the separability of the sinc is desirable for computational efficiency
(e.g., [5, 4]), we will consider an interpolator that retains this property, and
will also assume that the two axes should be treated identically. Thus, the
two-dimensional interpolator is written as a separable product,
$c(x, y) = d_0(x)\, d_0(y)$. The partial derivative (with respect to $x$) of this
interpolator is:

$$D_x\{c\}(x, y) = d_1(x)\, d_0(y), \tag{4}$$

where $d_1(x)$ is the derivative of $d_0(x)$. With this interpolator, the sampled
derivative (as in Equation (3)) becomes:

$$
\begin{aligned}
D_x\{f\}(x, y)\big|_{x=nT,\,y=mT}
  &= \sum_{k,l} f[k, l]\, d_1\big((n - k)T\big)\, d_0\big((m - l)T\big) \\
  &= \sum_{k,l} f[k, l]\, d_1[n - k]\, d_0[m - l].
\end{aligned}
\tag{5}
$$

The discrete derivatives are computed using two discrete one-dimensional
kernels, $d_0[\cdot]$ and $d_1[\cdot]$, which are the $T$-sampled versions of
$d_0(\cdot)$ and $d_1(\cdot)$, respectively. Note that the separability of the
interpolator is retained in the derivative operator. As with the sinc function,
we need never make explicit the underlying continuous function $d_0(\cdot)$.
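A minimal sketch of the separable computation in Equation (5) follows. It makes several assumptions that are not stated in the text: the kernel values are the 5-tap d0 and d1 rows listed in Table 2, the taps are applied by correlation in the left-to-right order in which they are tabulated (so that an increasing ramp yields a positive derivative), image columns are taken as the x axis, and borders are dropped rather than padded. The function names are hypothetical.

```python
import numpy as np

# 5-tap interpolator and differentiator, copied from the d0 and d1 rows of Table 2.
D0 = np.array([0.026455, 0.248070, 0.450951, 0.248070, 0.026455])
D1 = np.array([-0.097537, -0.309308, 0.0, 0.309308, 0.097537])

def correlate_axis(image, kernel, axis):
    """Correlate every 1-D slice along `axis` with `kernel`, keeping valid samples only."""
    return np.apply_along_axis(
        lambda v: np.correlate(v, kernel, mode="valid"), axis, image)

def gradient(image):
    """Return (fx, fy): differentiator along one axis, interpolator along the other."""
    fx = correlate_axis(correlate_axis(image, D1, axis=1), D0, axis=0)
    fy = correlate_axis(correlate_axis(image, D0, axis=1), D1, axis=0)
    return fx, fy

if __name__ == "__main__":
    y, x = np.mgrid[0:12, 0:10]
    fx, fy = gradient(2.0 * x + 3.0 * y)   # a plane with slope (2, 3)
    print(fx.shape, fx[0, 0], fy[0, 0])
```

On a plane with slope (2, 3) both outputs are scaled by the same small gain (about 1.009 for these taps), so the ratio fy/fx, and hence the estimated gradient direction, is preserved.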
At this point, we could simply choose a continuous function $d_0(\cdot)$, compute
its derivative $d_1(\cdot)$, and $T$-sample the two functions. For example, it is
common in computer vision to use a sampled Gaussian and its derivative. However,
because the Gaussian is not strictly bandlimited, sampling introduces artifacts,
thus destroying the derivative relationship between the resulting kernels. So,
instead we choose to simultaneously design a pair of discrete kernels that
optimally preserve the required derivative relationship.
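The loss of the derivative relationship under sampling can be checked numerically. The sketch below measures the unweighted RMS mismatch between ω D0(ω) and the differentiator response for a 5-tap sampled Gaussian pair and for the 5-tap d0 and d1 rows of Table 2. The Gaussian width (σ = 1), the unweighted RMS measure, the frequency grid, and the function names are all assumptions chosen for illustration; they are not the settings used for the comparisons reported in this paper.

```python
import numpy as np

# 5-tap pair copied from the d0 and d1 rows of Table 2.
OPT_D0 = np.array([0.026455, 0.248070, 0.450951, 0.248070, 0.026455])
OPT_D1 = np.array([-0.097537, -0.309308, 0.0, 0.309308, 0.097537])

def derivative_mismatch(d0, d1, num_freqs=1024):
    """RMS of omega * D0(omega) - D1(omega) over [-pi, pi].

    Assumes d0 is symmetric and d1 antisymmetric, with taps listed left to
    right (negative lags first). The common factor j is dropped, so both
    profiles are real.
    """
    half = len(d0) // 2
    n = np.arange(-half, half + 1)
    omega = np.linspace(-np.pi, np.pi, num_freqs)
    D0 = np.cos(np.outer(omega, n)) @ d0
    D1 = np.sin(np.outer(omega, n)) @ d1
    return float(np.sqrt(np.mean((omega * D0 - D1) ** 2)))

def sampled_gaussian_pair(half_width, sigma):
    """Sampled Gaussian (unit sum) and its sampled derivative, in the same tap order."""
    n = np.arange(-half_width, half_width + 1)
    g = np.exp(-n**2 / (2.0 * sigma**2))
    norm = g.sum()
    return g / norm, (n / sigma**2) * g / norm

if __name__ == "__main__":
    g0, g1 = sampled_gaussian_pair(2, 1.0)
    print("sampled Gaussian:", derivative_mismatch(g0, g1))
    print("Table 2 pair:    ", derivative_mismatch(OPT_D0, OPT_D1))
```

Under these assumptions the sampled Gaussian pair shows a clearly larger mismatch than the jointly designed pair; a different σ would change the Gaussian figure.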
To design such a pair of discrete kernels, we must state the differential
relationship between them. Previously [1], pairs of kernels, $d_0[n]$ and
$d_1[n]$, were designed such that their Fourier transforms approximate the
correct derivative relationship:

$$j\omega\, D_0(\omega) = D_1(\omega), \qquad -\pi < \omega < \pi, \tag{6}$$

where capitalized functions correspond to the (continuous but periodic)
discrete-space Fourier transform, $D_0(\omega) = \sum_n d_0[n]\, e^{-j\omega n}$.
This constraint states that the derivative (in the Fourier domain) of the kernel
$d_0[n]$ is equal to the kernel $d_1[n]$. For the current paper, we wish to impose
a similar constraint in the two-dimensional Fourier domain. In particular, the
derivative in an arbitrary direction (specified by a unit vector $\hat{u}$) is:

$$D_{\hat{u}}\{f\}(x, y) = u_x\, D_x\{f\}(x, y) + u_y\, D_y\{f\}(x, y), \tag{7}$$

where $(u_x, u_y)$ are the components of $\hat{u}$. Thus, the two-dimensional
version of the Fourier domain constraint of Equation (6) is:

$$j(\hat{u} \cdot \vec{\omega})\, D_0(\omega_x) D_0(\omega_y) = u_x\, D_1(\omega_x) D_0(\omega_y) + u_y\, D_0(\omega_x) D_1(\omega_y), \tag{8}$$

and should hold for $-\pi < \{\omega_x, \omega_y\} < \pi$ and for all unit vectors
$\hat{u}$ (i.e., for all directions). We can now define a weighted least-squares
error functional by integrating over these variables:

$$
E\{D_0, D_1\} = \int_{\vec{\omega}} \int_{\hat{u}} W(\vec{\omega})\,
\big[\, j(\omega_x u_x + \omega_y u_y)\, D_0(\omega_x) D_0(\omega_y)
 - \big(u_x\, D_1(\omega_x) D_0(\omega_y) + u_y\, D_0(\omega_x) D_1(\omega_y)\big) \big]^2,
\tag{9}
$$

where $W(\vec{\omega})$ is a weighting function. In order to avoid the trivial
(zero) solution, we impose a constraint that the interpolator have unit response
at D.C. (i.e., the kernel $d_0[n]$ has unit sum): $D_0(0) = 1$.
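The error functional of Equation (9) can be approximated on a grid. The sketch below is an illustration under stated assumptions rather than the procedure used in this paper: the integrals are replaced by means over a uniform grid of frequencies and orientations, the weighting is taken to be of the form 1/(ωx² + ωy² + c) with c = π²/16, the kernels are symmetric (d0) and antisymmetric (d1) so that the common factor j can be dropped and the square taken of a real residual, and the function names are hypothetical.

```python
import numpy as np

def response_profiles(d0, d1, omega):
    """Real profiles of D0 and D1 (factor j dropped) for symmetric d0, antisymmetric d1."""
    half = len(d0) // 2
    n = np.arange(-half, half + 1)
    D0 = np.cos(np.outer(omega, n)) @ d0
    D1 = np.sin(np.outer(omega, n)) @ d1
    return D0, D1

def rotation_error(d0, d1, num_freqs=64, num_angles=32, offset=np.pi**2 / 16):
    """Grid approximation of the weighted squared residual in Equation (9)."""
    omega = np.linspace(-np.pi, np.pi, num_freqs)
    D0, D1 = response_profiles(d0, d1, omega)
    # Axes: (omega_x, omega_y, orientation), combined by broadcasting.
    wx, wy = omega[:, None, None], omega[None, :, None]
    D0x, D0y = D0[:, None, None], D0[None, :, None]
    D1x, D1y = D1[:, None, None], D1[None, :, None]
    theta = np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False)
    ux, uy = np.cos(theta)[None, None, :], np.sin(theta)[None, None, :]
    residual = (wx * ux + wy * uy) * D0x * D0y - (ux * D1x * D0y + uy * D0x * D1y)
    weight = 1.0 / (wx**2 + wy**2 + offset)
    return float(np.mean(weight * residual**2))

if __name__ == "__main__":
    opt_d0 = np.array([0.026455, 0.248070, 0.450951, 0.248070, 0.026455])
    opt_d1 = np.array([-0.097537, -0.309308, 0.0, 0.309308, 0.097537])
    print(rotation_error(opt_d0, opt_d1))
```

Evaluating the same function on a plain central difference (d0 a unit impulse, d1 = [-0.5, 0, 0.5]) gives a much larger value, since that pair has no matching lowpass response.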
2.3 Higher-Order Derivatives
Higher-order derivative kernels may be designed using a similar strategy to
that introduced in the previous section. In particular, the $N$th-order
directional derivative in direction $\hat{u}$ is:

$$
\begin{aligned}
D^{N}_{\hat{u}}\{f\}(x, y) &= [u_x D_x + u_y D_y]^{N} \{f\}(x, y) \\
  &= \sum_{p=0}^{N} b[p, N]\, u_x^{p}\, u_y^{(N-p)}\, D_x^{p} D_y^{(N-p)} \{f\}(x, y),
\end{aligned}
\tag{10}
$$

where $b[p, N] = N!/(p!\,(N - p)!)$ is the binomial coefficient. Combining this
definition with the interpolation of Equation (2), and sampling both sides gives
an expression for the discrete $N$th-order directional derivative:

$$
D^{N}_{\hat{u}}\{f\}(x, y)\big|_{x=nT,\,y=mT}
  = \sum_{p=0}^{N} b[p, N]\, u_x^{p}\, u_y^{(N-p)} \sum_{k,l} f[k, l]\, d_p[n - k]\, d_{N-p}[m - l].
\tag{11}
$$

This expression is a sum of convolutions with separable kernels composed of a
set of discrete one-dimensional derivative (and interpolation) kernels
$\{d_p[n] \mid p = 0, 1, \ldots, N\}$. As before, we place a constraint on these
kernels in the Fourier domain:

$$
j^{N} (\hat{u} \cdot \vec{\omega})^{N}\, D_0(\omega_x) D_0(\omega_y)
  = \sum_{p=0}^{N} b[p, N]\, u_x^{p}\, u_y^{(N-p)}\, D_p(\omega_x)\, D_{N-p}(\omega_y).
\tag{12}
$$

A least-squares error functional is formed by integrating over orientation and
the two frequency axes:

$$
E\{D_0, D_1, \ldots, D_N\} = \int_{\vec{\omega}} W(\vec{\omega}) \int_{\hat{u}}
\left( \sum_{p=0}^{N} b[p, N]\, u_x^{p}\, u_y^{(N-p)}
\left[ j^{N} \omega_x^{p}\, \omega_y^{(N-p)}\, D_0(\omega_x) D_0(\omega_y)
 - D_p(\omega_x)\, D_{N-p}(\omega_y) \right] \right)^{2}.
\tag{13}
$$

In order to avoid the trivial solution, we again impose a constraint that the
interpolator ($d_0[n]$) have unit response at D.C.: $D_0(0) = 1$.
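As a worked instance of Equations (10) and (12), consider the second-order case, N = 2. The expansion below is added for illustration and follows directly from the binomial formula with b[0, 2] = 1, b[1, 2] = 2, b[2, 2] = 1, and j² = -1.

```latex
\begin{align*}
D^{2}_{\hat{u}}\{f\}
  &= u_y^{2}\, D_y^{2}\{f\} + 2\, u_x u_y\, D_x D_y\{f\} + u_x^{2}\, D_x^{2}\{f\}, \\
-(\hat{u} \cdot \vec{\omega})^{2}\, D_0(\omega_x) D_0(\omega_y)
  &= u_y^{2}\, D_0(\omega_x) D_2(\omega_y)
   + 2\, u_x u_y\, D_1(\omega_x) D_1(\omega_y)
   + u_x^{2}\, D_2(\omega_x) D_0(\omega_y).
\end{align*}
```

Three separable kernel pairs, (d0, d2), (d1, d1) and (d2, d0), therefore suffice to steer the second derivative to any direction.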
3 Results
The error functionals in Equations (9) and (13) are both fourth-order in the
optimization variables and cannot be optimized analytically. In order to obtain
solutions, we fix the size of the kernels, and use conjugate gradient descent. As
a starting point for the first-order kernels, we use the solution that minimizes
the linear one-dimensional constraint of [1]:

$$E\{D_0, D_1\} = \int_{\omega} W'(\omega)\, [\, j\omega\, D_0(\omega) - D_1(\omega) ]^2, \tag{14}$$

subject to $D_0(0) = 1$, with a weighting function of
$W'(\omega) = 1/(|\omega| + \pi/4)$. For the $N$th-order kernels, we start from
the solution that minimizes the linear one-dimensional set of constraints:

$$(j\omega)^{i-j}\, D_j(\omega) = D_i(\omega), \tag{15}$$

for $0 \le j < i \le N$, again subject to the constraint that $D_0(0) = 1$. For
the weighting functions $W(\vec{\omega})$, we choose a "fractal" weighting of
$W(\vec{\omega}) = 1/(\omega_x^2 + \omega_y^2 + \pi^2/16)$.
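Because the one-dimensional constraint of Equation (14) is linear in the kernel taps, the starting point can be found by ordinary weighted least squares. The sketch below is one way to do this under explicit assumptions: odd-length kernels, a symmetric d0 and antisymmetric d1, the unit-sum constraint eliminated by substitution, a weighting of the form 1/(|ω| + c) with c = π/4, and a uniform frequency grid in place of the integral. The function names are hypothetical, and the result is not claimed to reproduce the kernels reported here.

```python
import numpy as np

def design_pair(half_width, num_freqs=512, offset=np.pi / 4):
    """Weighted least-squares fit of omega * D0(omega) = D1(omega) with D0(0) = 1.

    Unknowns are the positive-lag taps a_m of d0 and b_m of d1; the center
    tap of d0 is fixed to 1 - 2 * sum(a) so that d0 has unit sum.
    """
    w = np.linspace(-np.pi, np.pi, num_freqs)
    root_weight = np.sqrt(1.0 / (np.abs(w) + offset))
    m = np.arange(1, half_width + 1)
    cols_a = 2.0 * w[:, None] * (np.cos(np.outer(w, m)) - 1.0)
    cols_b = -2.0 * np.sin(np.outer(w, m))
    A = np.hstack([cols_a, cols_b]) * root_weight[:, None]
    rhs = -w * root_weight
    x, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    a, b = x[:half_width], x[half_width:]
    d0 = np.concatenate([a[::-1], [1.0 - 2.0 * a.sum()], a])
    d1 = np.concatenate([-b[::-1], [0.0], b])   # negative lags listed first
    return d0, d1

def weighted_error(d0, d1, num_freqs=512, offset=np.pi / 4):
    """Grid approximation of Equation (14) for a symmetric/antisymmetric pair."""
    half = len(d0) // 2
    n = np.arange(-half, half + 1)
    w = np.linspace(-np.pi, np.pi, num_freqs)
    D0 = np.cos(np.outer(w, n)) @ d0
    D1 = np.sin(np.outer(w, n)) @ d1
    return float(np.mean((w * D0 - D1) ** 2 / (np.abs(w) + offset)))

if __name__ == "__main__":
    d0, d1 = design_pair(half_width=1)   # a 3-tap pair
    print(d0, d1, weighted_error(d0, d1))
```

Since the central difference paired with a unit impulse lies in the same feasible set, the fitted pair can never have a larger weighted error than that baseline on the same grid.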
Based on this design, Table 1 gives a set of first-order derivative kernels of
different sizes. Figure 1 shows a comparison of the differentiator $d_1[n]$ with
the derivative of the interpolator $d_0[n]$, computed in the Fourier domain. If
these kernels were perfectly matched (i.e., $d_1[n]$ is the derivative of
$d_0[n]$), then the two curves should coincide. Also shown in this figure is a
comparison of these kernels to a variety of other derivative kernels. For a fair
comparison, the variance of the Gaussian in this figure was chosen so as to
optimize the 1-D constraint of Equation (6). Note that even under these
conditions, our kernels better preserve the required derivative relationship
between the differentiator and interpolator kernel. Note also that the resulting
differentiation kernels are bandpass in nature, and thus less susceptible to
noise than typical sinc approximations.
Since the derivative kernels are designed for rotation-equivariance, we
consider the application of estimating the orientation of a two-dimensional
sinusoidal grating from the horizontal and vertical partial derivatives. The
grating had a fixed orientation of 22.5 degrees and spatial frequency in the
range [1, 21]/64 cycles/pixel. Figure 2 shows the estimation error as a function
of spatial frequency for a variety of derivative kernels. In this example,
orientation was determined using a total least squares estimator over a
16 × 16 patch of pixels in the center of the image. Note that the errors for our
optimal kernels are substantially smaller than those of the other filters. The
reasonably good performance of the Gaussian is due, in part, to our optimization
of its variance. Finally, Table 2 shows a set of higher-order derivative kernels
of different sizes.
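The grating experiment can be sketched as follows. This is an illustration under stated assumptions, not the code used to produce Figure 2: the image is 64 × 64, the orientation is taken to be the direction of the grating's frequency vector, the kernels are applied by correlation with borders dropped, the total least squares estimate is computed in closed form from the 2 × 2 scatter matrix of the gradient samples, and the "optimal" pair is the 5-tap d0 and d1 rows of Table 2. The function names are hypothetical.

```python
import numpy as np

def correlate_axis(image, kernel, axis):
    """Correlate every 1-D slice along `axis` with `kernel`, keeping valid samples only."""
    return np.apply_along_axis(
        lambda v: np.correlate(v, kernel, mode="valid"), axis, image)

def orientation_error_deg(d0, d1, cycles_per_pixel, angle_deg=22.5, size=64, patch=16):
    """Absolute orientation error (degrees) for a sinusoidal grating."""
    theta = np.deg2rad(angle_deg)
    wx = 2.0 * np.pi * cycles_per_pixel * np.cos(theta)
    wy = 2.0 * np.pi * cycles_per_pixel * np.sin(theta)
    y, x = np.mgrid[0:size, 0:size]
    image = np.sin(wx * x + wy * y)
    fx = correlate_axis(correlate_axis(image, d1, axis=1), d0, axis=0)
    fy = correlate_axis(correlate_axis(image, d0, axis=1), d1, axis=0)
    c, h = fx.shape[0] // 2, patch // 2
    gx = fx[c - h:c + h, c - h:c + h].ravel()
    gy = fy[c - h:c + h, c - h:c + h].ravel()
    # Dominant eigenvector direction of the scatter matrix of (gx, gy) samples.
    estimate = 0.5 * np.arctan2(2.0 * np.sum(gx * gy), np.sum(gx**2) - np.sum(gy**2))
    return abs(np.rad2deg(estimate) - angle_deg)

if __name__ == "__main__":
    opt_d0 = np.array([0.026455, 0.248070, 0.450951, 0.248070, 0.026455])
    opt_d1 = np.array([-0.097537, -0.309308, 0.0, 0.309308, 0.097537])
    diff_d0 = np.array([0.0, 1.0, 0.0])
    diff_d1 = np.array([-0.5, 0.0, 0.5])
    for f in (1 / 64, 8 / 64, 16 / 64):
        print(f, orientation_error_deg(opt_d0, opt_d1, f),
              orientation_error_deg(diff_d0, diff_d1, f))
```

In this sketch the plain central difference is off by several degrees at 0.25 cycles/pixel, while the 5-tap pair from Table 2 stays well under one degree.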
Table 1. First-order derivative kernels. Shown are pairs of derivative ($d_1[n]$)
and interpolator ($d_0[n]$) kernels of various sizes.
[Figure 1 panels: 3-tap Optimal, 4-tap Optimal, 5-tap Optimal, 6-tap Optimal.]
Fig. 1. First-order derivative kernels. Illustrated in each panel are the
magnitude of the Fourier transform of the derivative kernel (solid line) and the
frequency-domain derivative of the interpolator (dashed line) for our optimally
designed kernels (see Table 1). Also illustrated, for comparison, are several
other derivative kernels. Beneath the plots are the weighted RMS errors.
[Figure 2 panels: Abs(Error) (degrees) plotted against Spatial Frequency (cycles/pixel).]
N = 2 (5-tap)
d0    0.026455   0.248070   0.450951   0.248070   0.026455
d1   -0.097537  -0.309308   0          0.309308   0.097537
d2    0.236427   0.020404  -0.517610   0.020404   0.236427

N = 3 (6-tap)
d0    0.004423   0.121224   0.374352   0.374352   0.121224   0.004423
d1   -0.029091  -0.223003  -0.196061   0.196061   0.223003   0.029091
d2    0.105303   0.198956  -0.301356  -0.301356   0.198956   0.105303
d3   -0.230922   0.227030   0.503368  -0.503368  -0.227030   0.230922

N = 4 (7-tap)
d0    0.002013   0.051225   0.247548   0.398427   0.247548   0.051225   0.002013
d1   -0.008593  -0.115977  -0.240265   0          0.240265   0.115977   0.008593
d2    0.033589   0.180366  -0.028225  -0.370850  -0.028225   0.180366   0.033589
d3   -0.107517  -0.074893   0.469550   0         -0.469550   0.074893   0.107517
d4    0.201624  -0.424658  -0.252747   0.940351  -0.252747  -0.424658   0.201624

Table 2. Higher-order derivative kernels. Shown are sets of derivative
($d_p[n]$, $p > 0$) and interpolator ($d_0[n]$) kernels for differentiation
orders N = 2, 3, 4.
4 Conclusions
Differentiation of discretized signals is a very basic operation, widely used in
numerical analysis, image processing, and computer vision. We have described
a framework for the design of operators that are efficient (i.e., compact and
separable) and optimally equivariant to rotations. This formulation can easily be
extended to higher dimensions (e.g., three-dimensional derivatives for motion).
References
1. E P Simoncelli. Design of multi-dimensional derivative filters. In First Int'l
Conf on Image Processing, Austin, Texas, November 1994.
2. B Carlsson, A Ahlen, and M Sternad. Optimal differentiators based on stochastic
signal models. 39(2), February 1991.
3. Per-Erik Danielsson. Rotation-invariant linear operators with directional
response. In 5th Int'l Conf. Patt. Rec., Miami, December 1980.
4. J De Vriendt. Fast computation of unbiased intensity derivatives in images using
separable filters. Int'l Journal of Computer Vision, 13(3):259-269, 1994.
5. T Vieville and O Faugeras. Robust and fast computation of unbiased intensity
derivatives in images. In ECCV, pages 203-211. Springer-Verlag, 1992.
6. R Deriche. Fast algorithms for low-level vision. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 12:78-87, 1990.
7. J J Koenderink and A J van Doorn. Representation of local geometry in the visual
system. Biological Cybernetics, 55:367-375, 1987.
8. W T Freeman and E H Adelson. The design and use of steerable filters. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 13(9):891-906, 1991.
9. A V Oppenheim and R W Schafer. Discrete-Time Signal Processing. Prentice
Hall, 1989.
10. E P Simoncelli. Distributed Analysis and Representation of Visual Motion. PhD
thesis, EECS Dept., MIT, January 1993.