The Structure of Images
Biol. Cybern. 50, 363-370 (1984)
Biological Cybernetics
© Springer-Verlag 1984


Jan J. Koenderink
Department of Medical and Physiological Physics, Physics Laboratory, State University Utrecht, The Netherlands

Abstract. In practice the relevant details of images exist only over a restricted range of scale. Hence it is important to study the dependence of image structure on the level of resolution. It seems clear enough that visual perception treats images on several levels of resolution simultaneously and that this fact must be important for the study of perception. However, no applicable mathematically formulated theory to deal with such problems appears to exist. In this paper it is shown that any image can be embedded in a one-parameter family of derived images (with resolution as the parameter) in essentially only one unique way if the constraint is applied that no spurious detail should be generated when the resolution is diminished. The structure of this family is governed by the well known diffusion equation (a parabolic, linear, partial differential equation of the second order). As such the structure fits into existing theories that treat the front end of the visual system as a continuous stack of homogeneous layers, characterized by iterated local processing schemes. When resolution is decreased the images become less articulated because the extrema ("light and dark blobs") disappear one after the other. This erosion of structure is a simple process that is similar in every case. As a result any image can be described as a juxtaposed and nested set of light and dark blobs, wherein each blob has a limited range of resolution in which it manifests itself. The structure of the family of derived images permits a derivation of the sampling density required to sample the image at multiple scales of resolution. The natural scale along the resolution axis (leading to an informationally uniform sampling density) is logarithmic, thus the structure is apt for the description of size invariances.

1 The Problem of Scale and Resolution

In every imaging situation you have to face the problem of scale: a given image has a limited extent or window (the "outer scale") as well as a limited resolution (the "inner scale"). These limits are set by the "format" of the image, e.g. by the size of the photographic plate and the graininess of the emulsion, the number and spacing of photosensitive elements of a CCD array, or, in the case of the visual system, the discrete structure of the retinal receptive fields and the extent of the retina. In a number of situations the inner scale is determined by the structure of the radiation itself, e.g. in low-luminance situations (night vision, image intensifiers) or scintigraphy (where the number of gamma quanta available is limited by dosimetry). In a great many applications the inner and outer scales are set by the subject matter rather than the image format, e.g. a treetop does not exist on the scale of the leaves nor on that of the forest. (You typically define treetops as features in volumes with an outer scale of 10 m and an inner scale of 10 cm, say.) In all of the latter cases the problem of setting outer scale (that is finding the subject matter, "identification") and inner scale (morphometric characterization or "localization") can be acute. This is especially true in automatic image processing, much less so in vision: the human eye seems to possess an uncanny aptitude to "zoom in" on the right range of scale. Thus, for instance, to locate the heart on a cardioscintigram you blur the image, then to study the shape of the left ventricle you increase resolution until the photon noise becomes really objectionable (Hay and Chesters, 1977). Thus you probe what may be called the "deep structure" before dealing with the "superficial" structure (at one level of resolution). Similar problems are well known in other fields, e.g. biology, astronomy.

If you have no a priori reasons to look for certain features, then you cannot decide on the "right scale". (Except in certain trivial cases, e.g. once you resolve individual quantum events it is useless to increase resolution any further - regardless of subject matter.) Thus if you aim to retain all available structure, and yet want to vary the resolution (e.g. in order to be able to identify global objects through blurring), then you must treat the image on all levels of resolution simultaneously. Several attempts to do so have been published (Hay and Chesters, 1977; Burt et al., 1981; Witkin, 1983). The challenge is to understand the image really on all these levels simultaneously, and not as an unrelated set of derived images at different levels of blurring: this presupposes the existence of links, or "projections", between the different levels of resolution.

The obvious way to proceed appears to be:

1. Embed the original (or "primal") image in a one-parameter family of "derived" images. The parameter measures resolution, or inner scale. The outer scale determines how far to proceed. (For inner scale can never exceed outer scale, the simplest derived image contains just one logon or structural degree of freedom.)
2. Study the family as a family, i.e. define deep structure, the relations between structural features of different derived images.
3. In a later phase of this program (not covered in the present paper) these mathematical structures may be incorporated in more detailed mechanistic models of the visual system composed of homogeneous processing layers with a specific across-layer structure.

In the sequel I show that under a few rather general constraints there exists really only one reasonable way to generate the one-parameter family and that the induced deep structure can be used to define the "projections" unequivocally.

2 The Unique One-Parameter Family Generated by an Image

For the present discussion an "image" is just a real function of two real variables:

$$L: \mathbb{R}^2 \to \mathbb{R}, \qquad L(\mathbf{r}) = L(x, y) = \lambda, \quad \mathbf{r} \in \mathbb{R}^2,\ \lambda \in \mathbb{R}.$$

The coordinates $(x, y)$ are understood as the Cartesian coordinates in the image plane; the value $\lambda$ will be called the "luminance" here (for ease of reference) but may be interpreted in many different ways. I shall not require $\lambda$ to be positive, e.g. a "reference luminance" may be subtracted.

The aim is to define a real function $K$ of three variables

$$K: \mathbb{R}^3 \to \mathbb{R}, \qquad K(\mathbf{R}) = K(x, y, z) = A, \quad \mathbf{R} \in \mathbb{R}^3,\ A \in \mathbb{R},$$

in such a way that $K(x, y, 0) = L(x, y)$ for all $x, y$, and such that the parameter $z$ measures inner scale. I require that the family depends "causally" on the primal image, i.e. I require the vertical derivative $K_z$ at any level to be given by a functional that depends solely on the function (or derived image) $K(x, y, z = \text{const})$. The problem then is how to express $K_z$ in terms of the derived image at a given level. It will be shown that this can only be done in essentially a single sensible way.

Most persons experience no difficulties when asked to point out "the same" features in two photographs that differ with respect to the amount of blurring, if these features are sufficiently coarse. It seems natural to identify light spots when they occur at similar locations as really the same spot, and our confidence is increased when we identify configurations of light and dark spots that show similar spatial relations. Let us then start by identifying a pixel $(x', y')$ at resolution $z'$ with a pixel $(x, y)$ at resolution $z$ if

$$K(x', y', z') = K(x, y, z) \quad \text{(metrical identity)},$$

$$(x' - x)^2 + (y' - y)^2 \ \text{is a local minimum} \quad \text{(structural proximity)}.$$

Note that it is not at all guaranteed that such a mapping always exists; in fact a given luminance at some level of resolution need not at all survive if you blur that image. Here I introduce the first hypothesis, that of causality: any feature at a coarse level of resolution is required to possess a (not necessarily unique) "cause" at a finer level of resolution, although the reverse need not be true. This asymmetry leads to a rather strong constraint. The hypothesis in effect forbids the generation of "spurious resolution". Let me formalize the constraint first: Consider a surface $K(x, y, z) = A_0$ (a constant) in $(x, y, z)$-space [or "scale space" (Witkin, 1983)]. Then you can formulate the constraint for the stationary points of the derived images (the points $K_x = K_y = 0$). Note that you have extrema if the Hessian $K_{xx}K_{yy} - K_{xy}^2$ is positive (a minimum or dark blob if $K_{xx} + K_{yy}$ is positive, a light blob if it is negative) and a saddle if the Hessian is negative. If the primal image is generic (an assumption that is easily eliminated later on), then the Hessian never vanishes at the stationary points for $z = 0$. It may vanish at stationary points for certain finite values of $z$, however. Now the assumption of causality implies that the surface $K = A_0$ should point its convex side towards the direction of decreasing resolution at the extrema. For otherwise the more blurred image would possess luminance values that could not be traced to the less blurred images, contrary to the hypothesis.

The curvature of the surface $K(x, y, z) = A_0$ is easily obtained with standard methods (Spivak, 1975). First note that the unit surface normal $\mathbf{n}$ may be defined as:

$$\mathbf{n} = \mathbf{p}/|\mathbf{p}| \quad \text{with} \quad \mathbf{p} = (K_x, K_y, K_z).$$

The signs of the principal curvatures are defined with respect to this choice of orientation of the surface.
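The classification of stationary points by the sign of the Hessian, as described above, can be sketched in a few lines. This is only an illustrative sketch: the function name `classify_stationary_point`, the tolerance `tol`, and the returned labels are assumptions made for this example and do not appear in the paper.

```python
def classify_stationary_point(kxx: float, kyy: float, kxy: float,
                              tol: float = 1e-12) -> str:
    """Classify a stationary point (K_x = K_y = 0) from its second derivatives.

    Follows the rule stated in the text: the Hessian K_xx*K_yy - K_xy^2
    is positive at extrema and negative at saddles; at an extremum the
    sign of K_xx + K_yy separates dark blobs (minima) from light blobs
    (maxima). The tolerance `tol` is a hypothetical numerical guard.
    """
    hessian = kxx * kyy - kxy * kxy
    laplacian = kxx + kyy
    if abs(hessian) <= tol:
        return "degenerate"  # non-generic case: the Hessian vanishes
    if hessian < 0:
        return "saddle"
    return "dark blob (minimum)" if laplacian > 0 else "light blob (maximum)"


# K = x^2 + y^2 has a minimum at the origin, K = x^2 - y^2 a saddle.
print(classify_stationary_point(2.0, 2.0, 0.0))
print(classify_stationary_point(2.0, -2.0, 0.0))
```

In practice the second derivatives would have to be estimated from sampled data, which this sketch leaves out.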

The principal curvatures are

$$\lambda_i, \quad i = 1, 2,$$

where the $\lambda_i$ are the roots of the (quadratic!) equation

$$\det \begin{pmatrix} K_{xx} - \lambda & K_{xy} & K_{xz} & K_x \\ K_{yx} & K_{yy} - \lambda & K_{yz} & K_y \\ K_{zx} & K_{zy} & K_{zz} - \lambda & K_z \\ K_x & K_y & K_z & 0 \end{pmatrix} = 0$$

or (because we consider the case $K_x = 0$, $K_y = 0$, $K_z \neq 0$):

$$\lambda^2 - \lambda (K_{xx} + K_{yy}) + (K_{xx}K_{yy} - K_{xy}^2) = 0.$$

By hypothesis $K_{xx}K_{yy} - K_{xy}^2$ is positive (I consider extrema, not saddle points), thus both roots have equal sign. This sign is given by the sign of $K_{xx} + K_{yy} = \Delta K$, whereas convexity (concavity) is defined relative to the sign of the third component of the surface normal. (That is the sign of $K_z$.) Thus the constraint can finally be written

$$\Delta K = \alpha^2(x, y, z)\, K_z,$$

where $\alpha$ denotes an arbitrary but nowhere vanishing real function. [Note that this equation has really been derived at the location of the extrema solely. But then, for images that are not a priori known, these extrema might be anywhere! Thus the equation must hold at all points of the image, which is why I introduced the function $\alpha(x, y, z)$.] Consequently I have arrived at a partial differential equation that has to be satisfied by the family of derived images.

In order to proceed I introduce a second hypothesis at this point: homogeneity and isotropy. The inner scale depends only on the parameter $z$, and in no way on $x$ or $y$. Thus I do not permit space variant blurring. Clearly this is not essential to the issue, but it simplifies the analysis greatly. The hypothesis means that $\alpha(x, y, z)$ depends only on $z$. Then I can introduce a new scale parameter $t$ (say) in such a way that $t = \varphi(z)$, where $\varphi$ is a monotonically increasing function, and

$$\Delta K = K_t.$$

This is the well known heat conduction or diffusion equation. This equation governs the deep structure of the image.

Consequently, I define the family of derived images $K(x, y, t)$ as the solution of the heat conduction equation with the boundary condition $K(x, y, 0) = L(x, y)$. This works fine if the image extends over the whole of $\mathbb{R}^2$. If the primal image is only defined over a finite region $S$, say a square or disc, etc. (the usual case), I proceed a little differently. First I define $L^*(x, y)$ as the solution of $\Delta L^* = 0$ with as boundary condition that $L^*(x, y) = L(x, y)$ restricted to $\partial S$ (the boundary of the image). Then I define $K(x, y, t)$ as the solution of the heat conduction equation with as boundary condition $K(x, y, 0) = L(x, y) - L^*(x, y)$ and $K(\partial S, t) = 0$. [Note that $L^*$ would lead to $K_t(x, y, 0) = 0$ anyway: it is an invariant component of the primal image.]

In retrospect you can obtain any derived image directly from the primal image through convolution with the gaussian kernel

$$K(\mathbf{r}, \mathbf{r}') = \exp(-|\mathbf{r} - \mathbf{r}'|^2/4t)/4\pi t.$$

In fact any derived image at level $t$ can be derived from any other derived image at level $t' < t$ through convolution with a suitable gaussian kernel (or point spread function). Thus if spurious resolution is prohibited (the first hypothesis), then the family of gaussians is unique (Note 1). Gaussian blurring is the only sensible way to embed a primal image into a one-parameter family.

Interestingly enough the structure proposed here has several features that can be traced to well known models of the visual system. For instance, the study of zero crossings for images subjected to different degrees of blurring (Marr et al., 1977) and the studies on processing in layered media (Marko, 1969; Roehler, 1976). The latter study even explicitly incorporates the diffusion equation.

3 Image Structure - The Superficial Structure

In the preceding paragraphs I have glibly spoken of light and dark spots in the image. Obviously such image features are of importance, but how do you delimit a light blob (say) in a blurred image? In one-dimensional images, such as time signals, one defines peaks and troughs either by way of extrema (Ehrich and Foith, 1976; e.g. a "peak" is a region between two successive minima) or through points of inflexion (Witkin, 1983). Both methods are not easily transposed to two dimensions. The two-dimensional equivalent of a point of inflexion would be a parabolic curve ($K_{xx}K_{yy} - K_{xy}^2 = 0$), but parabolic curves sometimes fail to enclose single extrema. People have attempted to use "zero-crossings ($\Delta L = 0$)" (Marr et al., 1977) for the purpose, but these curves suffer from the same drawback. One nice method is the one commonly used in geography, and introduced in mathematics by Cayley (1859) and Maxwell (1870) ("Hills" and "Dales", separated by "watercourses" and "watersheds"). A quite similar method - that seems more natural for the present purpose - is to employ the foliation of the image induced by the family of equiluminance curves (Koenderink and van Doorn, 1979).

From differential topology it is known (Guillemin and Pollack, 1974) that almost all (in a precise sense) images are generic, that is:

- stationary points ($K_x = K_y = 0$) are isolated,
- $K_{xx}K_{yy} - K_{xy}^2 \neq 0$ at stationary points,
- stationary values are distinct.

Then singular equiluminance curves are points (at the extrema) and curves with self-intersections (at the saddle-points, Maxwell's "false extrema"). The extrema and false extrema can be put into a natural partial order (of inclusion) as follows: Each saddle point defines a closed equiluminance curve with a single self-intersection; the two loops define two disjoint families of closed equiluminance curves that contain either extrema or false extrema (containing other - possibly false - extrema, etc.). In this manner you obtain a nested family of (false) extrema and the inclusion defines a partial order. The boundary of the image does not lead to complications if you first subtract the invariant image (as noted earlier): then the boundary itself is a closed equiluminance curve. From the vantage point of visual perception this method of treating an image in terms of a hierarchy of nested and juxtaposed light and dark blotches appears as a very natural one.

4 The Deep Structure

When you blur an image, you lose structure: the total number of extrema cannot increase, and generally decreases if the blurring is sufficiently strong. A single process accounts for this (an immediate consequence of Thom's theorem (Thom, 1972)): when $t$ is increased it may happen that an extremum merges with a saddle-point, whereon both are annihilated. An example is (this is at the same time the general affine model of this singularity, see Fig. 1):

$$K(x_0 + \delta x,\ y_0 + \delta y,\ t_0 + \delta t) = A_0 + \frac{\delta x^3}{6} + \frac{\delta y^2}{2} + \delta t\,(\delta x + 1).$$

(Note that $K$ satisfies the diffusion equation.) For $\delta t < 0$ the extremum is at $\delta y = 0$, $\delta x = \sqrt{-2\,\delta t}$, the saddle at $\delta y = 0$, $\delta x = -\sqrt{-2\,\delta t}$. For $\delta t > 0$ both have vanished. Note that $K(x, y, t_0)$ is not a generic image. In all practical cases the family of derived images is versal; that is, all but a finite number of isolated derived images are generic.

Fig. 1. The surface $K = A_0$. The point P is $(x_0, y_0, t_0)$. In the regions α and β the lines of steepest descent have the singular paths through P as asymptotes. In the region γ these lines issue from P. Surfaces $K = A_1$ with $A_1 > A_0$ have neither extrema nor saddle points, whereas surfaces $K = A_2$ with $A_2 < A_0$ have one extremum and one saddle point: at point P you have a "collision" of a saddle and an extremum

The non-generic images occur as images in which an extremum merges with a saddle-point. Thus you can unequivocally assign extrema to saddle-points. The isoluminance curve through the saddle-point must encircle that extremum, and thus serves to define the boundary of the light or dark blob. There exists an even more natural method to do this, however.

The requirement that in two "successive" derived images, say $K(x, y, t)$ and $K(x, y, t + \delta t)$ (with $x, y$ variable), corresponding points have equal luminance and are as close as possible, yields a simple rule of projection between images: the orbits of the projection are the integral curves of the vector field

$$\mathbf{s} = (-K_t K_x,\ -K_t K_y,\ K_x^2 + K_y^2).$$

This is easily proved as follows: The point $\mathbf{r} + d\mathbf{r}$ at the image $t + dt$ that is connected to the point $\mathbf{r}$ at the image $t$ must satisfy $dK = \nabla K \cdot d\mathbf{r} + K_t\, dt = 0$. Moreover, the steepest descent is in the direction of the gradient ($\nabla K$). Thus

$$\frac{d\mathbf{r}}{dt} = -\frac{K_t}{\nabla K \cdot \nabla K}\, \nabla K.$$

The vector $\mathbf{s} = (\nabla K \cdot \nabla K)\, d\mathbf{R}/dt$, with $\mathbf{R} = (\mathbf{r}, t)$, has everywhere the same direction as $d\mathbf{R}/dt$, and its singularities coincide with those of $d\mathbf{r}/dt$: thus the integral curves of these vector fields are the same.

The stationary points of the images are just the singularities of the vector field $\mathbf{s}$ (because $K_x^2 + K_y^2 = 0$). When you project some region of a derived image towards the primal image plane, it is apparent that not all points in the latter plane can be reached by the integral curves of $\mathbf{s}$: each extremum-saddle-point pair defines a region that remains blank. These regions are described through the integral curves that pass through the extremum and those through the saddle that do reach the plane $t = 0$ (Fig. 2). These regions are topologically equivalent to discs; in the primal image plane the saddle-point lies on the border, the extremum inside it.

I propose to call these regions the "ranges" of the extrema; they can be taken to define the light and dark blobs defined by the extremum-saddle-point pairs.
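The affine model of the extremum-saddle annihilation given in this section can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper: the function names, the finite-difference step `h`, and the choice $A_0 = 0$ are hypothetical. It verifies that the model satisfies $\Delta K = K_t$ and lists the stationary points on either side of the singularity.

```python
import math


def K(dx: float, dy: float, dt: float, a0: float = 0.0) -> float:
    """Affine model of the singularity: A0 + dx^3/6 + dy^2/2 + dt*(dx + 1)."""
    return a0 + dx ** 3 / 6.0 + dy ** 2 / 2.0 + dt * (dx + 1.0)


def diffusion_residual(dx: float, dy: float, dt: float, h: float = 1e-3) -> float:
    """Finite-difference estimate of (K_xx + K_yy) - K_t, expected to be ~0."""
    k0 = K(dx, dy, dt)
    kxx = (K(dx + h, dy, dt) - 2.0 * k0 + K(dx - h, dy, dt)) / h ** 2
    kyy = (K(dx, dy + h, dt) - 2.0 * k0 + K(dx, dy - h, dt)) / h ** 2
    kt = (K(dx, dy, dt + h) - K(dx, dy, dt - h)) / (2.0 * h)
    return kxx + kyy - kt


def stationary_points(dt: float):
    """Solve K_x = dx^2/2 + dt = 0 and K_y = dy = 0 for a given dt.

    For this model the Hessian K_xx*K_yy - K_xy^2 equals dx, so the root
    with dx > 0 is the extremum and the root with dx < 0 is the saddle.
    """
    if dt > 0:
        return []  # the pair has been annihilated
    if dt == 0:
        return [(0.0, 0.0, "degenerate")]  # the non-generic image
    root = math.sqrt(-2.0 * dt)
    return [(root, 0.0, "extremum"), (-root, 0.0, "saddle")]


print(diffusion_residual(0.3, -0.2, -0.5))  # close to zero
print(stationary_points(-0.5))
print(stationary_points(0.1))
```

As $\delta t$ approaches zero from below, the two stationary points approach each other along the $\delta x$ axis and merge at $\delta x = 0$.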

Fig. 2. If the derived image at $t = t_2$ is down-projected to the plane $t = t_1$, the dotted region is left open: it contains detail that is not present in the blurred image

Fig. 3. A tube A defined by the saddle point Q, extremum R pair in the primal image $\Sigma$. The top of the tube is the singular point P where saddle point and extremum meet. (It has a horizontal tangent plane.) The surface E contains orbits that end on saddle points on the arc PQ and from there split into two branches on A. Any orbit inside the tube ends on an extremum on the arc PR. No orbit from outside A can enter its inside. Thus downprojection from a level above P leaves the realm on the primal image $\Sigma$ uncovered

If you don't project down to the primal image plane, but to some intermediary image plane, you obtain the range at that level of resolution - at least if it exists there. These ranges sweep out tube-like volumes (with $t$ as parameter, Fig. 3) in scale space. The tubes are closed on one side. (This highest point being the merge of the extremum-saddle-point pair.) These tubes define the volumes in scale space at which the blobs manifest themselves; I propose to call them the "realms" of the extrema. In complicated images many different realms coexist, both juxtaposed and nested to arbitrary depth. (Because of the structure of the $\mathbf{s}$ field the boundaries of different realms can never meet.) Thus you may really speak of light blobs containing other light or dark blobs, containing... etc.

For a certain finite range of resolution the blobs can be identified (that is if $t$ is less than the value at which the extremum meets its saddle-point), and in a still more limited range the blob exists in its pure form, unarticulated. For too high a resolution the blob may be difficult to detect because it is articulated with irrelevant smaller detail (e.g. blurring really helps to find objects in scintigrams), whereas for too low a resolution the blobs lose identity (e.g. in a cardioscintigram the left and right ventricles may merge). Details thus have a limited range of resolution in which they can be said to exist. We can define this range from the top of the realm to the next lower top of any included subrealm.

Some details exist over a long range of resolution, others are more ephemeral and at once disintegrate once you identify them. There is some evidence that "stable features" (those that exist over long ranges) are the visually most conspicuous ones (Witkin, 1983).

Note that you cannot "reconstruct" the primal image from a highly blurred image through the device of downprojection: surely this sharpens or "deblurs" the image, but at the cost of the introduction of blank spaces (the ranges of extrema on the primal image). Thus you have to bring in extra information at the levels of resolution where - by downprojection - new realms appear. A complete description of the image on the coarsest possible scale entails:

1) the image at some (coarse) level of resolution,
2) the luminance values on the loci of extrema (a set of curves in $(x, y, t)$ space). Downprojection from these entities completely fills the primal image plane, thus if you add,
3) the geometrical structure of the family of downprojecting paths, you have completely characterized the image.

Concerning the geometrical structure of the downprojecting paths, they alone are sufficient description! For the structure of the $\mathbf{s}$-field determines the surfaces $K = \text{const}$: $\mathbf{s} \wedge (\mathbf{s} - (\mathbf{s} \cdot \mathbf{e}_t)\mathbf{e}_t)$ yields the direction of the normal to these surfaces. ($\mathbf{e}_t$ is a unit vector in the $t$-direction.) Consequently, the image is determined except for a transformation of the type $K'(x, y, t) = \Phi(K(x, y, t))$. Obviously this transformation must conserve the property that $\Delta K = K_t$, thus $\Delta K' = K'_t$. This latter equation can be shown to be equivalent to:

$$\frac{\partial^2 \Phi}{\partial K^2}\, |\nabla K|^2 + \frac{\partial \Phi}{\partial K}\, (\Delta K - K_t) = 0.$$

Thus $\partial^2 \Phi / \partial K^2 = 0$, or $K' = \alpha K + \beta$ ($\alpha$ and $\beta$ const).

But then the image is determined up to a multiplicative and an additive constant through the projection orbits in scale space alone! In fact you obtain $\nabla \ln |\nabla K|$ through the $\mathbf{s}$-field, and thus by integration $K$ except for scale and offset.

One nice feature of this description is that it permits a logical filtering in the scale domain. For every range in the primal image plane you may solve $\Delta E = 0$ within the range with the boundary value $E = L$ on the boundary of the range. Then you may "lift off" the detail by defining it as $L - E$ within the range and zero outside. In this way the whole primal image can be written as a superposition of the light and dark blobs. A subfamily may be defined for each subimage, and because the diffusion equation is linear the original family is just the superposition of the subfamilies. Now you may choose, for instance, to use only summands belonging to features existing in a certain range of scales. This is in effect a logical filtering in the scale domain. You may even compose images in which details in different scale ranges have been blurred differentially, etc.

Finally, note that the diffusion equation may also be used backwards to enhance the image. This process may end, however. E.g. the primal image $\exp(-(x^2 + y^2)/4\mu)/(4\pi\mu)$ can only be sharpened to $t = -\mu$; then it has been shrunk to an impulse.

5 The Sampling of Images in Scale Space

Two basic solutions of the heat equation are

$$\varphi(\mathbf{r}, t) = \exp(-|\mathbf{r}|^2/4t)/4\pi t \qquad \left(\int \varphi\, d\mathbf{r} = 1,\ \varphi(\mathbf{r}, 0) = \delta(\mathbf{r})\right)$$

$$\psi(\mathbf{r}, t) = \mathrm{Re}\, \exp(-i\mathbf{k} \cdot \mathbf{r} - k^2 t).$$

Both are convenient when you want to construct solutions of the heat conduction equation through the principle of superposition. I use these simple solutions here to demonstrate some principles that pertain to the sampling of the image in scale space. This is of obvious importance to practical (i.e. numerical) applications.

Let the metrical resolution be given, e.g. the luminance (or rather the flux in a resolution cell) is measured with a relative accuracy of $\exp(-R)$ ($R > 1$, thus the accuracy is $R/\ln 2$ "bits"). Take $\varphi(\mathbf{r}, t)$ as a basic solution; then if you require that at any level of resolution a cell centered at the origin samples at least the $(1 - \exp(-R))$th part of the total flux, such a cell must have a radius of $2\sqrt{tR}$. At a center spacing of [...] such cells sample uncorrelated fluxes if the points in the ground plane were uncorrelated. You may also inquire after the required Nyquist sample frequency. Consider the basic solution $\psi(\mathbf{r}, t)$: a spatial frequency component with wavelength $\lambda = 2\pi/k$ damps exponentially with characteristic decay length

$$\Delta t_{1/e} = 1/k^2 = \lambda^2/4\pi^2.$$

If you start with a Gaussian spectrum $\Psi(k, t_0) = \Psi_0 \exp(-k^2/2k_0^2)$, you have that

$$\Psi(k, t) = \Psi_0 \exp\left[(-k^2/2) \cdot (2(t - t_0) + 1/k_0^2)\right].$$

Thus the spectrum remains Gaussian but the width decreases as

$$(1/k_0^2 + 2(t - t_0))^{-1/2}.$$

If you start out with a white spectrum ($k_0 \to \infty$), the width just goes as $1/\sqrt{2t}$. (I will set $t_0 = 0$ in the sequel.) The highest significant frequency is obviously $k_{\max}(t)$, for which $\Psi(k_{\max}, t) = \exp(-R)\,\Psi(0, t)$. Thus $k_{\max} = \sqrt{R/t}$, or in other words the Nyquist sample density must use a spacing $d = \pi/k_{\max} = \pi\sqrt{t/R}$.

Another problem concerns the spacing that is required along the $t$-axis. The characteristic decay length for the highest frequency component is $d^2/\pi^2$ with $d$ as defined above. Thus this wave damps with a factor $(1 - \pi^2 \delta t/d^2)$ over a distance $\delta t$ ($\delta t \ll d^2/\pi^2$).

Now there are two problems to consider: that of the accuracy of the representation and that of the stability (in the numerical sense) of the representation. Let us consider accuracy first. The approximation $I(\mathbf{r}, \delta t) \approx I(\mathbf{r}, 0) + \delta t \cdot \Delta I(\mathbf{r}, 0)$ can easily be shown to have a relative error bounded by

$$\varepsilon = \frac{\pi^4\, \delta t^2}{2 d^4}.$$

The requirement that $\varepsilon < \exp(-R)$ then yields the condition

$$\delta t < \sqrt{2}\, e^{-R/2}\, \frac{d^2}{\pi^2}.$$

Next consider stability. For a spatial frequency $\omega$ the transfer function from layer $t = 0$ to layer $t = \delta t$ is $(1 - \omega^2 \delta t)$. Stability requires that the absolute value of the transfer function remains less than unity: otherwise arbitrarily small errors will soon grow without bounds. This leads to the requirement $\delta t < (2/\pi^2)\, d^2$. For any reasonable value of $R$ stability is guaranteed when

$$\delta t = (\sqrt{2}/\pi^2) \exp(-R/2) \cdot d^2.$$

This can also be written (making use of the relation $d = \pi\sqrt{t/R}$) as

$$\delta t/t = \sqrt{2} \exp(-R/2)/R = \text{const}.$$

Thus you need a logarithmic spacing of sample planes along the $t$-axis. This is in accord with the intuitive notion that there can be no preferred scale, thus a uniform sampling density on a logarithmically scaled axis is indicated.
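The two sampling rules derived in this section, the spatial spacing $d = \pi\sqrt{t/R}$ and the constant relative step $\delta t/t = \sqrt{2}\exp(-R/2)/R$, can be turned into a small sketch that lists the sample planes along the $t$-axis. The function names and the starting and ending scales below are assumptions chosen for illustration; only the two formulas come from the text.

```python
import math


def spatial_spacing(t: float, R: float) -> float:
    """Nyquist spacing d = pi * sqrt(t / R) at scale t."""
    return math.pi * math.sqrt(t / R)


def relative_scale_step(R: float) -> float:
    """Relative step dt / t = sqrt(2) * exp(-R / 2) / R between sample planes."""
    return math.sqrt(2.0) * math.exp(-R / 2.0) / R


def scale_levels(t_min: float, t_max: float, R: float):
    """Geometric (logarithmically spaced) sequence of scales from t_min up to t_max."""
    ratio = 1.0 + relative_scale_step(R)
    levels = []
    t = t_min
    while t <= t_max:
        levels.append(t)
        t *= ratio
    return levels


# Hypothetical example: 1% metrical accuracy, i.e. exp(-R) = 0.01.
R = math.log(100.0)
levels = scale_levels(1.0, 2.0, R)
print(len(levels), "sample planes cover one doubling of t")
print("spatial spacing at t = 1:", spatial_spacing(1.0, R))
```

Because each level is a fixed multiple of the previous one, the planes are equally spaced on a logarithmic $t$-axis, which is the point made in the text.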

From these basic results it appears that the reciprocal of $t$ (say $q = t^{-1}$) is an even more natural measure of resolution from the standpoint of structural information theory: in $(\mathbf{r}, q)$-space the resolution cells have constant volume. This volume is

$$\Omega = \Delta q\, \Delta r = \Delta t \cdot d^2/t^2 = \sqrt{2}\, \pi^2 \exp(-R/2)/R^2;$$

it depends on the metrical resolution alone. This is a basic "uncertainty relation" for scale space: a "blob" of area $\Delta A$ exists only over a resolution interval $\Delta q = \Omega/\Delta A$.

The so-called hierarchical "pyramid" structures that are in widespread use today for multiresolution image analysis are all much coarser than this (Burt et al., 1981). Consequently the family of derived images cannot be derived simply from these structures, except by the trivial measure of starting all over from the primal image. Thus quantization effects must be rather severe. Yet algorithms based on these structures are admittedly powerful and these structures behave at least qualitatively very much like the system discussed in this paper. Note that the correctly sampled image is also a "pyramid", but one that tapers much less swiftly.

The total number of samples needed to represent the structure can be easily obtained as follows: For a square image with sides $L$ and resolution $\delta$ the total resolution space has a volume $L^2 q_{\max} = \pi^2 L^2/(R^2 \delta^2) = \pi^2 N/R^2$, where $N$ is the total number of independent image elements in the original images. Dividing the volume by the volume of a resolution cell, we find the number of samples ($M$) needed: $M = N \exp(R/2)/\sqrt{2}$. This number is seen to grow exponentially with the required metrical accuracy ($R$). For small values of $R$, however, $M$ is of the order of $N$, e.g. for an accuracy of 1% you have $M \approx 7N$. Thus the human visual system contains certainly sufficient hardware as far as mere numbers are concerned to accommodate the retinal image in this manner!

6 Conclusions

One main result is that there appears to be essentially a single sensible way to embed an image into a one-parameter family of derived images, with resolution as a parameter: namely by a diffusion process, or convolution with a family of Gaussian point spread functions. This result must have seemed obvious to some previous investigators in this field who started out from the family of Gaussians (or rather "DOG's": "difference of Gaussians") or from iterated blurrings (which asymptotically leads to diffusion) in an apparently ad hoc fashion (Marr and Hildreth, 1980). The relation to the diffusion equation appears to have been overlooked previously, although it is this equation that explicitly defines the deep structure of the image.

Another main result is that if the mutual immutability of details with respect to blurring is taken into consideration, then you are able to define a true (or "linear") order of extrema: the image can be described unambiguously as a set of nested and juxtaposed light and dark blobs that vanish in a well defined sequence on progressive blurring. Note that such a linear order cannot be established at just one single level of resolution: e.g. for a pseudo-maximum consisting of two maxima (that is a light blob containing two smaller light blobs) it cannot be decided which of the sub-maxima is actually subordinate to the other, whereas on blurring this becomes clear: at some degree of blurring one of the two must vanish and yield to the other. Thus the image can be truly segmented into nested and juxtaposed light and dark blobs. Moreover, to each blob can be assigned three characteristic ranges of resolution: in one of them the blob is non-existent (or unresolved), in another it manifests itself purely as a simple blob, and in a final one finer detail intrudes on its territory. Thus the effects of progressive erosion clarify the deep structure of the image. In typical image processing applications this structure can be used for "logical filtering with respect to scale". Such a filtering can for instance be based on the relative stability of the blobs with respect to erosion.

In the family of derived images as described in this paper the structural information is not coded very efficiently: the primal image - thus the values of the luminances in just one plane of scale space - contains already all information! This may be remedied by considering $K_t$, the derivative with respect to scale, instead of $K$. This function is equivalent to $\Delta K$, the Laplacian of the derived images, and thus it is just Marr and Hildreth's (1980) scheme (although these authors arrived at their method in a different, rather ad hoc, manner). Obviously, $K_t$ contains the same information as $K$, except for a possible difference that is invariant against blurring. If you consider a primal image with detail in a very limited range of scale, e.g. the function $\psi(\mathbf{r}, t)$, you find that

$$K_t = -k^2 \exp(-k^2 t) \cos \mathbf{k} \cdot \mathbf{r}.$$

Thus at a given level (fixed $t$) you find the spatial spectrum of the primal image filtered with a bandpass filter with a relative (half-height) bandwidth of 0.5. The curves $K_t = 0$ are the "zero-crossings" of Marr and Hildreth (1980). In our scheme they are singled out through the property that they are (locally) stable with respect to erosion.
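The band-pass character of $K_t$ noted above can be illustrated with a short numerical sketch. From $K_t = -k^2 \exp(-k^2 t) \cos \mathbf{k} \cdot \mathbf{r}$, the magnitude of the gain applied to a spatial frequency $k$ at level $t$ is $k^2 \exp(-k^2 t)$, which peaks at $k = 1/\sqrt{t}$. The function names and the grid parameters below are assumptions made for this example; the sketch only locates the peak and makes no claim about the bandwidth figure quoted in the text.

```python
import math


def scale_derivative_gain(k: float, t: float) -> float:
    """Magnitude of the gain k^2 * exp(-k^2 * t) that K_t applies to frequency k."""
    return k * k * math.exp(-k * k * t)


def peak_frequency(t: float, n: int = 20001, k_max_factor: float = 5.0) -> float:
    """Locate the frequency of maximum gain by a brute-force grid search.

    The grid runs from 0 to k_max_factor / sqrt(t); both grid parameters
    are hypothetical choices for this illustration.
    """
    k_hi = k_max_factor / math.sqrt(t)
    ks = [k_hi * i / (n - 1) for i in range(n)]
    return max(ks, key=lambda k: scale_derivative_gain(k, t))


t = 4.0
print("numerical peak:", peak_frequency(t))
print("analytic peak 1/sqrt(t):", 1.0 / math.sqrt(t))
```

The peak moves to lower frequencies as $t$ grows, so each level of the $K_t$ family emphasizes detail at its own scale.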

As I have shown before (Koenderink et al., 1978, 1982) the visual system is extensive enough to be able to represent the retinal image at all levels of resolution simultaneously. The initialization of this data structure is simple diffusion which can be effected in an extremely simple manner by layered neural structures (Roehler, 1976; Marko, 1969). In this paper I have shown how to use such a structure, that is how to "read it out". This requires projections between different layers of the structure, guided by the activity in the network itself.

Note

The theorem that Gaussian blurring uniquely avoids spurious resolution, but only for the case of one-dimensional images, was brought to my attention by Andrew Witkin at the Marr Conference held at Cold Spring Harbor, 1983. Apparently the proof was complicated and yielded no intuitive insight. I immediately realized that an existing proof by myself - that diffusion only destroys structure but cannot generate it - could easily be adjusted to prove the theorem in a very simple manner in the more general 2-dimensional case. (The one-dimensional case follows directly from the 2-dimensional case.)

References

Burt, P.J., Hong, Tsai-Hong, Rosenfeld, A.: Segmentation and estimation of image region properties through cooperative hierarchical computation. IEEE Trans. SMC-11, 802-825 (1981)
Cayley, A.: On contour and slope lines. The London, Edinburgh, and Dublin Philosophical Magazine and J. of Science 18 (120), 264-268 (Oct. 1859)
Maxwell, J.C.: On hills and dales. The London, Edinburgh, and Dublin Philosophical Magazine and J. of Science 4th Series 40 (269), 421-425 (Dec. 1870)
Ehrich, R.W., Foith, J.P.: Representation of random waveforms by relational trees. IEEE Trans. Comput. 25, 725-736 (1976)
Guillemin, V., Pollack, A.: Differential topology. Englewood Cliffs, NJ: Prentice-Hall 1974
Hay, G.A., Chesters, M.S.: A model of visual threshold detection. J. Theor. Biol. 67, 221-240 (1977)
Koenderink, J.J., Doorn, A.J. van: The structure of two-dimensional scalar fields with applications to vision. Biol. Cybern. 33, 151-158 (1979)
Koenderink, J.J., Doorn, A.J. van: Visual detection of spatial contrast: influence of location in the visual field, target extent and illuminance level. Biol. Cybern. 30, 157-167 (1978)
Koenderink, J.J., Doorn, A.J. van: Invariant features of contrast detection: an explanation in terms of self-similar detector arrays. J. Opt. Soc. Am. 72, 83-87 (1982)
Marko, H.: Die Systemtheorie homogener Schichten. Kybernetik 5, 221 (1969)
Marr, D., Poggio, T., Ullman, S.: Bandpass channels, zero-crossings, and early visual information processing. J. Opt. Soc. Am. 69, 914-916 (1977)
Marr, D., Hildreth, E.: Theory of edge detection. Proc. Royal Soc. Lond. B207, 187-217 (1980)
Roehler, R.: Ein Modell zur örtlich-zeitlichen Signalübertragung im visuellen System des Menschen auf der Basis der linearen Systemtheorie kontinuierlichen Medien. Biol. Cybern. 22, 97-105 (1976)
Spivak, M.: A comprehensive introduction to differential geometry, Vol. III. Berkeley, CA: Publish or Perish Inc. 1975
Thom, R.: Stabilité structurelle et morphogénèse. Reading, MA: Benjamin 1972
Witkin, A.P.: Scale-space filtering. Proc. of IJCAI, 1019-1021, Karlsruhe 1983

Received: April 20, 1984

Prof. Dr. J. J. Koenderink
Rijksuniversiteit Utrecht
Fysisch Laboratorium
Princetonplein 5
Postbus 80.000
3508 TA Utrecht
The Netherlands
