Computer Graphics
David M. Mount
Department of Computer Science
University of Maryland
Fall 2011
Copyright, David M. Mount, 2011, Dept. of Computer Science, University of Maryland, College Park, MD, 20742. These lecture notes were
prepared by David Mount for the course CMSC 427, Computer Graphics, at the University of Maryland. Permission to use, copy, modify, and
distribute these notes for educational purposes and without fee is hereby granted, provided that this copyright notice appear in all copies.
Lecture 1: Introduction to Computer Graphics
Computer Graphics: Computer graphics is concerned with producing images and animations (or sequences of images) using a computer. The field of computer graphics dates back to the early 1960s with Ivan Sutherland, one of the pioneers of the field. This began with the development of the (by current standards) very simple software for performing the necessary mathematical transformations to produce simple line-drawings of 2- and 3-dimensional scenes. As time went on, and the capacity and speed of computer technology improved, successively greater degrees of realism were achievable. Today, it is possible to produce images that are practically indistinguishable from photographic images (or at least that create a pretty convincing illusion of reality).
Computer graphics has grown tremendously over the past 20 to 30 years with the advent of inexpensive interactive display technology. The availability of high-resolution, highly dynamic, colored displays has enabled computer graphics to serve a role in intelligence amplification, where a human working in conjunction with a graphics-enabled computer can engage in creative activities that would be difficult or impossible without this enabling technology. An important aspect of this interaction is that vision is the sensory mode of highest bandwidth. Because of the importance of vision and visual communication, computer graphics has found applications in numerous areas of science, engineering, and entertainment. These include:
Computer-Aided Design: The design of 3-dimensional manufactured objects such as appliances, planes, automobiles, and the parts that they are made of.
Drug Design: The design and analysis of drugs based on their geometric interactions with molecules such as proteins and enzymes.
Architecture: Designing buildings by computer, with the capability to perform virtual fly-throughs of the structure and investigation of lighting properties.
Medical Imaging: Visualizations of the human body produced by 3-dimensional scanning technology.
Computational Simulations: Visualizations of physical simulations, such as airflow analysis in computational fluid dynamics or stresses on bridges.
Entertainment: Film production and computer games.
Interaction versus Realism: One of the most important tradeoffs faced in the design of interactive computer graphics systems is the balance between the speed of interactivity and the degree of visual realism. To provide a feeling of interaction, images should be rendered at speeds of at least 20 to 30 frames (images) per second. However, producing a high degree of realism at these speeds for very complex scenes is difficult. This difficulty arises from a number of sources:
Large Geometric Models: Large-scale models, such as factories, city-scapes, forests and jungles, and crowds of people, can involve vast numbers of geometric elements.
Complex Geometry: Many natural objects (such as hair, fur, trees, plants, water, and clouds) have very sophisticated geometric structure, and they move and interact in complex manners.
Complex Illumination: Many natural objects (such as human hair and skin, plants, and water) reflect light in complex and subtle ways.
The Scope of Computer Graphics: Graphics is both fun and challenging. The challenge arises from the fact that computer graphics draws from so many different areas, including:
Mathematics and Geometry: Modeling geometric objects. Representing and manipulating surfaces and shapes. Describing 3-dimensional transformations such as translation and rotation.
Physics (Kinetics): Understanding how physical objects behave when acted upon by various forces.
Physics (Illumination): Understanding how physical objects reflect light.
Computer Science: The design of efficient algorithms and data structures for rendering.
Software Engineering: Software design and organization for large and complex systems, such as computer games.
High-Performance Computing: Interactive systems, like computer games, place high demands on the efficiency of processing, which relies on parallel programming techniques. It is necessary to understand how graphics processors work in order to produce the most efficient computation times.
The Scope of this Course: There has been a great deal of software produced to aid in the generation of large-scale
software systems for computer graphics. Our focus in this course will not be on how to use these systems
to produce these images. (If you are interested in this topic, you should take courses in the art technology
department). As in other computer science courses, our interest is not in how to use these tools, but rather in
understanding how these systems are constructed and how they work.
Course Overview: Given the state of current technology, it would be possible to design an entire university major to cover everything (important) that is known about computer graphics. In this introductory course, we will attempt to cover only the merest fundamentals upon which the field is based. Nonetheless, with these fundamentals, you will have a remarkably good insight into how many of the modern video games and Hollywood movie animations are produced. This is true since even very sophisticated graphics stem from the same basic elements that simple graphics do. They just involve much more complex light and physical modeling, and more sophisticated rendering techniques.
In this course we will deal primarily with the task of producing both single images and animations from 2- or 3-dimensional scene models. Over the course of the semester, we will build from a simple basis (e.g., drawing a triangle in 3-dimensional space) all the way to complex methods, such as lighting models, texture mapping, motion blur, morphing and blending, and anti-aliasing.
Let us begin by considering the process of drawing (or rendering) a single image of a 3-dimensional scene. This is crudely illustrated in the figure below. The process begins by producing a mathematical model of the object to be rendered. Such a model should describe not only the shape of the object but also its color and its surface finish (shiny, matte, transparent, fuzzy, scaly, rocky). Producing realistic models is extremely complex, but luckily it is not our main concern. We will leave this to the artists and modelers. The scene model should also include information about the location and characteristics of the light sources (their color, brightness), and the atmospheric nature of the medium through which the light travels (is it foggy or clear). In addition we will need to know the location of the viewer. We can think of the viewer as holding a synthetic camera, through which the image is to be photographed. We need to know the characteristics of this camera (its focal length, for example).
Fig. 1: A typical rendering situation (object model, light sources, viewer, image plane).
Based on all of this information, we need to perform a number of steps to produce our desired image.
Projection: Project the scene from 3-dimensional space onto the 2-dimensional image plane in our synthetic
camera.
Color and shading: For each point in our image we need to determine its color, which is a function of the object's surface color, its texture, the relative positions of light sources, and (in more complex illumination models) the indirect reflection of light off of other surfaces in the scene.
Surface Detail: Are the surfaces textured, either with color (as in a wood-grain pattern) or with surface irregularities (such as bumpiness)?
Hidden surface removal: Elements that are closer to the camera obscure more distant ones. We need to determine which surfaces are visible and which are not.
Rasterization: Once we know what colors to draw for each point in the image, the final step is that of mapping these colors onto our display device.
By the end of the semester, you should have a basic understanding of how each of these steps is performed. Of course, a detailed understanding of most of the elements that are important to computer graphics will be beyond the scope of this one-semester course. But by combining what you have learned here with other resources (from books or the Web) you will know enough to, say, write a simple video game, write a program to generate highly realistic images, or produce a simple animation.
The Course in a Nutshell: The process that we have just described involves a number of steps, from modeling to rasterization. The topics that we cover this semester will consider many of these issues.
Basics:
Graphics Programming: OpenGL, graphics primitives, color, viewing, event-driven I/O, GL toolkit, frame buffers.
Geometric Programming: Review of linear algebra, affine geometry (points, vectors, affine transformations), homogeneous coordinates, change of coordinate systems.
Implementation Issues: Rasterization, clipping.
Modeling:
Model types: Polyhedral models, hierarchical models, fractals and fractal dimension.
Curves and Surfaces: Representations of curves and surfaces, interpolation, Bezier and B-spline curves and surfaces, NURBS, subdivision surfaces.
Surface finish: Texture-, bump-, and reflection-mapping.
Projection:
3-d transformations and perspective: Scaling, rotation, translation, orthogonal and perspective transformations, 3-d clipping.
Hidden surface removal: Back-face culling, z-buffer method, depth-sort.
Issues in Realism:
Light and shading: Diffuse and specular reflection, the Phong and Gouraud shading models, light transport and radiosity.
Ray tracing: Ray-tracing model, reflective and transparent objects, shadows.
Color: Gamma-correction, halftoning, and color models.
Although this order represents a reasonable way in which to present the material, we will present the topics in a different order, mostly to suit our need to get material covered before major programming assignments.
Lecture 2: Basics of Graphics Systems and Architectures
Elements of 2-dimensional Graphics: Computer graphics is all about rendering images (either realistic or stylistic) by computer. The process of producing such images involves a number of elements. The most basic of these is the generation of the simplest two-dimensional elements, from which complex scenes are then constructed. Let us begin our exploration of computer graphics by discussing these simple two-dimensional primitives. Examples of the primitive drawing elements include line segments, polylines, curves, filled regions, and text.
Polylines: A polyline (or more properly a polygonal curve) is a finite sequence of line segments joined end to end. These line segments are called edges, and the endpoints of the line segments are called vertices. A single line segment is a special case. A polyline is closed if it ends where it starts (see Fig. 2). It is simple if it does not self-intersect. Self-intersections include such things as two edges crossing one another, a vertex lying in the interior of an edge, or more than two edges sharing a common vertex. A simple, closed polyline is also called a simple polygon.
Fig. 2: Polylines: a closed polyline, a simple polyline, a (simple) polygon, and a convex polygon.
If a polygon is simple and all its internal angles are at most 180 degrees, it is said to be convex (see Fig. 2).
0 + v = v.
Note that we did not define a zero point or origin for affine space. This is an intentional omission. No point is special compared to any other point. (We will eventually have to break down and define an origin in order to have a coordinate system for our points, but this is a purely representational necessity, not an intrinsic feature of affine space.)
You might ask, why make a distinction between points and vectors? Although both can be represented in the same way as a list of coordinates, they represent very different concepts. For example, points would be appropriate for representing a vertex of a mesh, the center of mass of an object, or the point of contact between two colliding objects. In contrast, a vector would be appropriate for representing the velocity of a moving object, the vector normal to a surface, or the axis about which a rotating object is spinning. (As computer scientists, the idea of different abstract objects sharing a common representation should be familiar. For example, stacks and queues are two different abstract data types, but they can both be represented as a 1-dimensional array.)
Because points and vectors are conceptually different, it is not surprising that the operations that can be applied to them are different. For example, it makes perfect sense to multiply a vector by a scalar. Geometrically, this corresponds to stretching the vector by this amount. It also makes sense to add two vectors together. This involves the usual head-to-tail rule, which you learned in linear algebra. It is not so clear, however, what it means to multiply a point by a scalar. (For example, the top of the Washington Monument is a point. What would it mean to multiply this point by 2?) On the other hand, it does make sense to add a vector to a point. For example, if a vector points straight up and is 3 meters long, then adding this to the top of the Washington Monument would naturally give you a point that is 3 meters above the top of the monument.
We will use the following notational conventions. Points will usually be denoted by lower-case Roman letters such as p, q, and r. Vectors will usually be denoted with lower-case Roman letters, such as u, v, and w, and often to emphasize this we will add an arrow over the letter. Scalars will be represented as lower-case Greek letters (e.g., α, β, γ). In our programs, scalars will be translated to Roman (e.g., a, b, c). (We will sometimes violate these conventions, however. For example, we may use c to denote the center point of a circle or r to denote the scalar radius of a circle.)
Affine Operations: The table below lists the valid combinations of scalars, points, and vectors. The formal definitions are pretty much what you would expect. Vector operations are applied in the same way that you learned in linear algebra. For example, vectors are added in the usual tail-to-head manner (see Fig. 13). The difference p − q of two points results in a free vector directed from q to p. Point-vector addition r + v is defined to be the translation of r by displacement v. Note that some operations (e.g. scalar-point multiplication, and addition of points) are explicitly not defined.
vector ← scalar · vector, vector / scalar        (scalar-vector multiplication)
vector ← vector + vector, vector − vector        (vector-vector addition)
vector ← point − point                           (point-point difference)
point  ← point + vector, point − vector          (point-vector addition)
Fig. 13: Affine operations: vector addition (u + v), point subtraction (p − q), and point-vector addition (r + v).
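To make the distinction concrete, here is a minimal sketch (not part of the original notes) of how the combinations in the table above might be encoded as C++ types. The names Point2 and Vector2 are invented for illustration; the point is that only the listed combinations are given operators, so an illegal one (e.g., point + point) simply fails to compile.

// Sketch: distinct point and vector types, with only the legal affine operations.
struct Vector2 { double x, y; };
struct Point2  { double x, y; };

// scalar-vector multiplication
inline Vector2 operator*(double a, Vector2 v)  { return {a * v.x, a * v.y}; }
// vector-vector addition and subtraction
inline Vector2 operator+(Vector2 u, Vector2 v) { return {u.x + v.x, u.y + v.y}; }
inline Vector2 operator-(Vector2 u, Vector2 v) { return {u.x - v.x, u.y - v.y}; }
// point-point difference yields a vector
inline Vector2 operator-(Point2 p, Point2 q)   { return {p.x - q.x, p.y - q.y}; }
// point-vector addition yields a point
inline Point2  operator+(Point2 p, Vector2 v)  { return {p.x + v.x, p.y + v.y}; }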
Affine Combinations: Although the algebra of affine geometry has been careful to disallow point addition and scalar multiplication of points, there is a particular combination of two points that we will consider legal. The operation is called an affine combination.
Let's say that we have two points p and q and want to compute their midpoint r, or more generally a point r that subdivides the line segment pq into the proportions α and 1 − α, for some α ∈ [0, 1]. (The case α = 1/2 is the case of the midpoint.) This could be done by taking the vector q − p, scaling it by α, and then adding the result to p. That is,
r = p + α(q − p).
Another way to think of this point r is as a weighted average of the endpoints p and q. Thinking of r in these terms, we might be tempted to rewrite the above formula in the following (illegal) manner:
r = (1 − α)p + αq.
Observe that as α ranges from 0 to 1, the point r ranges along the line segment from p to q. In fact, we may allow α to become negative, in which case r lies to the left of p (see Fig. 14), and if α > 1, then r lies to the right of q. The special case when 0 ≤ α ≤ 1 is called a convex combination.
Fig. 14: Affine combinations r = (1 − α)p + αq, for α < 0, 0 < α < 1, and α > 1.
In general, we define the following two operations for points in affine space.
Affine combination: Given a sequence of points p_1, p_2, ..., p_n, an affine combination is any sum of the form
α_1 p_1 + α_2 p_2 + ... + α_n p_n,
where α_1, α_2, ..., α_n are scalars satisfying Σ_i α_i = 1.
Convex combination: An affine combination where, in addition, α_i ≥ 0 for 1 ≤ i ≤ n.
Affine and convex combinations have a number of nice uses in graphics. For example, any three noncollinear points determine a plane. There is a one-to-one correspondence between the points on this plane and the affine combinations of these three points. Similarly, there is a one-to-one correspondence between the points in the triangle determined by these points and the convex combinations of the points. In particular, the point (1/3)p + (1/3)q + (1/3)r is the centroid of the triangle.
We will sometimes be sloppy, and write expressions of the following sort (which is clearly illegal):
r = (p + q)/2.
We will allow this sort of abuse of notation provided that it is clear that there is a legal affine combination that underlies this operation.
To see whether you understand the notation, consider the following questions. Given three points in 3-space, what is the union of all their affine combinations? (Ans: the plane containing the 3 points.) What is the union of all their convex combinations? (Ans: the triangle defined by the three points and its interior.)
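As a small illustrative sketch (reusing the hypothetical Point2/Vector2 types from the earlier example), affine combinations such as interpolation and the centroid can be computed using only the legal operations, by rewriting them in the form p + α(q − p):

// Affine combinations expressed through the legal operations.
Point2 affineCombination(Point2 p, Point2 q, double alpha) {
    return p + alpha * (q - p);              // equals (1 - alpha)*p + alpha*q
}

Point2 centroid(Point2 p, Point2 q, Point2 r) {
    // (1/3)p + (1/3)q + (1/3)r, written without adding points directly
    return p + (1.0 / 3.0) * ((q - p) + (r - p));
}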
Euclidean Geometry: In affine geometry we have provided no way to talk about angles or distances. Euclidean geometry is an extension of affine geometry which includes one additional operation, called the inner product. The inner product is an operator that maps two vectors to a scalar. The inner product of u and v is commonly denoted ⟨u, v⟩. There are many ways of defining the inner product, but any legal definition should satisfy the following requirements:
Positiveness: ⟨u, u⟩ ≥ 0, and ⟨u, u⟩ = 0 if and only if u is the zero vector.
Symmetry: ⟨u, v⟩ = ⟨v, u⟩.
Bilinearity: ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩, and ⟨u, αv⟩ = α⟨u, v⟩. (Notice that the symmetric forms follow by symmetry.)
See a book on linear algebra for more information. We will focus on the most familiar inner product, called the dot product. To define this, we will need to get our hands dirty with coordinates. Suppose that the d-dimensional vector u is represented by the coordinate vector (u_0, u_1, ..., u_{d−1}). Then define
u · v = Σ_{i=0}^{d−1} u_i v_i.
Note that the inner (and hence dot) product is defined only for vectors, not for points.
Using the dot product we may define a number of concepts which are not defined in regular affine geometry (see Fig. 15). Note that these concepts generalize to all dimensions.
Length: The length of a vector v is defined to be |v| = √(v · v).
Normalization: Given any nonzero vector v, define its normalization to be a vector of unit length that points in the same direction as v, that is, v/|v|. We will denote this by v̂.
Distance between points: dist(p, q) = |p − q|.
Angle: The angle between two nonzero vectors u and v (ranging from 0 to π) is
ang(u, v) = cos⁻¹( (u · v) / (|u| |v|) ) = cos⁻¹(û · v̂).
This is easy to derive from the law of cosines. Note that this does not provide us with a signed angle. We cannot tell whether u is clockwise or counterclockwise relative to v. We will discuss signed angles when we consider the cross product.
Orthogonality: u and v are orthogonal (or perpendicular) if u · v = 0.
Orthogonal projection: Given a vector u and a nonzero vector v, it is often convenient to decompose u into the sum of two vectors u = u_1 + u_2, such that u_1 is parallel to v and u_2 is orthogonal to v:
u_1 = ((u · v) / (v · v)) v,    u_2 = u − u_1.
(As an exercise, verify that u_2 is orthogonal to v.) Note that we can ignore the denominator if we know that v is already normalized to unit length. The vector u_1 is called the orthogonal projection of u onto v.
Fig. 15: The dot product and its uses: the angle between vectors, and the orthogonal projection and its complement.
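These dot-product-based concepts translate directly into code. The following is a small sketch (not from the notes) assuming a simple 3-dimensional vector struct named Vector3:

#include <cmath>

struct Vector3 { double x, y, z; };

double  dot(Vector3 u, Vector3 v)      { return u.x*v.x + u.y*v.y + u.z*v.z; }
double  length(Vector3 v)              { return std::sqrt(dot(v, v)); }
Vector3 scale(double a, Vector3 v)     { return {a*v.x, a*v.y, a*v.z}; }
Vector3 normalize(Vector3 v)           { return scale(1.0 / length(v), v); }  // assumes v is nonzero
double  angle(Vector3 u, Vector3 v) {
    // in practice, clamp the ratio to [-1, 1] to guard against round-off error
    return std::acos(dot(u, v) / (length(u) * length(v)));
}
// orthogonal projection u1 of u onto v; the complement is u2 = u - u1
Vector3 project(Vector3 u, Vector3 v)  { return scale(dot(u, v) / dot(v, v), v); }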
Lecture 6: More on Geometry and Geometric Programming
Bases, Vectors, and Coordinates: Last time we presented the basic elements of affine and Euclidean geometry: points, vectors, and operations such as affine combinations. However, as of yet we have no mechanism for defining these objects. Today we consider the lower-level issues of how these objects are represented using coordinate frames and homogeneous coordinates.
The first question is how to represent points and vectors in affine space. We will begin by recalling how to do this in linear algebra, and generalize from there. We know from linear algebra that if we have 2 linearly independent vectors, u_0 and u_1, in 2-space, then we can represent any other vector in 2-space uniquely as a linear combination of these two vectors (see Fig. 16(a)):
v = α_0 u_0 + α_1 u_1,
for some choice of scalars α_0, α_1. Thus, given any such vectors, we can use them to represent any vector in terms of a pair of scalars (α_0, α_1). In general, a set of d linearly independent vectors in dimension d is called a basis.
The most convenient basis to work with consists of two vectors, each of unit length, that are orthogonal to each other. Such a collection of vectors is said to be orthonormal. The standard basis consisting of the x- and y-unit vectors is orthonormal (see Fig. 16(b)).
Fig. 16: Bases and linear combinations in linear algebra (a), where v = 2u_0 + 3u_1, and the standard basis (b).
Note that we are using the term "vector" in two different senses here: one as a geometric entity and the other as a sequence of numbers, given in the form of a row or column. The first is the object of interest (i.e., the abstract data type, in computer science terminology), and the latter is a representation. As is common in object-oriented programming, we should think in terms of the abstract object, even though in our programming we will have to get dirty and work with the representation itself.
Coordinate Frames and Coordinates: Now let us turn from linear algebra to affine geometry. To define a coordinate frame for an affine space we would like to find some way to represent any object (point or vector) as a sequence of scalars. Thus, it seems natural to generalize the notion of a basis in linear algebra to define a basis in affine space. Note that free vectors alone are not enough to define a point (since we cannot define a point by any combination of vector operations). To specify position, we will designate an arbitrary point, denoted o, to serve as the origin of our coordinate frame. Observe that for any point p, p − o is just some vector v. Such a vector can be expressed uniquely as a linear combination of basis vectors. Thus, given the origin point o and any set of basis vectors u_i, any point p can be expressed uniquely as a sum of o and some linear combination of the basis vectors:
p = α_0 u_0 + α_1 u_1 + α_2 u_2 + o,
for some sequence of scalars α_0, α_1, α_2. This is how we will define a coordinate frame for affine spaces. In general we have:
Definition: A coordinate frame for a d-dimensional affine space consists of a point, called the origin of the frame (which we will denote o), and a set of d linearly independent basis vectors.
In Fig. 17 we show a point p and a vector w, together with two coordinate frames, F and G. Relative to frame F, w = 2·F.e_0 + 1·F.e_1. Relative to frame G, p = 1·G.e_0 + 2·G.e_1 + G.o and w = 1·G.e_0 + 0·G.e_1. Notice that the position of w is immaterial, because in affine geometry vectors are free to float where they like.
Fig. 17: Coordinate frames. Relative to the two frames shown, p[F] = (3, 2, 1), w[F] = (2, 1, 0), p[G] = (1, 2, 1), and w[G] = (1, 0, 0).
The Coordinate Axiom and Homogeneous Coordinates: Recall that our goal was to represent both points and vectors as a list of scalar values. To put this on a more formal footing, we introduce the following axiom.
Coordinate Axiom: For every point p in affine space, 0 · p = 0 (the zero vector), and 1 · p = p.
This is a violation of our rules for affine geometry, but it is allowed just to make the notation easier to understand. Using this notation, we can now write the point and vector of the figure in the following way:
p = 3·F.e_0 + 2·F.e_1 + 1·F.o
w = 2·F.e_0 + 1·F.e_1 + 0·F.o
Thus, relative to the coordinate frame F = (F.e_0, F.e_1, F.o), we can express p and w as coordinate vectors relative to frame F as
p[F] = (3, 2, 1)^T    and    w[F] = (2, 1, 0)^T.
We will call these homogeneous coordinates relative to frame F. In some linear algebra conventions, vectors are written as row vectors and in some as column vectors. We will stick with OpenGL's convention of using column vectors, but we may be sloppy from time to time.
As we said before, the term "vector" has two meanings: one as a free vector in an affine space, and now as a coordinate vector. Usually, it will be clear from context which meaning is intended.
In general, to represent points and vectors in d-space, we will use coordinate vectors of length d + 1. Points have a last coordinate of 1, and vectors have a last coordinate of 0. Some authors put the homogenizing coordinate first rather than last. There are actually good reasons for doing this. But we will stick with standard engineering conventions and place it last.
Properties of homogeneous coordinates: The choice of appending a 1 for points and a 0 for vectors may seem to be a rather arbitrary choice. Why not just reverse them or use some other scalar values? The reason is that this particular choice has a number of nice properties with respect to geometric operations.
For example, consider two points p and q whose coordinate representations relative to some frame F are p[F] = (3, 2, 1)^T and q[F] = (5, 1, 1)^T, respectively. Consider the vector
v = p − q.
If we apply the difference rule that we defined last time for points, and then convert this vector into its coordinates relative to frame F, we find that v[F] = (−2, 1, 0)^T. Thus, to compute the coordinates of p − q we simply take the component-wise difference of the coordinate vectors for p and q. The 1-components nicely cancel out, to give a vector result (see Fig. 18).
Fig. 18: Point subtraction in homogeneous coordinates: p[F] = (3, 2, 1)^T, q[F] = (5, 1, 1)^T, and (p − q)[F] = (−2, 1, 0)^T.
Cross Product: Another important operation on vectors in 3-space is the cross product u × v, which can be written as the symbolic determinant

u × v = det [ e_x  e_y  e_z ]
            [ u_x  u_y  u_z ]
            [ v_x  v_y  v_z ].
Here e_x, e_y, and e_z are the three coordinate unit vectors for the standard basis. Note that the cross product is only defined for a pair of free vectors and only in 3-space. Furthermore, we ignore the homogeneous coordinate here. The cross product has the following important properties:
Skew symmetric: u × v = −(v × u) (see Fig. 19(b)). It follows immediately that u × u = 0 (since it is equal to its own negation).
Nonassociative: Unlike most other products that arise in algebra, the cross product is not associative. That is,
(u × v) × w ≠ u × (v × w).
Bilinear: The cross product is linear in both arguments. For example:
u × (αv) = α(u × v),
u × (v + w) = (u × v) + (u × w).
Perpendicular: If u and v are not linearly dependent, then u × v is perpendicular to u and v, and is directed according to the right-hand rule.
Angle and Area: The length of the cross product vector is related to the lengths of and angle between the vectors. In particular:
|u × v| = |u| |v| sin θ,
where θ is the angle between u and v. The cross product is usually not used for computing angles, because the dot product can be used to compute the cosine of the angle (in any dimension) and it can be computed more efficiently. This length is also equal to the area of the parallelogram whose sides are given by u and v. This is often useful.
The cross product is commonly used in computer graphics for generating coordinate frames. Given two basis vectors for a frame, it is useful to generate a third vector that is orthogonal to the first two. The cross product does exactly this. It is also useful for generating surface normals. Given two tangent vectors for a surface, the cross product generates a vector that is normal to the surface.
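As a small illustration (a sketch reusing the assumed Vector3 struct and normalize() function from the earlier dot-product sketch), the cross product can be coded directly from the determinant expansion above and used to produce a unit surface normal from two tangent vectors:

// Cross product, from expanding the symbolic determinant along its first row.
Vector3 cross(Vector3 u, Vector3 v) {
    return { u.y*v.z - u.z*v.y,     // e_x component
             u.z*v.x - u.x*v.z,     // e_y component
             u.x*v.y - u.y*v.x };   // e_z component
}

// Unit surface normal from two (non-parallel) tangent vectors.
Vector3 surfaceNormal(Vector3 t1, Vector3 t2) {
    return normalize(cross(t1, t2));
}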
Orientation: Given two real numbers p and q, there are three possible ways they may be ordered: p < q, p = q, or p > q. We may define an orientation function, which takes on the values +1, 0, or −1 in each of these cases. That is, Or_1(p, q) = sign(q − p), where sign(x) is either −1, 0, or +1 depending on whether x is negative, zero, or positive, respectively. An interesting question is whether it is possible to extend the notion of order to higher dimensions.
The answer is yes, but rather than comparing two points, in general we can define the orientation of d + 1 points in d-space. We define the orientation to be the sign of the determinant consisting of their homogeneous coordinates (with the homogenizing coordinate given first). For example, in the plane and in 3-space the orientations of three points p, q, r and of four points p, q, r, s are defined to be
Or_2(p, q, r) = sign det [ 1    1    1   ]        Or_3(p, q, r, s) = sign det [ 1    1    1    1   ]
                         [ p_x  q_x  r_x ]                                    [ p_x  q_x  r_x  s_x ]
                         [ p_y  q_y  r_y ],                                   [ p_y  q_y  r_y  s_y ]
                                                                              [ p_z  q_z  r_z  s_z ].
What does orientation mean intuitively? The orientation of three points in the plane is +1 if the triangle PQR is oriented counterclockwise, −1 if clockwise, and 0 if all three points are collinear (see Fig. 20). In 3-space, a positive orientation means that the points follow a right-handed screw if you visit the points in the order PQRS. A negative orientation means a left-handed screw, and zero orientation means that the points are coplanar. Note that the order of the arguments is significant. The orientation of (p, q, r) is the negation of the orientation of (p, r, q). As with determinants, the swap of any two elements reverses the sign of the orientation.
Fig. 20: Orientations in 2 and 3 dimensions: Or(p, q, r) = +1, 0, and −1; Or(p, q, r, s) = +1 and −1.
You might ask, why put the homogeneous coordinate first? The answer a mathematician would give you is that this is really where it should be in the first place. If you put it last, then positively oriented things are right-handed in even dimensions and left-handed in odd dimensions. By putting it first, positively oriented things are always right-handed in orientation, which is more elegant. Putting the homogeneous coordinate last seems to be a convention that arose in engineering, and was adopted later by graphics people.
The value of the determinant itself is the area of the parallelogram defined by the vectors q − p and r − p, and thus this determinant is also handy for computing areas and volumes. Later we will discuss other methods.
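As a sketch (not from the notes), the 2-dimensional orientation test reduces to a 2 × 2 determinant after subtracting p from the other two points; the function name orient2d is an arbitrary choice. Note that with floating-point coordinates the returned sign can be unreliable for nearly collinear points.

// Or_2(p, q, r): +1 for counterclockwise, -1 for clockwise, 0 for collinear.
// Expanding the 3x3 determinant above along its first row gives the determinant
// of the vectors (q - p) and (r - p).
int orient2d(double px, double py, double qx, double qy, double rx, double ry) {
    double det = (qx - px) * (ry - py) - (qy - py) * (rx - px);
    if (det > 0) return +1;
    if (det < 0) return -1;
    return 0;
}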
Lecture 7: Drawing in OpenGL: Transformations
More about Drawing: So far we have discussed how to draw simple 2-dimensional objects using OpenGL. Suppose that we want to draw more complex scenes. For example, we want to draw objects that move and rotate, or to change the projection. We could do this by computing (ourselves) the coordinates of the transformed vertices. However, this would be inconvenient for us. It would also be inefficient. OpenGL provides methods for downloading large geometric specifications directly to the GPU. However, if the coordinates of these objects were changed with each display cycle, this would negate the benefit of loading them just once.
For this reason, OpenGL provides tools to handle transformations. Today we consider how this is done in 2-space. This will form a foundation for the more complex transformations, which will be needed for 3-dimensional viewing.
Transformations: Linear and affine transformations are central to computer graphics. Recall from your linear algebra class that a linear transformation is a mapping in a vector space that preserves linear combinations. Such transformations include rotations, scalings, shearings (which stretch rectangles into parallelograms), and combinations thereof.
As you might expect, affine transformations are transformations that preserve affine combinations. For example, if p and q are two points and m is their midpoint, and T is an affine transformation, then the midpoint of T(p) and T(q) is T(m). Important features of affine transformations include the facts that they map straight lines to straight lines, they preserve parallelism, and they can be implemented through matrix multiplication. They arise in various ways in graphics.
Moving Objects: As needed in animations.
Change of Coordinates: This is used when objects that are stored relative to one reference frame are to be accessed in a different reference frame. One important case of this is that of mapping objects stored in a standard coordinate system to a coordinate system that is associated with the camera (or viewer).
Projection: Such transformations are used to project objects from the idealized drawing window to the viewport, and to map the viewport to the graphics display window. (We shall see that perspective projection transformations are more general than affine transformations, since they may not preserve parallelism.)
Mapping between Surfaces: This is useful when textures are mapped onto object surfaces as part of texture mapping.
OpenGL has a very particular model for how transformations are performed. Recall that when drawing, it was convenient for us to first define the drawing attributes (such as color) and then draw a number of objects using that attribute. OpenGL uses much the same model with transformations. You specify a transformation first, and then this transformation is automatically applied to every object that is drawn afterwards, until the transformation is set again. It is important to keep this in mind, because it implies that you must always set the transformation prior to issuing drawing commands.
Because transformations are used for different purposes, OpenGL maintains three sets of matrices for performing various transformation operations. These are:
Modelview matrix: Used for transforming objects in the scene and for changing the coordinates into a form that is easier for OpenGL to deal with. (It is used for the first two tasks above.)
Projection matrix: Handles parallel and perspective projections. (Used for the third task above.)
Texture matrix: This is used in specifying how textures are mapped onto objects. (Used for the last task above.)
We will discuss the texture matrix later in the semester, when we talk about texture mapping. There is one more transformation that is not handled by these matrices. This is the transformation that maps the viewport to the display. It is set by glViewport().
Understanding how OpenGL maintains and manipulates transformations through these matrices is central to
understanding how OpenGL and other modern immediate-mode rendering systems (such as DirectX) work.
Matrix Stacks: For each matrix type, OpenGL maintains a stack of matrices. The current matrix is the one on the top of the stack. It is the matrix that is being applied at any given time. The stack mechanism allows you to save the current matrix (by pushing the stack down) and restore it later (by popping the stack). We will discuss the entire process of implementing affine and projection transformations later in the semester. For now, we'll give just basic information on OpenGL's approach to handling matrices and transformations.
OpenGL has a number of commands for handling matrices. In order to indicate which matrix (Modelview, Projection, or Texture) an operation applies to, you can set the current matrix mode. This is done with the following command:
glMatrixMode(mode);
where mode is either GL_MODELVIEW, GL_PROJECTION, or GL_TEXTURE. The default mode is GL_MODELVIEW.
Since GL_MODELVIEW is by far the most common mode, the convention in OpenGL programs is to assume that you are always in this mode. If you want to modify the mode for some reason, you first change the mode to the desired mode (GL_PROJECTION or GL_TEXTURE), perform whatever operations you want, and then immediately change the mode back to GL_MODELVIEW.
Once the matrix mode is set, you can perform various operations on the stack. OpenGL has a somewhat unintuitive way of handling the stack. Note that most operations below (except glPushMatrix()) alter the contents of the matrix at the top of the stack.
glLoadIdentity(): Sets the current matrix to the identity matrix.
glLoadMatrix*(M): Loads (copies) a given matrix over the current matrix. (The '*' can be either 'f' or 'd' depending on whether the elements of M are GLfloat or GLdouble, respectively.)
glMultMatrix*(M): Post-multiplies the current matrix by a given matrix and replaces the current matrix with this result. Thus, if C is the current matrix on top of the stack, it will be replaced with the matrix product C·M. (As above, the '*' can be either 'f' or 'd' depending on M.)
glPushMatrix(): Pushes a copy of the current matrix on top of the stack. (Thus the stack now has two copies of the top matrix.)
glPopMatrix(): Pops the current matrix off the stack.
Warning: OpenGL assumes that all matrices are 4 × 4 homogeneous matrices, stored in column-major order. That is, a matrix is presented as an array of 16 values, where the first four values give column 0 (for x), then column 1 (for y), then column 2 (for z), and finally column 3 (for the homogeneous coordinate, usually called w). For example, given a matrix M and vector v, OpenGL assumes the following representation:

M · v = [ m[0]  m[4]  m[8]   m[12] ] [ v[0] ]
        [ m[1]  m[5]  m[9]   m[13] ] [ v[1] ]
        [ m[2]  m[6]  m[10]  m[14] ] [ v[2] ]
        [ m[3]  m[7]  m[11]  m[15] ] [ v[3] ]
An example is shown in Fig. 21. We will discuss how matrices like M are presented to OpenGL later in the
semester. There are a number of other matrix operations, which we will also discuss later.
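We will see later how such matrices are presented to OpenGL, but as a small preview sketch (not from the notes), here is how a translation matrix, like the one produced by glTranslatef, could be written out as a 16-element column-major array and composed with glMultMatrixf. The helper name makeTranslation is an arbitrary choice.

#include <GL/glut.h>

// Column-major 4x4 translation matrix, equivalent to glTranslatef(tx, ty, tz).
// m[0..3] is column 0, m[4..7] is column 1, and so on.
void makeTranslation(GLfloat m[16], GLfloat tx, GLfloat ty, GLfloat tz) {
    for (int i = 0; i < 16; i++) m[i] = 0.0f;
    m[0] = m[5] = m[10] = m[15] = 1.0f;     // identity diagonal
    m[12] = tx;  m[13] = ty;  m[14] = tz;   // last column holds the translation
}

// Usage sketch: compose it onto the current Modelview matrix.
// GLfloat T[16];
// makeTranslation(T, 1.0f, 2.0f, 0.0f);
// glMultMatrixf(T);   // same effect as glTranslatef(1.0f, 2.0f, 0.0f)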
Automatic Evaluation and the Transformation Pipeline: Now that we have described the matrix stack, the next
question is how do we apply the matrix to some point that we want to transform? Understanding the answer
is critical to understanding how OpenGL (and actually display processors) work. The answer is that it happens
automatically. In particular, every vertex (and hence virtually every geometric object that is drawn) is passed
through a series of matrices, as shown in Fig. 22. This may seem rather inflexible, but it is the simple uniformity of sending every vertex through this transformation sequence that makes graphics cards run so fast. As mentioned above, these transformations behave much like drawing attributes: you set them, do some drawing, alter them, do more drawing, and so on.

Fig. 21: Matrix stack operations: initial stack; load identity; load matrix(M); mult matrix(T), leaving M·T on top; push matrix; pop matrix.
Fig. 22: Transformation pipeline. Points (glVertex) in standard coordinates are transformed by the Modelview matrix into camera (or eye) coordinates, then by the Projection matrix and perspective normalization and clipping into normalized device coordinates, and finally by the viewport transform into window coordinates.
A second important thing to understand is that OpenGL's transformations do not alter the state of the objects you are drawing. They simply modify things before they get drawn. For example, suppose that you draw a unit square (U = [0, 1] × [0, 1]) and pass it through a matrix that scales it by a factor of 5. The square U itself has not changed; it is still a unit square. If you wanted to change the actual representation of U to be a 5 × 5 square, then you would need to perform your own modification of U's representation.
You might ask, what if I do not want the current transformation to be applied to some object? The answer is, tough luck. There are no exceptions to this rule (other than commands that act directly on the viewport). If you do not want a transformation to be applied, then to achieve this, you load an identity matrix on the top of the transformation stack, then do your (untransformed) drawing, and finally pop the stack.
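In code, that idiom looks like the following short sketch (drawSomething() stands in for your own drawing routine):

glPushMatrix();      // save the current Modelview matrix
glLoadIdentity();    // temporarily replace it with the identity
drawSomething();     // this drawing is not affected by earlier transformations
glPopMatrix();       // restore the saved matrix for subsequent drawing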
Example: Rotating a Rectangle (first attempt): The Modelview matrix is useful for applying transformations to objects, which would otherwise require you to perform your own linear algebra. Suppose that rather than drawing a rectangle that is aligned with the coordinate axes, you want to draw a rectangle that is rotated by 20 degrees (counterclockwise) and centered at some point (x, y). The desired result is shown in Fig. 23. Of course, as mentioned above, you could compute the rotated coordinates of the vertices yourself (using the appropriate trigonometric functions), but OpenGL provides a way of doing this transformation more easily.
Suppose that we are drawing within the square 0 ≤ x, y ≤ 10, and we have a 4 × 4 sized rectangle to be drawn centered at location (x, y). We could draw an unrotated rectangle with the following command:
glRectf(x - 2, y - 2, x + 2, y + 2);
Note that the arguments should be of type GLfloat (2.0f rather than 2), but we will let the compiler cast the integer constants to floating-point values for us.
Now let us draw a rotated rectangle. Let us assume that the matrix mode is GL_MODELVIEW (this is the default). Generally, there will be some existing transformation (call it M) currently present in the Modelview matrix.
Fig. 23: Desired drawing: the 4 × 4 rectangle, centered at (x, y) and rotated by 20 degrees, is shaded.
This usually represents some more global transformation, which is to be applied on top of our rotation. For this reason, we will compose our rotation transformation with this existing transformation.
Because the OpenGL rotation function alters the contents of the Modelview matrix, we will begin by saving it, using the command glPushMatrix(). Saving the Modelview matrix in this manner is not always required, but it is considered good form. Then we will compose the current matrix M with an appropriate rotation matrix R. Then we draw the rectangle (in upright form). Since all points are transformed by the Modelview matrix prior to projection, this will have the effect of rotating our rectangle. Finally, we will pop off this matrix (so future drawing is not rotated).
To perform the rotation, we will use the command glRotatef(ang, x, y, z). All arguments are GLfloats. (Or, recalling OpenGL's naming convention, we could use glRotated(), which takes GLdouble arguments.) This command constructs a matrix that performs a rotation in 3-dimensional space counterclockwise by angle ang degrees, about the vector (x, y, z). It then composes (or multiplies) this matrix with the current Modelview matrix. In our case the angle is 20 degrees. To achieve a rotation in the (x, y) plane the vector of rotation would be the z-unit vector, (0, 0, 1). Here is how the code might look (but beware, this conceals a subtle error).
Drawing a Rotated Rectangle (First Attempt)
glPushMatrix(); // save the current matrix
glRotatef(20, 0, 0, 1); // rotate by 20 degrees CCW
glRectf(x-2, y-2, x+2, y+2); // draw the rectangle
glPopMatrix(); // restore the old matrix
Fig. 24: The actual drawing produced by the previous example. (Rotated rectangle is shaded.)
The order of the rotation relative to the drawing command may seem confusing at first. You might think, "Shouldn't we draw the rectangle first and then rotate it?" The key is to remember that whenever you draw (using glRectf() or glBegin()...glEnd()), the points are automatically transformed using the current Modelview matrix. So, in order to do the rotation, we must first modify the Modelview matrix and then draw the rectangle. The rectangle will be automatically transformed into its rotated state. Popping the matrix at the end is important; otherwise future drawing requests would also be subject to the same rotation.
Example: Rotating a Rectangle (correct): Something is wrong with the example given above. What is it? The answer is that the rotation is performed about the origin of the coordinate system, not about the center of the rectangle as we want.
Fortunately, there is an easy fix. Conceptually, we will draw the rectangle centered at the origin, then rotate it by 20 degrees, and finally translate (or move) it by the vector (x, y). To do this, we will need to use the command glTranslatef(x, y, z). All three arguments are GLfloats. (And there is a version with GLdouble arguments.) This command creates a matrix which performs a translation by the vector (x, y, z), and then composes (or multiplies) it with the current matrix. Recalling that all 2-dimensional graphics occurs in the z = 0 plane, the desired translation vector is (x, y, 0).
So the conceptual order is (1) draw, (2) rotate, (3) translate. But remember that you need to set up the transformation matrix before you do any drawing. That is, if v represents a vertex of the rectangle, R is the rotation matrix, T is the translation matrix, and M is the current Modelview matrix, then we want to compute the product
M(T(R(v))) = M · T · R · v.
Since M is on the top of the stack, we need to first apply the translation (T) to M, then apply the rotation (R) to the result, and then do the drawing (v). Note that the order of application is the exact reverse of the conceptual order. This may seem confusing (and it is), so remember the following rule.
Drawing/Transformation Order in OpenGL
First, conceptualize your intent by drawing about the origin and then applying the appropriate transformations to map your object to its desired location. Then implement this by applying the transformations in reverse order, and do your drawing. It is always a good idea to enclose everything in a push-matrix and pop-matrix pair.
Although this may seem backwards, it is the way in which almost all object transformations are performed in
OpenGL:
(1) Push the matrix stack,
(2) Apply (i.e., multiply) all the desired transformation matrices with the current matrix, but in the reverse
order from which you would like them to be applied to your object,
(3) Draw your object (the transformations will be applied automatically), and
(4) Pop the matrix stack.
The final and correct fragment of code for the rotation is shown in the code block below.
Drawing a Rotated Rectangle (Correct)
glPushMatrix(); // save the current matrix (M)
glTranslatef(x, y, 0); // apply translation (T)
glRotatef(20, 0, 0, 1); // apply rotation (R)
glRectf(-2, -2, 2, 2); // draw rectangle at the origin
glPopMatrix(); // restore the old matrix (M)
Projection Revisited: Last time we discussed the use of gluOrtho2D() for doing simple 2-dimensional projection. This call does not really do any projection. Rather, it computes the desired projection transformation and multiplies it by whatever is on top of the current matrix stack. So, to use this we need to do a few things. First, set the matrix mode to GL_PROJECTION, load an identity matrix (just for safety), and then call gluOrtho2D(). Because of the convention that the Modelview mode is the default, we set the mode back when we are done.
If you only set the projection once, then initializing the matrix to the identity is typically redundant (since this is the default value), but it is a good idea to make a habit of loading the identity for safety. If the projection does not change throughout the execution of the program, this code can be included in the initializations. It might instead be put in the reshape callback if reshaping the window alters the projection.
Two Dimensional Projection
glMatrixMode(GL_PROJECTION); // set projection matrix
glLoadIdentity(); // initialize to identity
gluOrtho2D(left, right, bottom, top); // set the drawing area
glMatrixMode(GL_MODELVIEW); // restore Modelview mode
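For instance, a GLUT reshape callback might look like the following sketch (the function name and the choice of a fixed 0 to 10 drawing window are assumptions for illustration, not something specified in the notes):

// Sketch of a reshape callback: reset the viewport and the 2-d projection
// whenever the window size changes. Registered via glutReshapeFunc(myReshape).
void myReshape(int w, int h) {
    glViewport(0, 0, w, h);              // use the whole window as the viewport
    glMatrixMode(GL_PROJECTION);         // switch to the projection matrix
    glLoadIdentity();
    gluOrtho2D(0.0, 10.0, 0.0, 10.0);    // idealized drawing window 0 <= x, y <= 10
    glMatrixMode(GL_MODELVIEW);          // restore the default mode
}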
How is it done? (Optional): How do gluOrtho2D() and glViewport() set up the desired transformation from the idealized drawing window to the viewport? Well, actually OpenGL does this in two steps, first mapping from the window to a canonical 2 × 2 window centered about the origin, and then mapping this canonical window to the viewport. The reason for this intermediate mapping is that the clipping algorithms are designed to operate on this fixed-sized window (recall the figure given earlier). The intermediate coordinates are often called normalized device coordinates.
As an exercise in deriving linear transformations, let us consider doing this all in one shot. Let W denote the idealized drawing window and let V denote the viewport. Let w_l, w_r, w_b, and w_t denote the left, right, bottom, and top of the window, and define v_l, v_r, v_b, and v_t similarly for the viewport. We wish to derive a linear transformation that maps a point (x, y) in window coordinates to a point (x', y') in viewport coordinates (see Fig. 25).
Fig. 25: Window to viewport transformation.
Let f(x, y) denote the desired transformation. Since the function is linear, and it operates on x and y independently, we have

(x', y') = f(x, y) = (s_x·x + t_x, s_y·y + t_y),

where s_x, t_x, s_y, and t_y depend on the window and viewport coordinates. Let's derive what s_x and t_x are using simultaneous equations. We know that the x-coordinates for the left and right sides of the window (w_l and w_r) should map to the left and right sides of the viewport (v_l and v_r). Thus we have

s_x·w_l + t_x = v_l    and    s_x·w_r + t_x = v_r.
We can solve these equations simultaneously. By subtracting them to eliminate t_x we have

s_x = (v_r − v_l) / (w_r − w_l).

Plugging this back into either equation and solving for t_x we have

t_x = v_l − s_x·w_l = v_l − ((v_r − v_l)/(w_r − w_l))·w_l = (v_l·w_r − v_r·w_l) / (w_r − w_l).

A similar derivation for s_y and t_y yields

s_y = (v_t − v_b) / (w_t − w_b),    t_y = (v_b·w_t − v_t·w_b) / (w_t − w_b).
These four formulas give the desired final transformation:

f(x, y) = ( ((v_r − v_l)·x + (v_l·w_r − v_r·w_l)) / (w_r − w_l),  ((v_t − v_b)·y + (v_b·w_t − v_t·w_b)) / (w_t − w_b) ).
This can be expressed in matrix form as

[ (v_r − v_l)/(w_r − w_l)             0                 (v_l·w_r − v_r·w_l)/(w_r − w_l) ]   [ x ]
[            0             (v_t − v_b)/(w_t − w_b)      (v_b·w_t − v_t·w_b)/(w_t − w_b) ] · [ y ]
[            0                        0                                1                ]   [ 1 ],

which is essentially what OpenGL stores internally.
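As a sketch (the function and struct names are invented for illustration), the derivation above translates directly into a small routine that maps window coordinates to viewport coordinates:

// Window-to-viewport map from the derivation above.
// (wl, wr, wb, wt) describe the idealized drawing window, (vl, vr, vb, vt) the viewport.
struct Point2f { float x, y; };

Point2f windowToViewport(float x, float y,
                         float wl, float wr, float wb, float wt,
                         float vl, float vr, float vb, float vt) {
    float sx = (vr - vl) / (wr - wl);   // scale in x
    float tx = vl - sx * wl;            // offset in x: (vl*wr - vr*wl) / (wr - wl)
    float sy = (vt - vb) / (wt - wb);   // scale in y
    float ty = vb - sy * wb;            // offset in y: (vb*wt - vt*wb) / (wt - wb)
    return { sx * x + tx, sy * y + ty };
}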
Lecture 8: Affine Transformations
Affine Transformations: So far we have been stepping through the basic elements of geometric programming. We have discussed points, vectors, and their operations, and coordinate frames and how to change the representation of points and vectors from one frame to another. Our next topic involves how to map points from one place to another. Suppose you want to draw an animation of a spinning ball. How would you define the function that maps each point on the ball to its position rotated through some given angle?
We will consider a limited, but interesting, class of transformations, called affine transformations. These include (among others) the following transformations of space: translations, rotations, uniform and nonuniform scalings (stretching the axes by some constant scale factor), reflections (flipping objects about a line), and shearings (which deform squares into parallelograms). They are illustrated in Fig. 26.

Fig. 26: Examples of affine transformations: rotation, translation, uniform scaling, nonuniform scaling, reflection, and shearing.
These transformations all have a number of things in common. For example, they all map lines to lines. Note that some (translation, rotation, reflection) preserve the lengths of line segments and the angles between segments. Others (like uniform scaling) preserve angles but not lengths. Others (like nonuniform scaling and shearing) do not preserve angles or lengths.
All of the transformations listed above preserve basic affine relationships. (In fact, this is the definition of an affine transformation.) For example, given any transformation T of one of the above varieties, given two points p and q, and any scalar α,

r = (1 − α)p + αq    implies    T(r) = (1 − α)T(p) + αT(q).

(We will leave the proof that each of the above transformations is affine as an exercise.) Putting this more intuitively, if r is the midpoint of segment pq before applying the transformation, then it is the midpoint of the transformed segment after the transformation.
Matrix Representation of Affine Transformations: Let us concentrate on transformations in 3-space. An important consequence of the preservation of affine relations is the following:

r = α_0 F.e_0 + α_1 F.e_1 + α_2 F.e_2 + α_3 o    implies    T(r) = α_0 T(F.e_0) + α_1 T(F.e_1) + α_2 T(F.e_2) + α_3 T(o).
Here α_3 is either 0 (for vectors) or 1 (for points). The equation on the left is the representation of a point or vector r in terms of the coordinate frame F. This implication shows that if we know the images of the frame elements under the transformation, then we know the image of r under the transformation.
From the previous lecture we know that the homogeneous coordinate representation of r relative to frame F is r[F] = (α_0, α_1, α_2, α_3)^T. (Recall that the superscript T in this context means to transpose this row vector into a column vector, and should not be confused with the transformation T.) Thus, we can express the above relationship in the following matrix form:
T(r)[F] = [ T(F.e_0)[F]   T(F.e_1)[F]   T(F.e_2)[F]   T(F.o)[F] ] [ α_0 ]
                                                                   [ α_1 ]
                                                                   [ α_2 ]
                                                                   [ α_3 ].
Here the columns of the array are the representations (relative to F) of the images of the elements of the frame under T. This implies that applying an affine transformation (in coordinate form) is equivalent to multiplying the coordinates by a matrix. In dimension d this is a (d + 1) × (d + 1) matrix.
If this all seems a bit abstract, in the remainder of the lecture we will give some concrete examples of transformations. Rather than considering this in the context of 2-dimensional transformations, let's consider it in the more general setting of 3-dimensional transformations. The two-dimensional cases can be extracted by just ignoring the rows and columns for the z-coordinates.
Translation: Translation by a fixed vector v maps any point p to p + v. Note that, since vectors have no position in space, free vectors are not altered by translation (see Fig. 27).
Suppose that relative to the standard frame, v[F] = (α_x, α_y, α_z, 0)^T are the homogeneous coordinates of v. The three unit vectors are unaffected by translation, and the origin is mapped to o + v, whose homogeneous coordinates are (α_x, α_y, α_z, 1). Thus, by the rule given earlier, the homogeneous matrix representation for this translation transformation is

T(v) = [ 1  0  0  α_x ]
       [ 0  1  0  α_y ]
       [ 0  0  1  α_z ]
       [ 0  0  0   1  ].

This is the matrix used by OpenGL in the call glTranslatef(α_x, α_y, α_z).
Scaling: Uniform scaling is a transformation which is performed relative to some central fixed point. We will assume that this point is the origin of the standard coordinate frame. (We will leave the general case as an exercise.) Given a scalar β, this transformation maps the object (point or vector) with coordinates (α_x, α_y, α_z, α_w)^T to (βα_x, βα_y, βα_z, α_w)^T.
In general, it is possible to specify separate scaling factors for each of the axes. This is called nonuniform scaling. The unit vectors are each stretched by the corresponding scaling factor, and the origin is unmoved.
Fig. 27: Derivation of transformation matrices: translation by v, and uniform scaling by 2.
Thus, the transformation matrix has the following form:

S(β_x, β_y, β_z) = [ β_x  0    0    0 ]
                   [ 0    β_y  0    0 ]
                   [ 0    0    β_z  0 ]
                   [ 0    0    0    1 ].

Observe that both points and vectors are altered by scaling. This is the matrix used by OpenGL in the call glScalef(β_x, β_y, β_z).
Reflection: A reflection in the plane is given by a line, and maps points by flipping the plane about this line. A reflection in 3-space is given by a plane, and flips points in space about this plane. In this case, reflection is just a special case of scaling, but where the scale factor is negative. For example, to reflect points about the yz-coordinate plane, we want to scale the x-coordinate by −1. Using the scaling matrix above, we have the following transformation matrix:

F_x = [ −1  0  0  0 ]
      [  0  1  0  0 ]
      [  0  0  1  0 ]
      [  0  0  0  1 ].

The cases for the other two coordinate planes are similar. Reflection about an arbitrary line or plane is left as an exercise.
Rotation: In its most general form, rotation is defined to take place about some fixed point and around some fixed vector in space. We will consider the simplest case where the fixed point is the origin of the coordinate frame, and the vector is one of the coordinate axes. There are three basic rotations: about the x, y, and z-axes. In each case the rotation is through an angle θ (given in radians). The rotation is assumed to be in accordance with a right-hand rule: if your right thumb is aligned with the axis of rotation, then positive rotation is indicated by your fingers.
Consider the rotation about the z-axis. The z-unit vector and origin are unchanged. The x-unit vector is mapped to (cos θ, sin θ, 0, 0)^T, and the y-unit vector is mapped to (−sin θ, cos θ, 0, 0)^T (see Fig. 28). Thus the rotation matrix is:

R_z(θ) = [ cos θ  −sin θ  0  0 ]
         [ sin θ   cos θ  0  0 ]
         [   0       0    1  0 ]
         [   0       0    0  1 ].

Observe that both points and vectors are altered by rotation. This is the matrix used by OpenGL in the call glRotatef(θ·180/π, 0, 0, 1).
Lecture Notes 39 CMSC 427
x
y
(cos , sin )
(sin , cos )
z
Rotation (about z) Shear (along x and y)
(h
x
, h
y
)
z
x
y
z
x
y
Fig. 28: Rotation and shearing.
For the other two axes we have
R
x
() =
_
_
_
_
1 0 0 0
0 cos sin 0
0 sin cos 0
0 0 0 1
_
_
_
_
, R
y
() =
_
_
_
_
cos 0 sin 0
0 1 0 0
sin 0 cos 0
0 0 0 1
_
_
_
_
.
Shearing: A shearing transformation is perhaps the hardest of the group to visualize. Think of a shear as a
transformation that maps a square into a parallelogram by sliding one side parallel to itself while keeping
the opposite side xed. In 3-dimensional space, it maps a cube into a parallelepiped by sliding one face
parallel while keeping the opposite face xed (see Fig. 28). We will consider the simplest form, in which
we start with a unit cube whose lower left corner coincides with the origin. Consider one of the axes, say
the z-axis. The face of the cube that lies on the xy-coordinate plane does not move. The face that lies
on the plane z = 1, is translated by a vector (h
x
, h
y
). In general, a point p = (p
x
, p
y
, p
z
, 1) is translated
by the vector p
z
(h
x
, h
y
, 0, 0). This vector is orthogonal to the z-axis, and its length is proportional to the
z-coordinate of p. This is called an xy-shear. (The yz- and xz-shears are dened analogously.)
Under the xy-shear, the origin and x- and y-unit vectors are unchanged. The z-unit vector is mapped to
(h
x
, h
y
, 1, 0)
T
. Thus the matrix for this transformation is:
H
xy
(h
x
, h
y
) =
_
_
_
_
1 0 h
x
0
0 1 h
y
0
0 0 1 0
0 0 0 1
_
_
_
_
.
Shears involving any other pairs of axes are dened analogously.
H
yz
(h
y
, h
z
) =
_
_
_
_
1 0 0 0
h
y
1 0 0
h
z
0 1 0
0 0 0 1
_
_
_
_
H
zx
(h
z
, h
x
) =
_
_
_
_
1 h
x
0 0
0 1 0 0
0 h
z
1 0
0 0 0 1
_
_
_
_
.
Lecture 9: 3-d Viewing and Projections
Viewing in OpenGL: For the next couple of lectures we will discuss how viewing and perspective transformations
are handled for 3-dimensional scenes. In OpenGL, and most similar graphics systems, the process involves
the following basic steps, of which the perspective transformation is just one component. We assume that all
objects are initially represented relative to a standard 3-dimensional coordinate frame, in what are called world
coordinates.
Modelview transformation: Maps objects (actually vertices) fromtheir world-coordinate representation to one
that is centered around the viewer. The resulting coordinates are variously called camera coordinates, view
coordinates, or eye coordinates. (Specied by the OpenGL command gluLookAt.)
Lecture Notes 40 CMSC 427
projection: This projects points in 3-dimensional eye-coordinates to points on a plane called the image plane.
This projection process consists of three separate parts: the projection transformation (afne part), clip-
ping, and perspective normalization. Each will be discussed below. The output coordinates are called
normalized device coordinates. (Specied by the OpenGL commands such as gluOrtho2D, glOrtho, gl-
Frustum, and gluPerspective.)
Mapping to the viewport: Convert the point from these idealized normalized device coordinates to the view-
port. The coordinates are called window coordinates or viewport coordinates. (Specied by the OpenGL
command glViewport.)
We have ignored a number of issues, such as lighting and hidden surface removal. These will be considered
separately later. The process is illustrated in Fig. 29. We have already discussed the viewport transformation, so
it sufces to discuss the rst two transformations.
eye
v
x
v
y
v
z
Image Plane
View Frame
Scene
Viewport
Viewport transformation
Fig. 29: OpenGL Viewing Process.
Converting to Viewer-Centered Coordinate System: As we shall see below, the perspective transformation is sim-
plest when the center of projection, the location of the viewer, is the origin and the image plane (sometimes
called the projection plane or view plane), onto which the image is projected, is orthogonal to one of the axes,
say the z-axis. Let us call these camera coordinates. However the user represents points relative to a coordinate
system that is convenient for his/her purposes. Let us call these world coordinates. This suggests that, prior
to performing the perspective transformation, we perform a change of coordinate transformation to map points
from world coordinates to camera coordinates.
In OpenGL, there is a nice utility for doing this. The procedure gluLookAt generates the desired transformation to
perform this change of coordinates and multiplies it times the transformation at the top of the current transforma-
tion stack. (Recall OpenGLs transformation structure from the previous lecture on OpenGL transformations.)
This should be done in Modelview mode.
Conceptually, this change of coordinates is performed last, after all other Modelview transformations are per-
formed, and immediately before the projection. By the reverse rule of OpenGL transformations, this implies
that this change of coordinates transformation should be the rst transformation on the Modelview transfor-
mation matrix stack. Thus, it is almost always preceded by loading the identity matrix. Here is the typical
calling sequence. This should be called when the camera position is set initially, and whenever the camera is
(conceptually) repositioned in space.
The arguments are all of type GLdouble. The arguments consist of the coordinates of two points and vector,
in the standard coordinate system. The point eye = (e
x
, e
y
, e
z
)
T
is the viewpoint, that is the location of they
viewer (or the camera). To indicate the direction that the camera is pointed, a central point at which the camera
is directed is given by at = (a
x
, a
y
, a
z
)
T
. The at point is signicant only in that it denes the viewing vector,
which indicates the direction that the viewer is facing. It is dened to be at eye (see Fig. 30).
Lecture Notes 41 CMSC 427
Typical Structure of Redisplay Callback
void myDisplay ( ) {
// clear the buffer
glClear( GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT );
glLoadIdentity( ); // start fresh
// set up camera frame
gluLookAt(eyeX, eyeY, eyeZ, atX, atY, atZ, upX, upY, upZ);
myWorld.draw( ); // draw your scene
glutSwapBuffers( ); // make it all appear
}
eye
v
x
v
y
v
z view direction
w
z
w
y
w
x
View Frame World Frame
at
up
Fig. 30: The world frame, parameters to gluLookAt, and the camera frame.
These points dene the position and direction of the camera, but the camera is still free to rotate about the
viewing direction vector. To x last degree of freedom, the vector
up = (u
x
, u
y
, u
z
)
T
provides the direction
that is up relative to the camera. Under typical circumstances, this would just be a vector pointing straight up
(which might be (0, 0, 1)
T
in your world coordinate system). In some cases (e.g. in a ight simulator, when the
plane banks to one side) you might want to have this vector pointing in some other direction (e.g., up relative
to the pilots orientation). This vector need not be perpendicular to the viewing vector. However, it cannot be
parallel to the viewing direction vector.
The Camera Frame: OpenGL uses the arguments to gluLookAt to construct a coordinate frame centered at the viewer.
The x- and y-axes are directed to the right and up, respectively, relative to the viewer. It might seem natural that
the z-axes be directed in the direction that the viewer is facing, but this is not a good idea.
To see why, we need to discuss the distinction between right-handed and left-handed coordinate systems. Con-
sider an orthonormal coordinate system with basis vectors v
x
, v
y
and v
z
. This system is said to be right-handed
if v
x
v
y
= v
z
, and left-handed otherwise (v
x
v
y
= v
z
). Right-handed coordinate systems are used by
default throughout mathematics. (Otherwise computation of orientations is all screwed up.) Given that the x-
and y-axes are directed right and up relative to the viewer, if the z-axis were to point in the direction that the
viewer is facing, this would result in left-handed coordinate system. The designers of OpenGL wisely decided
to stick to a right-handed coordinate system, which requires that the z-axes is directed opposite to the viewing
direction.
Building the Camera Frame: How does OpenGL implement this change of coordinate transformation? This turns
out to be a nice exercise in geometric computation, so lets try it. We want to construct an orthonormal frame
whose origin is the point eye, whose z-basis vector is parallel to the view vector, and such that the
up vector
projects to the up direction in the nal projection. (This is illustrated in the Fig. 31, where the x-axis is pointing
outwards from the page.)
Let V = (V.v
x
, V.v
y
, V.v
z
, V.o)
T
denote this frame, where (v
x
, v
y
, v
z
)
T
are the three unit vectors for the frame
and o is the origin. Clearly V.o = eye. As mentioned earlier, the view vector
view is directed from eye to at.
Lecture Notes 42 CMSC 427
at
V.v
y
V.v
z
V.v
x
up
eye eye
Fig. 31: The camera frame.
The z-basis vector is the normalized negation of this vector.
view
(Recall that normalization operation divides a vector by its length, thus resulting in a vector having the same
direction and unit length.)
Next, we want to select the x-basis vector for our camera frame. It should be orthogonal to the viewing direction,
it should be orthogonal to the up vector, and it should be directed to the cameras right. Recall that the cross
product will produce a vector that is orthogonal to any pair of vectors, and directed according to the right hand
rule. Also, we want this vector to have unit length. Thus we choose
V.v
x
= normalize(
view
up).
The result of the cross product must be a nonzero vector. This is why we require that the view direction and up
vector are not parallel to each other. We have two out of three vectors for our frame. We can extract the last one
by taking a cross product of the rst two.
V.v
y
= (V.v
z
V.v
x
).
There is no need to normalize this vector, because it is the cross product of two orthogonal vectors, each of unit
length.
Camera Transformation Matrix (Optional): Now, all we need to do is to construct the change of coordinates matrix
from the standard world frame W to our camera frame V . We will not dwell on the linear algebra details, but
the change of coordinate matrix is formed by considering the matrix M whose columns are the basis elements
of V relative to W, and then inverting this matrix. The matrix before inversion is:
M =
_
(V.v
x
)
[W]
(V.v
y
)
[W]
(V.v
z
)
[W]
(V.o)
[W]
_
=
_
_
_
_
v
xx
v
yx
v
zx
o
x
v
xy
v
yy
v
zy
o
y
v
xz
v
yz
v
zz
o
z
0 0 0 1
_
_
_
_
.
OpenGL uses some tricks to compute the inverse, M
1
, efciently. Normally, inverting a matrix would involve
invoking a linear algebra procedure (e.g., based on Gauss elimination). However, because M is constructed
from an orthonormal frame, there is a much easier way to construct the inverse. In particular, the upper 3 3
portion of the matrix can be inverted by taking its transpose. Let R be the linear part of matrix M, and let T be
the negation of the translation part:
R =
_
_
_
_
v
xx
v
yx
v
zx
0
v
xy
v
yy
v
zy
0
v
xz
v
yz
v
zz
0
0 0 0 1
_
_
_
_
and T =
_
_
_
_
1 0 0 o
x
0 1 0 o
y
0 0 1 o
z
0 0 0 1
_
_
_
_
.
Lecture Notes 43 CMSC 427
It can be shown that the nal invertex matrix is given by the following formula.
M
1
= R
T
T.
Projections: The next part of the process involves performing the projection. Projections fall into two basic groups,
parallel projections, in which the lines of projection are parallel to one another, and perspective projection, in
which the lines of projection converge a point.
In spite of their supercial similarities, parallel and perspective projections behave quite differently with respect
to geometry. Parallel projections are afne transformations, while perspective projections are not. (In particular,
perspective projections do not preserve parallelism, as is evidenced by a perspective view of a pair of straight
train tracks, which appear to converge at the horizon.) Because parallel projections are rarely used, we will skip
them and consider perspective projections only.
Perspective Projection: Perspective transformations are the domain of an interesting area of mathematics called
projective geometry. Let us assume that we are 3-dimensional space, and (through the use of the view transfor-
mation) we assume that objects are represented in camera coordinates. Projective transformations map lines to
lines. However, projective transformations are not afne, since (except for the special case of parallel projection)
do not preserve afne combinations and do not preserve parallelism. For example, consider the perspective pro-
jection T shown in Fig. 32. Let r be the midpoint of segment pq. As seen in the gure, T(r) is not necessarily
the midpoint of T(p) and T(q).
p
q
r
T(p)
T(r)
T(q)
eye
Fig. 32: Perspective transformations do not necessarily preserve afne combinations, since the midpoint of pq does
not map to the midpoint of the projected segment.
Projective Geometry: In order to gain a deeper understanding of projective transformations, it is best to start with an
introduction to projective geometry. Projective geometry was developed in the 17th century by mathematicians
interested in the phenomenon of perspective. Intuitively, the basic idea that gives rise to projective geometry is
rather simple, but its consequences are somewhat surprising.
In Euclidean geometry we know that two distinct lines intersect in exactly one point, unless the two lines are
parallel to one another. This special case seems like an undesirable thing to carry around. Suppose we make the
following simplifying generalization. In addition to the regular points in the plane (with nite coordinates) we
will also add a set of ideal points (or points at innity) that reside innitely far away. Now, we can eliminate the
special case and say that every two distinct lines intersect in a single point. If the lines are parallel, then they
intersect at an ideal point. But there seem to be two such ideal points (one at each end of the parallel lines).
Since we do not want lines intersecting more than once, we just imagine that the projective plane wraps around
so that two ideal points at the opposite ends of a line are equal to each other. This is very elegant, since all lines
behave much like closed curves (somewhat like a circle of innite radius).
For example, in Fig. 33(a), the point p is a point at innity. Since p is innitely far away it does have a position
(in the sense of afne space), but it can be specied by pointing to it, that is, by a direction. All lines that are
parallel to one another along this direction intersect at p. In the plane, the union of all the points at innity forms
a line, called the line at innity. (In 3-space the corresponding entity is called the plane at innity.) Note that
every other line intersects the line at innity exactly once. The regular afne plane together with the points and
line at innity dene the projective plane. It is easy to generalize this to arbitrary dimensions as well.
Lecture Notes 44 CMSC 427
Although the points at innity seem to be special in some sense, an important tenet of projective geometry is that
they are essentially no different from the regular points. In particular, when applying projective transformations
we will see that regular points may be mapped to points at innity and vice versa.
Orientability and the Projective Space: Projective geometry appears to both generalize and simplify afne geom-
etry, so why we just dispensed with afne geometry and use projective geometry instead? The reason is that,
along with the good, come some rather strange consequences. For example, the projective plane wraps around
itself in a rather strange way. In particular, it does not form a sphere as you might expect. (Try cutting it out of
paper and gluing the edges together if you need proof.)
One nice feature of the Euclidean planes is that each line partitions the plane into two halves, one above and one
below (or left and right, if the line is vertical). This is not true for the projective plane (since each ideal point is
both above and below and given line).
As another example of the strange things that occur in projective geometry, consider Fig. 33. Consider two
ideal points p and q on the projective plane. We start with a standard clock. Imagine that we translate the clock
through innity, so that it passes between p and q, with the little hand pointing from p to q. When it wraps
around, the little hand still points from p to q, but in order to achieve this (like a M obius strip), the clock has
ipped upside-down, thus changing its orientation. (If we were to follow the same course, it would ip back
to its proper orientation on the second trip.) The implication is that there is no consistent way to dene the
concepts of clockwise and counterclockwise in projective geometry.
p
p
p
p
q
q
(a) (b)
Little hand points
from p to q
Little hand points
from p to q
My watch is
running backwards!
Fig. 33: The wacky world of projective geometry. The projective plane behaves much like M obius strip. As you wrap
around through innity, there is a twist, which ips orientations.
In topological terms, we say that the projective plane is a nonorientable manifold. In contrast, the Euclidean
plane and the sphere are both orientable surfaces.
For these reasons, we choose not to use projective space as a domain in which to do most of our geometric
computations. Instead, we will do almost all of our geometrical computations in the afne plane. We will
briey enter the domain of projective geometry to do our projective transformations. We will have to take care
that when object map to points to innity, since we cannot map these points back to Euclidean space.
New Homogeneous Coordinates: How do we represent points in projective space? It turns out that we can do this
by homogeneous coordinates. However, there are some differences with the homogeneous coordinates that we
introduced with afne geometry. First off, we will not deal with free vectors in projective space, just points.
Consider a regular point p in the plane, with standard (nonhomogeneous) coordinates (x, y)
T
. There will not be
a unique representation for this point in projective space. Rather, it will be represented by any coordinate vector
Lecture Notes 45 CMSC 427
of the form:
_
_
w x
w y
w
_
_
, for w ,= 0.
Thus, if p = (4, 3)
T
are ps standard Cartesian coordinates, the homogeneous coordinates (4, 3, 1)
T
, (8, 6, 2)
T
,
and (12, 9, 3)
T
are all legal representations of p in projective plane. Because of its familiarity, we will use
the case w = 1 most often.
Given the homogeneous coordinates of a regular point p = (x, y, w)
T
, the projective normalization of p is the
coordinate vector (x/w, y/w, 1)
T
. (This term is confusing, because it is quite different from the process of
length normalization, which maps a vector to one of unit length. In computer graphics this operation is also
referred as perspective division or perspective normalization.)
How do we represent ideal points? Consider a line passing through the origin with slope of 2. The following is
a list of the homogeneous coordinates of some of the points lying on this line:
_
_
1
2
1
_
_
,
_
_
2
4
1
_
_
,
_
_
3
6
1
_
_
,
_
_
4
8
1
_
_
, . . . ,
_
_
x
2x
1
_
_
.
Clearly these are equivalent to the following
_
_
1
2
1
_
_
,
_
_
1
2
1/2
_
_
,
_
_
1
2
1/3
_
_
,
_
_
1
2
1/4
_
_
, . . . ,
_
_
1
2
1/x
_
_
.
(This is illustrated in Fig. 34.) We can see that as x tends to innity, the limiting point has the homogeneous
coordinates (1, 2, 0)
T
. So, when w = 0, the point (x, y, w)
T
is the point at innity, that is pointed to by the
vector (x, y)
T
(and (x, y)
T
as well by wraparound).
(1, 2, 1)
(2, 4, 1)
(3, 6, 1)
(x, 2x, 1)
(a)
(1, 2, 1)
(1, 2, 1/2)
(1, 2, 1/3)
(1, 2, 1/x)
(b)
limit: (1, 2, 0)
Fig. 34: Homogeneous coordinates for ideal points.
Important Note: In spite of the similarity of the names, homogeneous coordinates in projective geometry and
homogeneous coordinates in afne are entirely different concepts, and should not be mixed. This is because the
two geometric systems are entirely different.
Lecture 10: More on 3-d Viewing and Projections
Perspective Projection Transformations: We shall see today that is possible to dene a general perspective projec-
tion using a 44 matrix, just as we did with afne transformations. However, due to the differences in projective
Lecture Notes 46 CMSC 427
homogeneous coordinates, we will need treat projective transformations somewhat differently. We assume that
we will be transforming points only, not vectors. (Typically we will be transforming the vertices of geometric
objects, e.g., as presented by calls to glVertex.) Let us assume for now that the points to be transformed are all
strictly in front of the eye. We will see that objects behind the eye must eventually be clipped away, but we will
consider this later.
Let us consider the following viewing situation. We assume that the center of projection is located at the origin
of the coordinate frame. This is normally the camera coordinate frame, as generated by gluLookAt. The viewer
is facing the z direction. (Recall that this needed so that the coordinate frame is right-handed.) The x-axis
points to the viewers right and the y-axis points upwards relative to the viewer (see Fig. 35(a)).
Suppose that we are projecting points onto a projection plane that is orthogonal to the z-axis and is located at
distance d from the origin along the z axis. (Note that d is given as a positive number, not a negative. This is
consistent with OpenGLs conventions.) Since it is hard to draw good perspective drawings in 3-space, we will
take a side view and consider just the y and z axes for now (see Fig. 35(b)). Everything we do with y we will
do symmetrically with x later.
(a) (b)
x z
y
y
y
z
d
z
Image plane
p = (y, z)
p
y
z/d
, d
Fig. 35: Perspective transformation. (On the right, imagine that the x-axis is pointing towards you.)
Consider a point p = (y, z)
T
in the plane. (Note that z is negative but d is positive.) Where should this point
be projected to on the image plane? Let p
= (y
, z
)
T
denote the coordinates of this projection. By similar
triangles it is easy to see that the following ratios are equal:
y
z
=
y
d
,
implying that y
= y/(z/d) (see Fig. 35(b)). We also have z = d. Generalizing this to 3-space, the point
with coordinates (x, y, z, 1)
T
is transformed to the point with homogeneous coordinates
_
_
_
_
x/(z/d)
y/(z/d)
d
1
_
_
_
_
.
Unfortunately, there is no 4 4 matrix that can realize this result. (Generally a matrix will map a point
(x, y, z, 1)
T
to a point whose coordinates are of the form ax + by + cz + d. The problem here is that z is
in the denominator.)
However, there is a 44 matrix that will generate an equivalent point, with respect to homogeneous coordinates.
Lecture Notes 47 CMSC 427
In particular, if we multiply the above vector by (z/d) we obtain:
_
_
_
_
x
y
z
z/d
_
_
_
_
.
The coordinates of this vector are all linear function of x, y, and z, and so we can write the perspective transfor-
mation in terms of the following matrix.
M =
_
_
_
_
1 0 0 0
0 1 0 0
0 0 1 0
0 0 1/d 0
_
_
_
_
.
After we have the coordinates of a (afne) transformed point p
=
z
.
Depending on the values we choose for and , this is a (nonlinear) monotonic function of z. In particular,
depth increases as the z-values decrease (since we view down the negative z-axis), so if we set < 0, then the
depth value z
_
_
_
_
_
_
1
1
(f +n)
n f
+
2f
n f
1
_
_
_
_
_
_
=
_
_
_
_
1
1
1
1
_
_
_
_
.
Notice that this is the upper corner of the canonical view volume on the near (z = 1) side, as desired.
Similarly, consider a point that lies on the bottom side of the view frustum. We have z/(y) = cot /2 = c,
implying that y = z/c. If we take the point to lie on the far clipping plane, then we have z = f, and so
y = f/c. Further, if we assume that it lies on the lower left corner of the frustum (relative to the viewers
position) then x = af/c. Thus the homogeneous coordinates of the lower corner on the far clipping plane
(shown as a black dot in Fig. 37) are (af/c, f/c, f, 1)
T
. If we apply the above transformation, this is
mapped to
M
_
_
_
_
af/c
f/c
f
1
_
_
_
_
=
_
_
_
_
_
_
f
f
f(f +n)
n f
+
2fn
n f
f
_
_
_
_
_
_
_
_
_
_
_
_
1
1
(f +n)
n f
+
2n
n f
1
_
_
_
_
_
_
=
_
_
_
_
1
1
1
1
_
_
_
_
.
This is the lower corner of the canonical view volume on the far (z = 1) side, as desired.
Lecture 11: Lighting and Shading
Lighting and Shading: We will now take a look at the next major element of graphics rendering: light and shading.
This is one of the primary elements of generating realistic images. This topic is the beginning of an important
shift in approach. Up until now, we have discussed graphics from are purely mathematical (geometric) perspec-
tive. Light and reection brings us to issues involved with the physics of light and color and the physiological
aspects of how humans perceive light and color.
What we see is a function of the light that enters our eye. Light sources generate energy, which we may
think of as being composed of extremely tiny packets of energy, called photons. The photons are reected and
transmitted in various ways throughout the environment. They bounce off various surfaces and may be scattered
by smoke or dust in the air. Eventually, some of them enter our eye and strike our retina. We perceive the
resulting amalgamation of photons of various energy levels in terms of color. The more accurately we can
simulate this physical process, the more realistic lighting will be. Unfortunately, computers are not fast enough
to produce a truly realistic simulation of indirect reections in real time, and so we will have to settle for much
simpler approximations.
OpenGL, like most interactive graphics systems, supports a very simple lighting and shading model, and hence
can achieve only limited realism. This was done primarily because speed is of the essence in interactive graphics.
OpenGL assumes a local illumination model, which means that the shading of a point depends only on its
relationship to the light sources, without considering the other objects in the scene.
This is in contrast to a global illumination model, in which light reected or passing through one object might
affects the illumination of other objects. Global illumination models deal with many affects, such as shadows,
indirect illumination, color bleeding (colors from one object reecting and altering the color of a nearby object),
caustics (which result when light passes through a lens and is focused on another surface). An example of some
of the differences between a local and global illumination model are shown in Fig. 38.
For example, OpenGLs lighting model does not model shadows, it does not handle indirect reection from other
objects (where light bounces off of one object and illuminates another), it does not handle objects that reect
Lecture Notes 52 CMSC 427
Global Illumination Model Local Illumination Model
Fig. 38: Local versus global illumination models.
or refract light (like metal spheres and glass balls). OpenGLs light and shading model was designed to be very
efcient. Although it is not physically realistic, the OpenGL designers provided many ways to fake realistic
illumination models. Modern GPUs support programmable shaders, which offer even greater realism, but we
will not discuss these now.
Light: A detailed discussion of light and its properties would take us more deeply into physics than we care to go.
For our purposes, we can imagine a simple model of light consisting of a large number of photons being emitted
continuously from each light source. Each photon has an associated energy, which (when aggregated over
millions of different reected photons) we perceive as color. Although color is complex phenomenon, for our
purposes it is sufcient to consider color to be a modeled as a triple of red, green, and blue components. (We
will consider color later this semester.)
The strength or intensity of the light at any location can be modeled in terms of the ux, that is, the amount of
illumination energy passing through a xed area over a xed amount of time. Assuming that the atmosphere is a
vacuum (in particular there is no smoke or fog), a photon of light travels unimpeded until hitting a surface, after
which one of three things can happen (see Fig. 39):
Reection: The photon can be reected or scattered back into the atmosphere. If the surface were perfectly
smooth (like a mirror or highly polished metal) the refection would satisfy the rule angle of incidence
equals angle of reection and the result would be a mirror-like and very shiny in appearance. On the
other hand, if the surface is rough at a microscopic level (like foam rubber, say) then the photons are
scattered nearly uniformly in all directions. We can further distinguish different varieties of reection:
Pure reection: Perfect mirror-like reectors
Specular reection: Imperfect reectors like brushed metal and shiny plastics.
Diffuse reection: Uniformly scattering, and hence not shiny.
Absorption: The photon can be absorbed into the surface (and hence dissipates in the form of heat energy).
We do not see this light. Thus, an object appears to be green, because it reects photons in the green part
of the spectrum and absorbs photons in the other regions of the visible spectrum.
Transmission: The photon can pass through the surface. This happens perfectly with transparent objects (like
glass and polished gem stones) and with a signicant amount of scattering with translucent objects (like
human skin or a thin piece of tissue paper).
All of the above involve how incident light reacts with a surface. Another way that light may result from a
surface is through emission, which will be discussed below.
Of course, real surfaces possess various combinations of these elements, and these element can interact in
complex ways. For example, human skin and many plastics are characterized by a complex phenomenon called
Lecture Notes 53 CMSC 427
Light source
perfect
reection
specular
reection
diuse
reection
absorption perfect
transmission
translucent
transmission
Fig. 39: The ways in which a photon of light can interact with a surface.
subsurface scattering, in which light is transmitted under the surface and then bounces around and is reected
at some other point.
Light Sources: Before talking about light reection, we need to discuss where the light originates. In reality, light
sources come in many sizes and shapes. They may emit light in varying intensities and wavelengths according
to direction. The intensity of light energy is distributed across a continuous spectrum of wavelengths.
To simplify things, OpenGL assumes that each light source is a point, and that the energy emitted can be
modeled as an RGB triple, called a luminance function. This is described by a vector with three components
L = (L
r
, L
g
, L
b
), which indicate the intensities of red, green, and blue light respectively. We will not concern
ourselves with the exact units of measurement, since this is very simple model. Note that, although your display
device will have an absolute upper limit on how much energy each color component of each pixel can generate
(which is typically modeled as an 8-bit value in the range from 0 to 255), in theory there is no upper limit on the
intensity of light. (If you need evidence of this, go outside and stare at the sun for a while!)
Lighting in real environments usually involves a considerable amount of indirect reection between objects of
the scene. If we were to ignore this effect and simply consider a point to be illuminated only if it can see the
light source, then the resulting image in which objects in the shadows are totally black. In indoor scenes we are
accustomed to seeing much softer shading, so that even objects that are hidden from the light source are partially
illuminated. In OpenGL (and most local illumination models) this scattering of light modeled by breaking the
light sources intensity into two components: ambient emission and point emission.
Ambient emission: Refers to light that does not come from any particular location. Like heat, it is assumed to
be scattered uniformly in all locations and directions. A point is illuminated by ambient emission even if
it is not visible from the light source.
Point emission: Refers to light that originates from a single point. In theory, point emission only affects points
that are directly visible to the light source. That is, a point p is illuminate by light source q if and only if
the open line segment pq does not intersect any of the objects of the scene.
Unfortunately, determining whether a point is visible to a light source in a complex scene with thousands of
objects can be computationally quite expensive. So OpenGL simply tests whether the surface is facing towards
the light or away from the light. Surfaces in OpenGL are polygons, but let us consider this in a more general
setting. Suppose that have a point p lying on some surface. Let n denote the normal vector at p, directed
outwards from the objects interior, and let denote the directional vector from p to the light source ( = q p),
then p will be illuminated if and only if the angle between these vectors is acute. We can determine this by
testing whether their dot produce is positive, that is, n > 0.
For example, in the Fig. 40, the point p is illuminated. In spite of the obscuring triangle, point p
is also
illuminated, because other objects in the scene are ignored by the local illumination model. The point p
is
clearly not illuminated, because its normal is directed away from the light.
Lecture Notes 54 CMSC 427
Light source
n
n
n
illuminated illuminated
not illuminated
p
p
q
Fig. 40: Point light source visibility using a local illumination model. Note that p
n
r
viewer
v
n
r
(a) (b)
Fig. 42: Vectors used in Phong Shading.
Normal vector: A vector n that is perpendicular to the surface and directed outwards from the surface. There
are a number of ways to compute normal vectors, depending on the representation of the underlying object.
For our purposes, the following simple method is sufcient. Given any three noncollinear points, p
0
, p
1
,
and p
2
, on a polygon, we can compute a normal to the surface of the polygon as a cross product of two of
the associated vectors.
n = normalize((p
1
p
0
) (p
2
p
0
)).
The vector will be directed outwards if the triple p
0
, p
1
, p
2
has a counterclockwise orientation when seen
from outside.
View vector: A vector v that points in the direction of the viewer (or camera).
Light vector: A vector
that points towards the light source.
Reection vector: A vector r that indicates the direction of pure reection of the light vector. (Based on the law
that the angle of incidence with respect to the surface normal equals the angle of reection.) The reection
vector computation reduces to an easy exercise in vector arithmetic. First, let us decompose
into two
parts, one parallel to n and one orthogonal to n. Since n is of unit length (and recalling the properties of
the dot product) we have
where
=
(n
)
(n n)
n = (n
)n and
.
To get r observe that we need add two copies of
to
. Thus we have
r =
2
=
2(
) = 2(n
)n
.
Lecture Notes 57 CMSC 427
Halfway vector: A vector
h that is midway between
and v. Since this is half way between
and v, and
both have been normalized to unit length, we can compute this by simply averaging these two vectors and
normalizing (assuming that they are not pointing in exactly opposite directions). Since we are normalizing,
the division by 2 for averaging is not needed.
h = normalize
_
+v
2
_
= normalize(
+v).
Phong Lighting Equations: There almost no objects that are pure diffuse reectors or pure specular reectors. The
Phong reection model is based on the simple modeling assumption that we can model any (nontextured) ob-
jects surface to a reasonable extent as some mixture of purely diffuse and purely specular components of
reection along with emission and ambient reection. Let us ignore emission for now, since it is the rarest of
the group, and will be easy to add in at the end of the process.
The surface material properties of each object will be specied by a number of parameters, indicating the
intrinsic color of the object and its ambient, diffuse, and specular reectance. Let C denote the RGB factors of
the objects base color. For consistency with OpenGLs convention, we assume that the lights energy is given
by three RGB vectors: its ambient intensity L
a
, its diffuse intensity L
d
, and its specular intensity, L
s
. (In any
realistic physical model, these three are all equal to each other).
Ambient light: Ambient light is the simplest to deal with. Let I
a
denote the intensity of reected ambient light. For
each surface, let
0
a
1
denote the surfaces coefcient of ambient reection, that is, the fraction of the ambient light that is reected
from the surface. The ambient component of illumination is
I
a
=
a
L
a
C
Note that this is a vector equation (whose components are RGB).
Diffuse reection: Diffuse reection arises from the assumption that light from any direction is reected uniformly
in all directions. Such an reector is called a pure Lambertian reector. The physical explanation for this type
of reection is that at a microscopic level the object is made up of microfacets that are highly irregular, and these
irregularities scatter light uniformly in all directions.
The reason that Lambertian reectors appear brighter in some parts that others is that if the surface is facing (i.e.
perpendicular to) the light source, then the energy is spread over the smallest possible area, and thus this part of
the surface appears brightest. As the angle of the surface normal increases with respect to the angle of the light
source, then an equal among of the lights energy is spread out over a greater fraction of the surface, and hence
each point of the surface receives (and hence reects) a smaller amount of light.
It is easy to see from the Fig. 43 that as the angle between the surface normal n and the vector to the light
source
increases (up to a maximum of 90 degrees) then amount of light intensity hitting a small differential
area of the surface dA is proportional to the area of the perpendicular cross-section of the light beam, dAcos .
The is called Lamberts Cosine Law.
The key parameter of surface nish that controls diffuse reection is
d
, the surfaces coefcient of diffuse re-
ection. Let I
d
denote the diffuse component of the light source. If we assume that
). If (n
) < 0, then the point is on the dark side of the object. The diffuse component
of reection is:
I
d
=
d
max(0, n
)L
d
C.
This is subject to attenuation depending on the distance of the object from the light source.
Lecture Notes 58 CMSC 427
n
dA
light
energy
n
dA
light
energy
(a) (b)
dAcos
h) be the geometric parameter which will dene the strength of the specular component. (The
original Phong model uses the factor (r v) instead.)
The parameters of surface nish that control specular reection are
s
, the surfaces coefcient of specular
reection, and shininess, denoted (see Fig. 44). As increases, the specular reection drops off more quickly,
and hence the size of the resulting shiny spot on the surface appears smaller as well. Shininess values range
from 1 for low specular reection up to, say, 1000, for highly specular reection. The formula for the specular
component is
I
s
=
s
max(0, n
h)
L
s
.
As with diffuse, this is subject to attenuation.
(a) (b)
diuse specular
Fig. 44: Diffuse and specular reection.
Putting it all together: Combining this with I
e
(the light emitted from an object), the total reected light from a point
on an object of color C, being illuminated by a light source L, where the point is distance d from the light source
using this model is:
I = I
e
+I
a
+
1
a +bd +cd
2
(I
d
+I
s
)
= I
e
+
a
L
a
C +
1
a +bd +cd
2
(
d
max(0, n
)L
d
C +
s
max(0, n
h)
L
s
),
Lecture Notes 59 CMSC 427
As before, note that this a vector equation, computed separately for the R, G, and B components of the lights
color and the objects color. For multiple light sources, we add up the ambient, diffuse, and specular components
for each light source.
Lighting and Shading in OpenGL: To describe lighting in OpenGL there are three major steps that need to be per-
formed: setting the lighting and shade model (smooth or at), dening the lights, their positions and properties,
and nally dening object material properties.
Lighting/Shading model: There are a number of global lighting parameters and options that can be set through
the command glLightModel*(). It has two forms, one for scalar-valued parameters and one for vector-valued
parameters.
glLightModelf(GLenum pname, GLfloat param);
glLightModelfv(GLenum pname, const GLfloat
*
params);
Perhaps the most important parameter is the global intensity of ambient light (independent of any light sources).
Its pname is GL LIGHT MODEL AMBIENT and params is a pointer to an RGBA vector.
One important issue is whether polygons are to be drawn using at shading, in which every point in the polygon
has the same shading, or smooth shading, in which shading varies across the surface by interpolating the vertex
shading. This is set by the following command, whose argument is either GL SMOOTH(the default) or GL FLAT.
glShadeModel(GL_SMOOTH); --OR-- glShadeModel(GL_FLAT);
In theory, shading interplation can be handled in one of two ways. In the classical Gouraud interpolation the
illumination is computed exactly at the vertices (using the above formula) and the values are interpolated across
the polygon. In Phong interpolation, the normal vectors are given at each vertex, and the system interpolates
these vectors in the interior of the polygon. Then this interpolated normal vector is used in the above lighting
equation. This produces more realistic images, but takes considerably more time. OpenGL uses Gouraud
shading. Just before a vertex is given (with glVertex*()), you should specify its normal vertex (with glNormal*()).
The commands glLightModel and glShadeModel are usually invoked in your initializations.
Create/Enable lights: To use lighting in OpenGL, rst you must enable lighting, through a call to glEnable(GL LIGHTING).
OpenGL allows the user to create up to 8 light sources, named GL LIGHT0 through GL LIGHT7. Each light
source may either be enabled (turned on) or disabled (turned off). By default they are all disabled. Again, this
is done using glEnable() (and glDisable()). The properties of each light source is set by the command glLight*().
This command takes three arguments, the name of the light, the property of the light to set, and the value of this
property.
Let us consider a light source 0, whose position is (2, 4, 5, 1)
T
in homogeneous coordinates, and which has a
red ambient intensity, given as the RGB triple (0.9, 0, 0), and white diffuse and specular intensities, given as the
RGB triple (1.2, 1.2, 1.2). (Normally all the intensities will be of the same color, albeit of different strengths.
We have made them different just to emphasize that it is possible.) There are no real units of measurement
involved here. Usually the values are adjusted manually by a designer until the image looks good.
Light intensities are actually expressed in OpenGL as RGBA, rather than just RGB triples. The A component
can be used for various special effects, but for now, let us just assume the default situation by setting A to 1.
Here is an example of how to set up such a light in OpenGL. The procedure glLight*() can also be used for setting
other light properties, such as attenuation.
Dening Surface Materials (Colors): When lighting is in effect, rather than specifying colors using glColor() you
do so by setting the material properties of the objects to be rendered. OpenGL computes the color based on the
lights and these properties. Surface properties are assigned to vertices (and not assigned to faces as you might
Lecture Notes 60 CMSC 427
Setting up a simple lighting situation
glClearColor(0.0, 1.0, 0.0, 1.0); // intentionally background
glEnable(GL_NORMALIZE); // normalize normal vectors
glShadeModel(GL_SMOOTH); // do smooth shading
glEnable(GL_LIGHTING); // enable lighting
// ambient light (red)
GLfloat ambientIntensity[4] = {0.9, 0.0, 0.0, 1.0};
glLightModelfv(GL_LIGHT_MODEL_AMBIENT, ambientIntensity);
// set up light 0 properties
GLfloat lt0Intensity[4] = {1.5, 1.5, 1.5, 1.0}; // white
glLightfv(GL_LIGHT0, GL_DIFFUSE, lt0Intensity);
glLightfv(GL_LIGHT0, GL_SPECULAR, lt0Intensity);
GLfloat lt0Position[4] = {2.0, 4.0, 5.0, 1.0}; // location
glLightfv(GL_LIGHT0, GL_POSITION, lt0Position);
// attenuation params (a,b,c)
glLightf (GL_LIGHT0, GL_CONSTANT_ATTENUATION, 0.0);
glLightf (GL_LIGHT0, GL_LINEAR_ATTENUATION, 0.0);
glLightf (GL_LIGHT0, GL_QUADRATIC_ATTENUATION, 0.1);
glEnable(GL_LIGHT0);
think). In smooth shading, this vertex information (for colors and normals) are interpolated across the entire
face. In at shading the information for the rst vertex determines the color of the entire face.
Every object in OpenGL is a polygon, and in general every face can be colored in two different ways. In most
graphic scenes, polygons are used to bound the faces of solid polyhedra objects and hence are only to be seen
from one side, called the front face. This is the side from which the vertices are given in counterclockwise
order. By default OpenGL, only applies lighting equations to the front side of each polygon and the back side
is drawn in exactly the same way. If in your application you want to be able to view polygons from both sides,
it is possible to change this default (using glLightModel() so that each side of each face is colored and shaded
independently of the other. We will assume the default situation.
Surface material properties are specied by glMaterialf() and glMaterialfv().
glMaterialf(GLenum face, GLenum pname, GLfloat param);
glMaterialfv(GLenum face, GLenum pname, const GLfloat
*
params);
It is possible to color the front and back faces separately. The rst argument indicates which face we are col-
oring (GL FRONT, GL BACK, or GL FRONT AND BACK). The second argument indicates the parameter name
(GL EMISSION, GL AMBIENT, GL DIFFUSE, GL AMBIENT AND DIFFUSE, GL SPECULAR, GL SHININESS).
The last parameter is the value (either scalar or vector). See the OpenGL documentation for more information.
Recall from the Phong model that each surface is associated with a single color and various coefcients are
provided to determine the strength of each type of reection: emission, ambient, diffuse, and specular. In
OpenGL, these two elements are combined into a single vector given as an RGB or RGBA value. For example,
in the traditional Phong model, a red object might have a RGB color of (1, 0, 0) and a diffuse coefcient of 0.5.
In OpenGL, you would just set the diffuse material to (0.5, 0, 0). This allows objects to reect different colors
of ambient and diffuse light (although I know of no physical situation in which this arises).
Other options: You may want to enable a number of GL options using glEnable(). This procedure takes a single
argument, which is the name of the option. To turn each option off, you can use glDisable(). These optional
include:
Lecture Notes 61 CMSC 427
Typical drawing with lighting
GLfloat color[] = {0.0, 0.0, 1.0, 1.0}; // blue
GLfloat white[] = {1.0, 1.0, 1.0, 1.0}; // white
// set object colors
glMaterialfv(GL_FRONT_AND_BACK, GL_AMBIENT_AND_DIFFUSE, color);
glMaterialfv(GL_FRONT_AND_BACK, GL_SPECULAR, white);
glMaterialf(GL_FRONT_AND_BACK, GL_SHININESS, 100);
glPushMatrix();
glTranslatef(...); // your transformations
glRotatef(...);
glBegin(GL_POLYGON); // draw your shape
glNormal3f(...); glVertex(...); // remember to add normals
glNormal3f(...); glVertex(...);
glNormal3f(...); glVertex(...);
glEnd();
glPopMatrix();
GL CULL FACE: Recall that each polygon has two sides, and typically you know that for your scene, it is
impossible that a polygon can only be seen from its back side. For example, if you draw a cube with six
square faces, and you know that the viewer is outside the cube, then the viewer will never see the back
sides of the walls of the cube. There is no need for OpenGL to attempt to draw them. This can often save a
factor of 2 in rendering time, since (on average) one expects about half as many polygons to face towards
the viewer as to face away.
Backface culling is the process by which faces which face away from the viewer (the dot product of the
normal and view vector is negative) are not drawn.
By the way, OpenGL actually allows you to specify which face (back or front) that you would like to have
culled. This is done with glCullFace() where the argument is either GL FRONT or GL BACK (the latter
being the default).
GL NORMALIZE: Recall that normal vectors are used in shading computations. You supply these normal to
OpenGL. These are assumed to be normalized to unit length in the Phong model. Enabling this option
causes all normal vectors to be normalized to unit length automatically. If you know that your normal
vectors are of unit length, then you will not need this. It is provided as a convenience, to save you from
having to do this extra work.
Computing Surface Normals (Optional): We mentioned one way for computing normals above based on taking the
cross product of two vectors on the surface of the object. Here are some other ways.
Normals by Cross Product: Given three (nocollinear) points on a polygonal surface, p
0
, p
1
, and p
2
, we can
compute a normal vector by forming the two vectors and taking their cross product.
u
1
= p
1
p
0
u
2
= p
2
p
0
n = normalize(u
1
u
2
).
This will be directed to the side from which the points appear in counterclockwise order.
Normals by Area: The method of computing normals by considering just three points is subject to errors if
the points are nearly collinear or not quite coplanar (due to round-off errors). A more robust method is
to consider all the points on the polygon. Suppose we are given a planar polygonal patch, dened by a
sequence of m points p
0
, p
1
, . . . , p
m1
. We assume that these points dene the vertices of a polygonal
patch.
Here is a nice method for determining the plane equation,
ax +by +cz +d = 0.
Lecture Notes 62 CMSC 427
Once we have determined the plane equation, the normal vector has the coordinates n = (a, b, c)
T
, which
can be normalized to unit length.
This leaves the question of to compute a, b, and c? An interesting method makes use of the fact that the
coefcients a, b, and c are proportional to the signed areas of the polygons orthogonal projection onto the
yz-, xz-, and xy-coordinate planes, respectively. By a signed area, we mean that if the projected polygon
is oriented clockwise the signed area is positive and otherwise it is negative. So how do we compute the
projected area of a polygon? Let us consider the xy-projection for concreteness. The formula is:
c =
1
2
m
i=1
(y
i
+y
i+1
)(x
i
x
i+1
).
But where did this formula come from? The idea is to break the polygons area into the sum of signed
trapezoid areas (see Fig. 45).
1
2
(y
2
+ y
3
)
area =
1
2
(y
2
+ y
3
)(x
2
x
3
)
x
y
p
2
p
0
p
1
p
3
p
4
p
5
p
6
Fig. 45: Area of polygon.
Assume that the points are oriented counterclockwise around the boundary. For each edge, consider the
trapezoid bounded by that edge and its projection onto the x-axis. (Recall that this is the product of the
length of the base times the average height.) The area of the trapezoid will be positive if the edge is directed
to the left and negative if it is directed to the right. The cute observation is that even though the trapezoids
extend outside the polygon, its area will be counted correctly. Every point inside the polygon is under one
more left edge than right edge and so will be counted once, and each point under the polygon is under the
same number of left and right edges, and these areas will cancel.
The nal computation of the projected areas is, therefore:
a =
1
2
m
i=1
(z
i
+z
i+1
)(y
i
y
i+1
)
b =
1
2
m
i=1
(x
i
+x
i+1
)(z
i
z
i+1
)
c =
1
2
m
i=1
(y
i
+y
i+1
)(x
i
x
i+1
)
Normals for Implicit Surfaces: Given a surface dened by an implicit representation, e.g. the set of points
that satisfy some equation, f(x, y, z) = 0, then the normal at some point is given by gradient vector,
denoted . This is a vector whose components are the partial derivatives of the function at this point
n = normalize() =
_
_
f/x
f/y
f/z
_
_
.
Lecture Notes 63 CMSC 427
As usual this should be normalized to unit length. (Recall that f/x is computed by taking the derivative
of f with respect to x and treating y and z as though they are constants.)
For example, consider a bowl shaped paraboloid, dened by the equal x
2
+y +2 = z. The corresponding
implicit equation is f(x, y, z) = x
2
+y
2
z = 0. The gradient vector is
(x, y, z) =
_
_
2x
2y
1
_
_
.
Consider a point (1, 2, 5)
T
on the surface of the paraboloid. The normal vector at this point is (1, 2, 5) =
(2, 4, 1)
T
.
Normals for Parametric Surfaces: Surfaces in computer graphics are more often represented parametrically.
A parametric representation is one in which the points on the surface are dened by three function of 2
variables or parameters, say u and v:
x =
x
(u, v),
y =
y
(u, v),
z =
z
(u, v).
We will discuss this representation more later in the semester, but for now let us just consider how to
compute a normal vector for some point (
x
(u, v),
y
(u, v),
z
(u, v)) on the surface.
To compute a normal vector, rst compute the gradients with respect to u and v,
u
=
_
_
x
/u
y
/u
z
/u
_
_
v
=
_
_
x
/v
y
/v
z
/v
_
_
,
and then return their cross product
n =
u
v
.
Lecture 13: Texture Mapping
Surface Detail: We have discussed the use of lighting as a method of producing more realistic images. This is ne
for smooth surfaces of uniform color (plaster walls, plastic cups, metallic objects), but many of the objects that
we want to render have some complex surface nish that we would like to model. In theory, it is possible to try
to model objects with complex surface nishes through extremely detailed models (e.g. modeling the cover of
a book on a character by character basis) or to dene some sort of regular mathematical texture function (e.g. a
checkerboard or modeling bricks in a wall). But this may be infeasible for very complex unpredictable textures.
Textures and Texture Space: Although originally designed for textured surfaces, the process of texture mapping can
be used to map (or wrap) any digitized image onto a surface. For example, suppose that we want to render a
picture of the Mona Lisa, or wrap an image of the earth around a sphere, or draw a grassy texture on a soccer
eld. We could download a digitized photograph of the texture, and then map this image onto surface as part of
the rendering process.
There are a number of common image formats which we might use. We will not discuss these formats. Instead,
we will think of an image simply as a 2-dimensional array of RGB values. Let us assume for simplicity that the
image is square, of dimensions n n (OpenGL requires that n is a power of 2 for its internal representation. If
you image is not of this size, you can pad it out with unused additional rows and columns.) Images are typically
Lecture Notes 64 CMSC 427
indexed row by row with the upper left corner as the origin. The individual RGB pixel values of the texture
image are often called texels, short for texture elements.
Rather than thinking of the image as being stored in an array, it will be a little more elegant to think of the image
as function that maps a point (s, t) in 2-dimensional texture space to an RGB value. That is, given any pair
(s, t), 0 s, t < 1, the texture image denes the value of T(s, t) is an RGB value. Note that the interval [0, 1)
does not depend on the size of the images. This has the advantage an image of a different size can be substituted
without the need of modifying the wrapping process.
For example, suppose that our image array I[n][n] is indexed by row and column from 0 to n − 1 with (as
is common with images) the origin in the upper left corner. Our texture space T(s, t) is coordinatized with axes
s (horizontal) and t (vertical), where (following OpenGL's conventions) the origin is in the lower left corner. We
could then apply the following function to round a point in texture space to the corresponding array element:

T(s, t) = I[ ⌊(1 − t)n⌋ ][ ⌊sn⌋ ],    for 0 ≤ s, t < 1.

(See Fig. 46.)
Fig. 46: Texture space. (The figure shows the image array I, a single copy of texture space T(s, t), and repeated texture space.)
In many cases, it is convenient to think of the texture as an infinite function. We do this by imagining that the
texture image is repeated cyclically throughout the plane. (This is handy when applying a small texture, such as
a patch of grass, to a very large surface, like the surface of a soccer field.) This is sometimes called a repeated
texture. In this case we can modify the above function to be

T(s, t) = I[ ⌊(1 − t)n⌋ mod n ][ ⌊sn⌋ mod n ],    for any s, t.
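To make the indexing concrete, here is a minimal C sketch of a repeated-texture lookup. The RGB type, the image layout, and the function name are illustrative assumptions, not part of OpenGL; the image is stored row by row with the origin in the upper left corner, as described above.

#include <math.h>

typedef struct { unsigned char r, g, b; } RGB;

/* Look up the texel for texture coordinates (s, t) in an n x n image img[],
 * stored row by row with the origin in the upper left corner.  The texture
 * repeats cyclically, so any (s, t) values are allowed. */
RGB textureLookup(const RGB *img, int n, double s, double t)
{
    int row = ((int)floor((1.0 - t) * n) % n + n) % n;   /* floor((1-t)n) mod n */
    int col = ((int)floor(s * n) % n + n) % n;           /* floor(sn) mod n     */
    return img[row * n + col];
}

The extra "+ n" before the final mod simply keeps the result nonnegative for negative s or t values.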
Inverse Wrapping Function and Parameterizations: Suppose that we wish to wrap a 2-dimensional texture im-
age onto the surface of a 3-dimensional ball of unit radius, that is, a unit sphere. We need to define a wrapping
function that achieves this. The surface resides in 3-dimensional space, so the wrapping function would need to
map a point (s, t) in texture space to the corresponding point (x, y, z) in 3-space. That is, the wrapping function
can be thought of as a function W(s, t) that maps a point in 2-dimensional texture space to a point (x, y, z) in
three-dimensional space.

Later we will see that it is not the wrapping function that we need to compute, but rather its inverse. So, let us
instead consider the problem of computing a function W⁻¹ that maps a point (x, y, z) on the sphere to a point
(s, t) in parameter space. This is called the inverse wrapping function.
This is typically done by first computing a 2-dimensional parameterization of the surface. This means that
we associate each point on the object surface with two coordinates (u, v) in surface space. Implicitly, we can
think of this as three functions, x(u, v), y(u, v) and z(u, v), which map the parameter pair (u, v) to the x, y, z-
coordinates of the corresponding surface point.
Our approach to solving the inverse wrapping problem will be to map a point (x, y, z) to the corresponding
parameter pair (u, v), and then map this parameter pair to the desired point (s, t) in texture space.
Example (Parameterizing a Sphere): Let's make this more concrete with an example. Our shape is a unit sphere
centered at the origin. We want to find the inverse wrapping function W⁻¹ that maps any point (x, y, z) on the
surface of the sphere to a point (s, t) in texture space.

We first need to come up with a surface parameterization for the sphere. We can represent any point on the sphere
with two angles, representing the point's latitude and longitude. We will use a slightly different approach. Any
point on the sphere can be expressed by two angles, θ and φ, which are sometimes called spherical coordinates.
(These will take the roles of the parameters u and v mentioned above.)
Fig. 47: Parameterization of a sphere.
Consider a vector from the origin to the desired point on the sphere. Let θ denote the angle in radians between
this vector and the z-axis (north pole). So θ is related to, but not equal to, the latitude. We have 0 ≤ θ ≤ π. Let φ
denote the counterclockwise angle of the projection of this vector onto the xy-plane. Thus 0 ≤ φ < 2π. (This
is illustrated in Fig. 47.)
Our next task is to determine how to convert a point (x, y, z) on the sphere to a pair (θ, φ). It will be a bit easier
to approach this problem in the reverse direction, by determining the (x, y, z) value that corresponds to a given
parameter pair (θ, φ).

The z-coordinate is just cos θ, and clearly this ranges from 1 to −1 as θ increases from 0 to π. To determine
the value of φ, let us consider the projection of this vector onto the x,y-plane. Since the vertical component is
of length cos θ, and the overall length is 1 (since it is a unit sphere), by the Pythagorean theorem the horizontal
length is √(1 − cos²θ) = sin θ. The lengths of the projections onto the x and y coordinate axes are
x = sin θ cos φ and y = sin θ sin φ. Putting this all together, it follows that the (x, y, z) coordinates corresponding to
the spherical coordinates (θ, φ) are

z(θ, φ) = cos θ,    x(θ, φ) = sin θ cos φ,    y(θ, φ) = sin θ sin φ.
But what we wanted to know was how to map (x, y, z) to (θ, φ). To do this, observe first that θ = arccos z.
It appears at first that φ will be much messier, but there is an easy way to get its value. Observe that y/x =
sin φ / cos φ = tan φ. Therefore, φ = arctan(y/x). In summary:

θ = arccos z,    φ = arctan(y/x).

(Remember that this can be computed accurately as atan2(y, x).)
The final step is to map the parameter pair (θ, φ) to a point in (s, t) space. To get the s coordinate, we just scale
φ from the range [0, 2π] to [0, 1]. Thus, s = φ/(2π).

The value of t is trickier. The value of θ increases from 0 at the north pole to π at the south pole, but the value of t
decreases from 1 at the north pole to 0 at the south pole. After a bit of playing around with the scale factors, we
find that t = 1 − (θ/π). Thus, as θ goes from 0 to π, this function goes from 1 down to 0, which is just what
we want. In summary, the desired inverse wrapping function is W⁻¹(x, y, z) = (s, t), where:

s = φ/(2π)  where φ = arctan(y/x),
t = 1 − θ/π  where θ = arccos z.
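Under this convention, the inverse wrapping for the unit sphere might be sketched in C as follows (the function name is ours; atan2 and acos come from the standard math library).

#include <math.h>

/* A minimal sketch of the inverse wrapping function for a unit sphere:
 * maps a surface point (x, y, z) to texture coordinates (s, t), using the
 * formulas derived above. */
void sphereInverseWrap(double x, double y, double z, double *s, double *t)
{
    const double PI = 3.14159265358979323846;
    double phi = atan2(y, x);            /* angle about the z-axis, in (-pi, pi] */
    if (phi < 0.0) phi += 2.0 * PI;      /* shift to the range [0, 2*pi) */
    double theta = acos(z);              /* angle from the north pole, in [0, pi] */
    *s = phi / (2.0 * PI);
    *t = 1.0 - theta / PI;
}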
Note that at the north and south poles there is a singularity in the sense that we cannot derive a unique value for
φ. This phenomenon is well known to cartographers. (What is the longitude of the north or south pole?)

To summarize, the inverse wrapping function W⁻¹(x, y, z) maps a point on the surface to a point (s, t) in texture
space. This is often done through a two-step process: first determining the parameter values (u, v) associated
with this point, and then mapping (u, v) to texture-space coordinates (s, t). This second, unwrapping stage maps
the surface parameters to the texture. For this simple example, let's just set this stage to the identity, that is,
(s, t) = (u, v). In general, we may want to stretch, translate, or rotate the texture to achieve the exact
placement we desire.
The Texture Mapping Process: Suppose that the inverse wrapping function W⁻¹ and a parameterization of the
surface are given. Here is an overview of an idealized version of the texture mapping process (see Fig. 48). (The
actual implementation in common graphics systems differs for the sake of efficiency.)
Project pixel to surface: First we consider a pixel that we wish to draw. We determine the fragment of the
object's surface that projects onto this pixel, by determining which points of the object project through the
corners of the pixel. (We will describe methods for doing this below.) Let us assume for simplicity that
a single surface covers the entire fragment. Otherwise we should average the contributions of the various
surfaces' fragments to this pixel.

Parameterize: We compute the surface-space parameters (u, v) for each of the four corners of the fragment.
This generally requires a function for converting from the (x, y, z) coordinates of a surface point to its
(u, v) parameterization.

Unwrap and average: Then we apply the inverse wrapping function to determine the corresponding region of
texture space. Note that this region may generally have curved sides, if the inverse wrapping function is
nonlinear. We compute the average intensity of the texels in this region of texture space by a process called
filtering. For example, this might involve computing the weighted sum of texel values that overlap this
region, and then assigning the corresponding average color to the pixel.
Fig. 48: Texture mapping overview (the image plane, the surface parameter space, and texture space, where the filter is applied).
We have covered the basic mathematical elements of texture mapping. In the next lecture, we will consider how
to make this happen in OpenGL.
Lecture 14: More on Texture Mapping
Recap: Last time we discussed the basic principles of texture mapping, and in particular the inverse
wrapping function and the use of surface parameterizations as a means of computing it. Today, we will see how
texture mapping is implemented in OpenGL.
Texture Mapping in OpenGL: Recall that all objects in OpenGL are rendered as polygons or, more generally, meshes of
polygons. This simplifies the texture mapping process because it means that we need only provide the inverse
wrapping function for the vertices, and we can rely on simple interpolation to fill in the polygon's interior. For
example, suppose that a triangle is being drawn. When the vertices of the polygon are given, the user also
specifies the corresponding (s, t) coordinates of these points in texture space. These are called the vertices'
texture coordinates. This implicitly defines the inverse wrapping function from the surface of the polygon to a
point in texture space.
As with surface normals (which were used for lighting computations), texture coordinates are specified before each
vertex is drawn. For example, a texture-mapped object in 3-space with shading might be drawn using the
following general form, where n = (n_x, n_y, n_z) is the surface normal, (s, t) are the texture coordinates, and
p = (p_x, p_y, p_z) is the vertex position.
glBegin(GL_POLYGON);
glNormal3f(nx, ny, nz); glTexCoord2f(s, t); glVertex3f(px, py, pz);
// ...
glEnd();
Interpolating Texture Coordinates: Given the texture coordinates, the next question is how to interpolate the texture
coordinates for the points in the interior of the polygon. An obvious approach is to first project the vertices of
the triangle onto the viewport. This gives us three points p_0, p_1, and p_2 for the vertices of the triangle in 2-space.
Let q_0, q_1 and q_2 denote the three texture coordinates corresponding to these points. Now, for any pixel in the
triangle, let p be its center. We can represent p uniquely as an affine combination

p = α_0 p_0 + α_1 p_1 + α_2 p_2,    for α_0 + α_1 + α_2 = 1.
(Computing the α values for an arbitrary point of the polygon generally involves solving a system of linear equa-
tions, but there are simple and efficient methods, which are discussed under the topic of polygon rasterization.)
Once we have computed the α_i's, the corresponding point in texture space is just

q = α_0 q_0 + α_1 q_1 + α_2 q_2.
Now, we can just apply our indexing function to obtain the corresponding point in texture space, and use its
RGB value to color the pixel.
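The direct approach just described (which, as noted next, ignores perspective) might be sketched in C as follows. The Vec2 type and the function name are illustrative assumptions, not OpenGL calls.

typedef struct { double x, y; } Vec2;

/* Given the projected triangle vertices p0, p1, p2, their texture
 * coordinates q0, q1, q2, and a pixel center p, compute the interpolated
 * texture coordinate q by solving for the affine (barycentric) weights. */
Vec2 interpolateTexCoord(Vec2 p0, Vec2 p1, Vec2 p2,
                         Vec2 q0, Vec2 q1, Vec2 q2, Vec2 p)
{
    /* Solve p = a0*p0 + a1*p1 + a2*p2 with a0 + a1 + a2 = 1 using the
     * standard signed-area (determinant) formulas. */
    double d  = (p1.y - p2.y) * (p0.x - p2.x) + (p2.x - p1.x) * (p0.y - p2.y);
    double a0 = ((p1.y - p2.y) * (p.x - p2.x) + (p2.x - p1.x) * (p.y - p2.y)) / d;
    double a1 = ((p2.y - p0.y) * (p.x - p2.x) + (p0.x - p2.x) * (p.y - p2.y)) / d;
    double a2 = 1.0 - a0 - a1;

    Vec2 q = { a0 * q0.x + a1 * q1.x + a2 * q2.x,
               a0 * q0.y + a1 * q1.y + a2 * q2.y };
    return q;
}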
What is wrong with this direct approach? The first problem has to do with perspective. The direct approach
makes the incorrect assumption that affine combinations are preserved under perspective projection. This is not
true (see Fig. 49(c)).
Fig. 49: Perspective correction: (a) the texture, (b) the surface (triangulated), (c) linear interpolation, (d) perspective corrected.
There are a number of ways to fix this problem. One approach is to use a more complex formula for interpo-
lation, which corrects for perspective distortion. (See the text for details. An example of the result is shown in
Fig. 49(d).) This can be activated using the following OpenGL command:
glHint(GL_PERSPECTIVE_CORRECTION_HINT, GL_NICEST);
(The other possible choice is GL_FASTEST, which does simple linear interpolation.)
The other method involves slicing the polygon up into sufficiently small polygonal pieces, such that within
each piece the amount of distortion due to perspective is small. Recall that with Gouraud shading, it was often
necessary to subdivide large polygons into small pieces for accurate lighting computation. If you have already
done this, then the perspective distortion due to texture mapping may not be a significant issue for you.
The second problem has to do with something called aliasing. Remember that we said that after determining
the fragment of texture space onto which the pixel projects, we should average the colors of the texels in this
fragment. The above procedure just considers a single point in texture space, and does no averaging. In situations
where the pixel corresponds to a point in the distance and hence covers a large region in texture space, this may
produce very strange looking results, because the color of the entire pixel is determined entirely by a single
point in texture space that happens to correspond (say) to the pixel's center coordinates (see Fig. 50(a)).
Fig. 50: Aliasing and mipmapping: (a) without mipmapping, (b) with mipmapping.
Dealing with aliasing in general is a deep topic, which is studied in the field of signal processing. OpenGL
applies a simple method for dealing with aliasing of this sort. The method is called mipmapping. (The
acronym "mip" comes from the Latin phrase multum in parvo, meaning "much in little.")

The idea behind mipmapping is to generate a series of texture images at decreasing levels of resolution. For
example, if you originally started with a 128 × 128 image, a mipmap would consist of this image along with a
64 × 64 image, a 32 × 32 image, a 16 × 16 image, etc. All of these are scaled copies of the original image. Each
pixel of the 64 × 64 image represents the average of a 2 × 2 block of the original. Each pixel of the 32 × 32
image represents the average of a 4 × 4 block of the original, and so on (see Fig. 51).
Fig. 51: Mipmapping.
Now, when OpenGL needs to apply texture mapping to a screen pixel that overlaps many texture pixels (that
is, when minifying the texture), it determines the mipmap in the hierarchy that is at the closest level of
resolution, and uses the corresponding averaged pixel value from that mipmapped image. This results in more
nicely blended images (see Fig. 50(b)).
If you wish to use mipmapping in OpenGL (and it is a good idea), it is good to be aware of the command
gluBuild2DMipmaps, which can be used to automatically generate them.
Texture mapping in OpenGL: OpenGL supports a fairly general mechanism for texture mapping. The process in-
volves a bewildering number of different options. You are referred to the OpenGL documentation for more
detailed information. By default, objects are not texture mapped. If you want your objects to be colored using
texture mapping, you need to enable texture mapping before you draw them. This is done with the following
command.
glEnable(GL_TEXTURE_2D);
After drawing textured objects, you can disable texture mapping.
If you plan to use more than one texture, then you will need to request that OpenGL generate texture objects.
This is done with the following command:
glGenTextures(GLsizei n, GLuint *textureIDs);
This requests that n new texture objects be created. The n new texture ids are stored as unsigned integers in
the array textureIDs. Each texture ID is an integer greater than 0. (Typically, these are just integers 1 through
n, but OpenGL does not require this.) If you want to generate just one new texture, set n = 1 and pass it the
address of the unsigned int that will hold the texture id.
By default, most of the texture commands apply to the active texture. How do we specify which texture object
is the active one? This is done by a process called binding, and is defined by the following OpenGL command:

glBindTexture(GLenum target, GLuint textureID);

where target is one of GL_TEXTURE_1D, GL_TEXTURE_2D, or GL_TEXTURE_3D, and textureID is one of the
texture IDs returned by glGenTextures. The target will be GL_TEXTURE_2D for the sorts of 2-dimensional
image textures we have been discussing so far. The textureID parameter indicates which of the texture IDs will
become the active texture. If this texture is being bound for the first time, a new texture object is created and
assigned the given texture ID. If textureID has been used before, the texture object with this ID becomes the
active texture.
Presenting your Texture to OpenGL: The next thing that you need to do is to input your texture and present it to
OpenGL in a format that it can access efficiently. It would be nice if you could just point OpenGL to an image
file and have it convert it into its own internal format, but OpenGL does not provide this capability. You need
to input your image file into an array of RGB (or possibly RGBA) values, one byte per color component (e.g.,
three bytes per pixel), stored row by row, from upper left to lower right. By the way, OpenGL requires images
whose heights and widths are powers of two.

Once the image array has been input, you need to present the texture array to OpenGL, so it can be converted
to its internal format. This is done by the following procedure. There are many different options, which are
partially explained below.
glTexImage2D(GL_TEXTURE_2D, level, internalFormat, width, height,
             border, format, type, image);
The procedure has an incredible array of options. Here is a simple example to present OpenGL an RGB image
stored in the array myPixelArray. The image is to be stored with an internal format involving three components
(RGB) per pixel. It is of width nCols and height nRows. It has no border (border = 0), and we are storing
the highest level of resolution. (Other levels of resolution are used to implement the averaging process, through a
method called mipmaps.) Typically, the level parameter will be 0 (level = 0). The format of the data that we
will provide is RGB (GL_RGB) and the type of each element is an unsigned byte (GL_UNSIGNED_BYTE). So the
final call might look like the following:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, nCols, nRows, 0, GL_RGB,
             GL_UNSIGNED_BYTE, myPixelArray);
In this instance, your array myPixelArray is an array of size 256 × 512 × 3 = 393,216 whose elements are
the RGB values, expressed as unsigned bytes, for the 256 × 512 texture array. An example of a typical texture
initialization is shown in the code block below. (The call to gluBuild2DMipmaps is needed only if mipmapping
is to be used.)
Initialization for a Single Texture
GLuint textureID; // the ID of this texture
glGenTextures(1, &textureID); // assign texture ID
glBindTexture(GL_TEXTURE_2D, textureID); // make this the active texture
//
// ... input image nRows x nCols into RGB array myPixelArray
//
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, nCols, nRows, 0, GL_RGB,
GL_UNSIGNED_BYTE, myPixelArray);
// generate mipmaps (see below)
gluBuild2DMipmaps(GL_TEXTURE_2D, GL_RGB, nCols, nRows, GL_RGB,
GL_UNSIGNED_BYTE, myPixelArray);
Texturing Options: Once the image has been input and presented to OpenGL, we need to tell OpenGL how it is to
be mapped onto the surface. Again, OpenGL provides a large number of different methods to map a surface.
These parameters are set using the following function:
glTexParameteri(target, param_name, param_value);
glTexParameterf(target, param_name, param_value);
The first form is for integer parameter values and the second is for float values. For most options, the argument
target will be set to GL_TEXTURE_2D. There are two common parameters to set. First, in order to specify whether
a texture should be repeated or not (clamped), the following options are useful.
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
These options determine what happens if the s parameter of the texture coordinate is less than 0 or greater
than 1. (If this never happens, then you don't need to worry about this option.) The first causes the texture to
be displayed only once. In particular, values of s that are negative are treated as if they are 0, and values of
s exceeding 1 are treated as if they are 1. The second causes the texture to be wrapped around repeatedly, by
taking the value of s modulo 1. Thus, s = 5.234 and s = 76.234 are both equivalent to s = 0.234. This can
be set independently for the t parameter of the texture coordinate, by setting GL_TEXTURE_WRAP_T.
Filtering and Mipmapping: Another useful parameter determines how rounding is performed during magnification
(when a screen pixel is smaller than the corresponding texture pixel) and minification (when a screen pixel is
larger than the corresponding texture pixel). The simplest, but not the best looking, option in each case is to just
use the nearest pixel in the texture:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
A better approach is to use linear filtering when magnifying and mipmaps when minifying. An example is given
below.
Combining Texture with Lighting: How are texture colors combined with object colors? The two most common
options are GL_REPLACE, which simply makes the color of the pixel equal to the color of the texture, and
GL_MODULATE (the default), which makes the color of the pixel the product of the color of the pixel (without
texture mapping) times the color of the texture. The former is for painting textures that are already prelit,
meaning that lighting has already been applied. Examples include skyboxes and precomputed lighting for the
ceiling and walls of a room. The latter is used when texturing objects to which lighting is to be applied, such as
the clothing of a moving character.
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);
Drawing a Texture Mapped Object: Once the initializations are complete, you are ready to start drawing. First, bind
the desired texture (that is, make it the active texture), set the texture parameters, and enable texturing. Then start
drawing your textured objects. For each vertex drawn, be sure to specify the texture coordinates associated with
this vertex, prior to issuing the glVertex command. If lighting is enabled, you should also provide the surface
normal. A generic example is shown in the code block below.
Drawing a Textured Object
glEnable(GL_TEXTURE_2D); // enable texturing
glBindTexture(GL_TEXTURE_2D, textureID); // select the active texture
// (use GL_REPLACE below for skyboxes)
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);
// repeat texture
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
// reasonable filter choices
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glBegin(GL_POLYGON); // draw the object(s)
glNormal3f( ... ); // set the surface normal
glTexCoord2f( ... ); // set texture coords
glVertex3f( ... ); // draw vertex
// ... (repeat for other vertices)
glEnd();
glDisable(GL_TEXTURE_2D); // disable texturing
Lecture 15: Bump Mapping
Bump mapping: Texture mapping is good for changing the surface color of an object, but we often want to do more.
For example, if we take a picture of an orange, and map it onto a sphere, we find that the resulting object does
not look realistic. The reason is that there is an interplay between the bumpiness of the orange's peel and the
light source. As we move our viewpoint from side to side, the specular reflections from the bumps should move
as well. However, texture mapping alone cannot model this sort of effect. Rather than just mapping colors, we
should consider mapping whatever properties affect local illumination. One such example is that of mapping
surface normals, and this is what bump mapping is all about (see Fig. 52).

What is the underlying reason for this effect? The bumps are too small to be noticed through perspective depth. It
is the subtle variation in surface normals that causes this effect. At first it seems that just displacing the surface
normals would produce a rather artificial effect. But in fact, bump mapping produces remarkably realistic
bumpiness effects. (For example, in Fig. 52, the object appears to have a bumpy exterior, but an inspection of
its shadow shows that it is in fact modeled as a perfect geometric sphere. It just looks bumpy.)
Fig. 52: Bump mapping (a bump-mapped object and a bump map).
How it's done: As with texture mapping, we are presented with an image that encodes the bumpiness. Think of this as
a monochrome (gray-scale) image, where a large (white) value is the top of a bump and a small (black) value is
a valley between bumps. (An alternative, and more direct, way of representing bumps would be to give a normal
map in which each pixel stores the (x, y, z) coordinates of a normal vector. One reason for using gray-valued
bump maps is that they are often easier to compute and involve less storage space.) As with texture mapping, it
will be more elegant to think of this discrete image as an encoding of a continuous 2-dimensional bump space,
with coordinates s and t. The gray-scale values encode a function called the bump displacement function b(s, t),
which maps a point (s, t) in bump space to its (scalar-valued) height. As with texture mapping, there is an
inverse wrapping function W⁻¹, which maps a point (u, v) in the object's surface parameter space to (s, t) in
bump space.
Parametric Surfaces, Partial Derivatives, and Tangents: Before getting into bump mapping, let's take a short di-
gression to talk a bit about parameterized surfaces and surface derivatives. Since surfaces are two-dimensional,
a surface can be presented as a parametric function in two parameters u and v. That is, each point p(u, v) on the
surface is given by three coordinate functions x(u, v), y(u, v), and z(u, v). That is,

p(u, v) = (x(u, v), y(u, v), z(u, v))^T.
As an example, let us consider a cone of height 1 whose apex is at the origin, whose axis coincides with the
z-axis, and whose interior angle with the axis is 45°. We parameterize the cone by a height parameter v and an
angular parameter u, where u/2 denotes the angle about the z-axis. Because the interior angle is 45°, a point
at height v projects to a point at distance v from the origin. Therefore, we have x(u, v) = v cos(u/2) and
y(u, v) = v sin(u/2). Thus, we have

p(u, v) = (v cos(u/2), v sin(u/2), v)^T,    where 0 ≤ v ≤ 1, 0 ≤ u ≤ 1.
To get a specific point, we plug in any two valid values of u and v. For example, when u = 0 and v = 1, we
have the point

p(0, 1) = (1, 0, 1)^T,

which is a point on the cone immediately above the tip of the x-axis.
Returning to the general case, let us assume for now that we know these three functions of u and v. In order to
compute a surface normal at this point, we would like to produce two vectors that are tangent to the surface, and
then take their cross product.
Fig. 53: Parameterization of a cone: (a) the point p(u, v), (b) a top-down view showing the projections onto the x- and y-axes, and (c) the tangent vectors p_u(u, v) and p_v(u, v).
To do this, consider the partial derivative of p(u, v) with respect to u, which we will denote by p_u:

p_u(u, v) = (∂x(u, v)/∂u, ∂y(u, v)/∂u, ∂z(u, v)/∂u)^T.
(Note that, although p(u, v) is a point, its partial derivative should be interpreted as a vector in 3-dimensional
space. As u and v vary, the direction in which this vector points changes.) Recall from differential calculus that
this means computing the derivative of each of these functions, but treating the variable v as if it were a constant.
We can assign a geometric interpretation to this vector as follows. As u varies over time, the point p(u, v) traces
out a curve along the surface. The partial derivative vector p_u(u, v) can be thought of as the instantaneous
velocity of a point on this curve. That is, it is a tangent vector on the surface which points in the direction along
which the surface changes most rapidly as a function of u.
Analogously, we can define the partial derivative p_v to be

p_v(u, v) = (∂x(u, v)/∂v, ∂y(u, v)/∂v, ∂z(u, v)/∂v)^T.
When evaluated at any point p(u, v), this is a tangent vector pointing in the direction of most rapid change for
v. Thus, the cross product p_u(u, v) × p_v(u, v) gives us a normal vector to the surface. Depending on the
orientation of the parameterization, we may need to reverse the two vectors to get the normal to point in the
desired direction.
For example, in the case of our cone, we have

p_u(u, v) = (−(v/2) sin(u/2), (v/2) cos(u/2), 0)^T.
If you think about it a bit, this can be seen to be a vector that is tangent to the horizontal circle passing through
p(u, v) (see Fig. 53(c)). Also, we have

p_v(u, v) = (cos(u/2), sin(u/2), 1)^T.
This can be seen to be a vector that is directed along a slanted line passing through p(u, v) that is aligned with
the cone's boundary (see Fig. 53(c)). Thus, these two partial derivatives give us (for any valid choice of u and
v) two surface tangents.

To obtain the surface normal, we compute the cross product of these vectors. This gives us (trust me)

n(u, v) = p_u(u, v) × p_v(u, v) = (v/2) (cos(u/2), sin(u/2), −1)^T.
If we use the same values (u, v) = (0, 1), we obtain the tangent vectors

p_u(0, 1) = (1/2) (0, 1, 0)^T    and    p_v(0, 1) = (1, 0, 1)^T.

If you draw the picture carefully, you can see that these two vectors are tangent to the cone at the point lying
immediately above the tip of the x-axis. Finally, if we compute their cross product, we have

n(0, 1) = (1/2) (1, 0, −1)^T.

Again, if you think about it carefully, this is a normal vector to the cone at this point.
Perturbing normal vectors: Now, let us return to the original question of how to compute the perturbed normal
vector. Consider a point p(u, v) on the surface of the object (which we will just call p). Let n denote the
surface normal vector at this point. Let (s, t) = W⁻¹(u, v), so that b(s, t) is the corresponding bump value (see
Fig. 54(a)). The question is, what is the perturbed normal n′?
Fig. 54: The bump-mapping process: (a) the surface with tangents p_u, p_v and true normal n, and (b) the bumped surface with its perturbed normal.
All the geometric entities we will be considering here and below (e.g., p, p_u, p_v, b, n, n′) should be regarded as
functions of u and v. Recall, for example, that

p_v = (∂x/∂v, ∂y/∂v, ∂z/∂v)^T.
This is illustrated in Fig. 54(b). Since n may not generally be of unit length, we define n̂ = n/|n| to be the
normalized normal vector.

If we apply our bump at point p, it will be elevated by a distance b = b(u, v) in the direction of the normal (thus,
b is a scalar). So

p′ = p + b n̂

is the elevated point. (Ultimately, we do not need to compute p′ itself; we use it only as a means to determine the
perturbed normal vector.) Determining the perturbed normal at p′ amounts to computing

n′ = p′_u × p′_v,

where p′_u and p′_v are the partial derivatives of p′ with respect to u and v. The first of these is

p′_u = ∂(p + b n̂)/∂u = p_u + b_u n̂ + b n̂_u,

where b_u and n̂_u denote the respective partial derivatives of b and n̂ with respect to u. An analogous formula
applies for p′_v. Assuming that the height of the bump b is small but its rates of change b_u and b_v may be high, we
can neglect the last term, and write these as

p′_u ≈ p_u + b_u n̂,    p′_v ≈ p_v + b_v n̂.
Taking the cross product (and recalling that the cross product distributes over vector addition) we have

n′ ≈ (p_u + b_u n̂) × (p_v + b_v n̂)
   ≈ (p_u × p_v) + b_v (p_u × n̂) + b_u (n̂ × p_v) + b_u b_v (n̂ × n̂).

By basic properties of the cross product, we know that n̂ × n̂ = 0 and (p_u × n̂) = −(n̂ × p_u). Thus, we have

n′ ≈ n + b_u (n̂ × p_v) − b_v (n̂ × p_u).
The partial derivatives b_u and b_v depend on the particular parameterization of the object's surface. It will greatly
simplify matters to assume that the object's parameterization has been constructed in common alignment with
the image. (If not, then the formulas become much messier.) With this assumption, we have the following
formula for the perturbed surface normal:

n′ ≈ n + b_s (n̂ × p_v) − b_t (n̂ × p_u).
This is the final answer that we desire. (A short sketch of this computation follows the list below.) Note that for
each point on the surface we need to know the following quantities:

The true surface normal n and its normalization n̂.

The partial derivative vectors p_u and p_v. We can compute these if we have an algebraic representation of
the surface, e.g., as we did for the earlier cone example.

The partial derivatives of the bump function b with respect to the texture parameters s and t, denoted b_s
and b_t. (Typically, we do not have an algebraic representation of the bump function, since it is given as
a gray-scale image. However, we can estimate these derivatives from the bump image through the use of
finite differences.)
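To make the bookkeeping concrete, here is a minimal C sketch of this computation. The helper names (Vec3, bumpDerivs, perturbNormal) and the bump-image layout are illustrative assumptions, not part of any standard API; the bump derivatives are estimated from the gray-scale image by central differences, as suggested above.

#include <math.h>

typedef struct { double x, y, z; } Vec3;

static Vec3 vcross(Vec3 a, Vec3 b) {
    Vec3 c = { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
    return c;
}

/* Central-difference estimates of b_s and b_t at texel (row i, column j) of
 * an n x n bump image stored row by row with the origin in the upper left
 * corner (so t increases as the row index decreases); values lie in [0,1]. */
static void bumpDerivs(const double *b, int n, int i, int j,
                       double *bs, double *bt)
{
    int jl = (j - 1 + n) % n, jr = (j + 1) % n;
    int iu = (i - 1 + n) % n, id = (i + 1) % n;
    *bs = (b[i * n + jr] - b[i * n + jl]) * n / 2.0;
    *bt = (b[iu * n + j] - b[id * n + j]) * n / 2.0;
}

/* Perturbed normal n' = n + b_s (nhat x p_v) - b_t (nhat x p_u). */
Vec3 perturbNormal(Vec3 n, Vec3 pu, Vec3 pv, double bs, double bt)
{
    double len = sqrt(n.x*n.x + n.y*n.y + n.z*n.z);
    Vec3 nhat = { n.x/len, n.y/len, n.z/len };
    Vec3 a = vcross(nhat, pv), c = vcross(nhat, pu);
    Vec3 np = { n.x + bs*a.x - bt*c.x,
                n.y + bs*a.y - bt*c.y,
                n.z + bs*a.z - bt*c.z };
    return np;   /* normalize before using in lighting */
}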
In summary, for each point p on the surface with (smooth) surface normal n, we apply the above formula to
compute the perturbed normal n′, and then use n′ in the lighting computations instead. OpenGL does not provide
direct support for bump mapping, but there are some extensions of OpenGL that provide for an alternate form of
bump mapping, called normal maps. We will discuss this next time.
Lecture 16: Normal and Environment Mapping
Normal Maps: Bump mapping uses a single monochromatic (gray-scale) image and some differential geometry to generate
perturbed normals at each point of a surface. You might ask, why derive the surface normals? Couldn't you just
store them in the map instead? This is exactly what normal mapping does. In particular, a normal map is an
RGB color image, in which each (R, G, B) triple is interpreted as the (x, y, z) coordinates of a normal vector.
As in bump mapping, this normal vector is not used directly, but rather is treated as a perturbation to an existing
normal vector.

The obvious disadvantage of normal mapping is that it requires more space than bump mapping, since a bump
map requires only gray-scale information and a normal map requires three color components per pixel.2 One
advantage of normal mapping is that it is faster, because we do not need to perform the cross product computa-
tions needed by bump mapping. Also, it accords some greater degree of flexibility, since bump maps can only
store normal displacements that are consistent with a bump height function.
Encoding Normals as Colors: How are (perturbed) normals encoded in a normal map? There are a number of con-
ventions, but the most common is based on identifying each possible (R, G, B) value with a vector from the
origin to the top face of a unit cube. Let us assume that the color components range from 0 to 1 (see Fig. 55(a)).
(They typically are stored as a single byte, and so range from 0 to 255. Thus, we divide each by 256, yielding a
value in the interval [0, 1).)
Fig. 55: Encoding normals as colors: (a) the RGB cube, with B = 1 on the top face, and (b) a bump map and the equivalent normal map.
We will assume that the B value is used to encode the z-coordinate of the normal vector, which we will assume
to be 1. Thus, B = z = 1. (This is an obvious source of inefficiency.) In order to map an R component to an
x coordinate, we set x = 2R − 1. Thus, as R ranges from 0 to 1, x ranges from −1 to +1. Similarly, we set
y = 2G − 1. In summary, for each RGB triple of our image, we assume that B = 1 and 0 ≤ R, G ≤ 1.
Thus, a given (R, G, B) triple is mapped to the normal vector:

(x, y, z) = (2R − 1, 2G − 1, B) = 2(R, G, B) − 1.

Above, we have used the fact that B = 1, so 2B − 1 = 1 = B. (An example of a bump map, that is, a height
function, and the equivalent normal map is shown in Fig 55(b).)
The Normal Mapping Process: The rest of the process is conceptually the same as it is for bump mapping. Consider
a pixel p that we wish to determine the lighting for, and let p_u and p_v denote two (ideally orthogonal) tangent
vectors on the object's surface at the current point. Let us assume that they have both been normalized to unit
length. Let n = p_u × p_v denote the standard normal vector for the current surface point (see Fig. 56(a)). We
proceed as follows:
2 Actually, two components would have been sufficient. You only need a direction to encode a normal vector, and a direction in 3-dimensional
space can be encoded with two angles, elevation and azimuth. But, since images naturally come in RGB triplets, normal maps follow this 3-
dimensional convention.
Fig. 56: Computing the perturbed normal: (a) the surface point p with tangents p_u, p_v and normal n, (b) the decoded vector (x, y, 1), and (c) the perturbed normal n′ = n + x p_u + y p_v.
Unwrap and look-up: Based on the inverse wrapping function (which is given by the user), we do a look-up
in the normal map to obtain the appropriate (R, G, B) triple. (As with normal texture mapping, we may
perform some smoothing to avoid aliasing effects.)

Convert: We perform the aforementioned conversion to obtain the (x, y) pair associated with this triple (see
Fig. 56(b)).

Compute final normal: We compute the perturbed normal by scaling the tangents by x and y, respectively,
and adding them to the standard normal (see Fig. 56(c); a short sketch follows this list). Thus, we have

n′ = n + x p_u + y p_v.

Lighting: Given the perturbed normal n′, we use it in the lighting computations instead.
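The following is a minimal C sketch of the decode-and-perturb steps just described. The helper names are ours, and the color components are assumed to have already been scaled to the range [0, 1].

typedef struct { double x, y, z; } Vec3;

/* Decode an (R, G, B) texel (components in [0,1], with B assumed to be 1)
 * and perturb the standard normal n by the unit tangents pu and pv:
 *     (x, y, z) = 2(R, G, B) - 1,   n' = n + x*pu + y*pv. */
Vec3 normalMapPerturb(double R, double G, Vec3 n, Vec3 pu, Vec3 pv)
{
    double x = 2.0 * R - 1.0;
    double y = 2.0 * G - 1.0;
    Vec3 np = { n.x + x * pu.x + y * pv.x,
                n.y + x * pu.y + y * pv.y,
                n.z + x * pu.z + y * pv.z };
    return np;   /* normalize before using in lighting */
}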
Environment Mapping: Next, we consider another method of applying surface detail to model reflective objects.
Suppose that you are looking at a shiny waxed floor, or a metallic sphere. We have seen that we can model
the shininess by setting a high coefficient of specular reflection in the Phong model, but this will mean that
only the light sources will be reflected (as bright spots). Suppose that we want the surfaces to actually reflect
the surrounding environment. This sort of reflection of the environment is often used in commercial computer
graphics. The shiny reflective lettering and logos that you see on television, the reflection of light off of water,
and the shiny reflective look of an automobile's body are all examples (see Fig. 57).
Fig. 57: Environment mapping (an environment-mapped teapot).
The most accurate way of modeling this sort of reflective effect is through ray-tracing (which we may discuss
later in the semester). Unfortunately, ray-tracing is a computationally intensive technique, and may be too slow
for interactive graphics. To achieve fast rendering times at the cost of some accuracy, it is common to apply an
alternative method called environment mapping (also called reflection mapping).

What distinguishes reflection from texture? When you use texture mapping to paint a texture onto a surface,
the texture stays put. For example, if you fix your eye on a single point of the surface, the color stays the same,
even if you change your viewing position. In contrast, reflective objects have the property that, as you move
your head and look at the same point on the surface, the reflected color changes. This is because reflection is a
function of the position of the viewer, while normal surface colors are not.
Computing Reflections: How can we encode such a complex reflective relationship? The basic question that we need
to answer is: given a point on the reflective surface, and given the location of the viewer, determine what the
viewer sees in the reflection. Before seeing how this is done in environment mapping, let's see how it is done
in the more accurate method called ray tracing. In ray tracing we track the path of a light photon backwards
from the eye to determine the color of the object that it originated from. When a photon strikes a reflective
surface, it bounces off. If v is the (normalized) view vector and n is the (normalized) surface normal vector, we
can compute the view reflection vector, denoted r_v, as follows (see Fig. 58):

r_v = 2(n · v)n − v.
To compute the true reflection, we should trace the path of this ray back from the point on the surface along
r_v. Whatever color this ray hits will be the color that the viewer observes as reflected from this surface.
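This reflection formula is easily evaluated in code. Here is a minimal C sketch (the type and function names are illustrative); both v and n are assumed to be unit vectors.

typedef struct { double x, y, z; } Vec3;

static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

/* View reflection vector r_v = 2(n . v) n - v, where v points back toward
 * the eye and n is the unit surface normal. */
Vec3 reflectView(Vec3 v, Vec3 n)
{
    double d = 2.0 * dot(n, v);
    Vec3 r = { d * n.x - v.x, d * n.y - v.y, d * n.z - v.z };
    return r;
}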
Fig. 58: Reflection vector.
Unfortunately, it is expensive to shoot rays through 3-dimensional environments to determine what they hit.
(This is exactly what ray-tracing does.) Instead, we will precompute the reflections, store them in an image file,
and look them up as needed. But storing reflections accurately is very complicated. (An accurate representation
of the reflection would be like a hologram, since it would have to store all the light energy arriving from all
angles at all points on the surface.) To make the process tractable, we will make an important simplifying
assumption:

Distant Environment Assumption: (For environment mapping) The reflective surface is small in comparison
with the distances to the objects being reflected in it.

For example, the reflection of a room surrounding a silver teapot would satisfy this requirement. However, if
the teapot is sitting on a table, then the table would be too close (resulting in a distorted reflection).

This assumption implies that the most important parameter in determining what the reflection ray hits is the
direction of the reflection vector, and not the actual point on the surface from which the ray starts.
Happily, the space of directions is a 2-dimensional space. (Recall that a direction can be stored as an azimuth,
that is, a compass direction, and an elevation, that is, the angle above the horizon.) This implies that we can
precompute the (approximate) reflection information and store it in a 2-dimensional image array.
The environment mapping process: Here is a sketch of how environment mapping can be implemented. The first
thing you need to do is to compute the environment map. To do this, remove the reflective object from your
environment. Place a small cube about the center of the object. (Spheres are also commonly used, and in fact,
OpenGL assumes spherical environment maps.) The cube should be small enough that it does not intersect any
of the surrounding objects.

Next, project the entire environment onto the six faces of the cube, using the center of the cube as the center
of projection. That is, take six separate pictures which together form a complete panoramic picture of the
surrounding environment, and store the results in six image files. (OpenGL provides the ability to generate
an image and, rather than sending it to the display buffer, save it in memory.) By the way, an accurate
representation of the environment is not always necessary in order to generate the illusion of reflectivity.
Now suppose that we want to compute the color reflected from some point p on the object. As in the Phong
model we compute the usual vectors: normal vector n, view vector v, etc. We compute the view reflection vector
r_v from these two. (This is not the same as the light reflection vector, r, which we discussed in the Phong model,
but it is the counterpart where the reflection is taken with respect to the viewer rather than the light source.)

To determine the reflected color, we imagine that the view reflection vector r_v is shot from the center of the
cube and determine the point on the cube which is hit by this ray. We use the color of this point to color the
corresponding point on the surface (see Fig. 59). (We will leave as an exercise the problem of mapping a vector
to a point on the surface of the cube.)
Fig. 59: Environment mapping: the true reflection by ray tracing, building the map, and using the map.
Note that the final color returned by the environment map is a function of the contents of the environment image
and r_v (and hence of v and n). In particular, it is not a function of the location of the point on the surface.
Wouldn't taking this location into account produce more accurate results? Perhaps, but by our assumption that
objects in the environment are far away, the directional vector is the most important parameter in determining
the result. (If you really want accuracy, then use ray tracing instead.)
Reflection mapping through texture mapping: OpenGL does provide limited support for environment mapping
(but the implementation assumes spherical maps, not cube maps). There are reasonably good ways to fake
it using texture mapping. Consider a polygonal face to which you want to apply an environment map. The
key question is how to compute the point in the environment map to use in computing colors. The solution is
to compute this quantity yourself for each vertex of your polygon. That is, for each vertex of the polygon,
based on the location of the viewer (which you know), the location of the vertex (which you know), and
the polygon's surface normal (which you can compute), determine the view reflection vector. Use this vector to
determine the corresponding point in the environment map. Repeat this for each of the vertices in your polygon.
Now, just treat the environment map as though it were a texture map.

What makes the approach visually convincing is that, when the viewer shifts positions, the texture coordinates
of the vertices change as well. In standard texture mapping, these coordinates would be fixed, independent of
the viewer's position.
Lecture 17: Shadows
Shadows: Shadows give an image a much greater sense of realism. The manner in which objects cast shadows onto
the ground and other surrounding surfaces provides us with important visual cues on the spatial relationships
between these objects. As an example of this, imagine that you are looking down (say, at a 45° angle) at a ball
sitting on a smooth table top. Suppose that (1) the ball is moved vertically straight up a short distance, or (2)
the ball is moved horizontally directly away from you by a short distance. In either case, the impression in the
visual frame is essentially the same. That is, the ball moves upwards in the picture's frame (see Fig. 60(a)).
Fig. 60: The role of shadows in ascertaining spatial relations.
If the ball's shadow were drawn, however, the difference would be quite noticeable. In case (1) (vertical motion),
the shadow remains in a fixed position on the table as the ball moves away. In case (2) (horizontal motion), the
ball and its shadow both move together (see Fig. 60(b)).

We will consider various ways to handle shadows in computer graphics. Note that OpenGL does not support
shadows directly (because shadows are global effects, while OpenGL uses a local lighting model).
Hard and soft shadows: In real life, with few exceptions, we experience shadows as fuzzy objects. The reason is
that most light sources are engineered to be area sources, not point sources. One notable exception is the sun on
a cloudless day. When a light source covers some area, the shadow varies from regions that are completely outside
the shadow, to a region, called the penumbra, where the light is partially visible, to a region, called the umbra,
where the light is totally hidden. The umbra region is completely shadowed. The penumbra, in contrast, tends
to vary smoothly from shadowed to unshadowed as more and more of the light source is visible to the surface.

Rendering penumbra effects is computationally quite intensive, since methods are needed to estimate the fraction
of the area of the light source that is visible to a given point on the surface. Static rendering methods, such
as ray-tracing, can model these effects. In contrast, real-time systems almost always render hard shadows, or
employ some image trickery (e.g., blurring) to create the illusion of soft shadows.

In the examples below, we will assume that the light source is a point, and we will consider rendering of hard
shadows only.
Shadow Painting: Perhaps the simplest and most sneaky way in which to render shadows is to simply paint them
onto the surfaces where shadows are cast. For example, suppose that a shadow is being cast on a flat table top by
some occluding object P. First, we compute the shape of P's shadow P′ on the ground, draw P′ in a dark shadow
color, and then draw P itself.

To compute the shadow shape, let us assume that the ground is the plane z = 0 and that the light source is located
infinitely far away, in the direction of a vector v = (v_x, v_y, v_z)^T. Consider a point p = (p_x, p_y, p_z)^T
on the object P. Its shadow is the point p′ at which the ray r(α) = p − α v hits the ground. Setting the z-coordinate
r(α)_z = p_z − α v_z to zero yields α = p_z/v_z. We can derive the x- and y-coordinates of the shadow point as

p′_x = r(α)_x = p_x − (p_z/v_z) v_x    and    p′_y = r(α)_y = p_y − (p_z/v_z) v_y.

Thus, the desired shadow point can be expressed as

p′ = ( p_x − (v_x/v_z) p_z,  p_y − (v_y/v_z) p_z,  0 )^T.
The shadow-projection matrix: It is interesting to observe that this transformation is an affine transformation of p.
In particular, we can express it in matrix form, called the shadow-projection matrix, as

    [ p′_x ]   [ 1   0   −v_x/v_z   0 ] [ p_x ]
    [ p′_y ] = [ 0   1   −v_y/v_z   0 ] [ p_y ]
    [ p′_z ]   [ 0   0       0      0 ] [ p_z ]
    [  1   ]   [ 0   0       0      1 ] [  1  ]
This is nice, because it provides a particularly elegant mechanism for rendering the shadow polygon. The
process is described in the following code block. The first drawing draws a projection of P on the ground in the
shadow color, and the second one actually draws P itself. Note that this assumes that P is constructed from
polygons, but this assumption holds for all OpenGL drawing.
Shadow Painting with a Light Source at Infinity
drawGround();
glPushMatrix();
... // disable lighting and set color to shadow color
... // enable transparency, so ground texture shows through shadow
glMultMatrixf(shadowProjection);
drawObject(); // draw the shadow shape
glPopMatrix();
... // restore lighting and set object's natural color
drawObject(); // draw the object
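For concreteness, here is one way the shadowProjection array used above might be filled in. This is a sketch under the assumptions of this lecture (light at infinity in direction v, ground plane z = 0); the function name is ours, and the header path may differ by platform. OpenGL expects the 16 entries in column-major order.

#include <GL/gl.h>

/* Build the shadow-projection matrix for a light at infinity in direction
 * v = (vx, vy, vz), projecting onto the ground plane z = 0.  OpenGL stores
 * matrices in column-major order, so entry (row, col) is m[col*4 + row]. */
void buildShadowProjection(GLfloat m[16], GLfloat vx, GLfloat vy, GLfloat vz)
{
    for (int i = 0; i < 16; i++) m[i] = 0.0f;
    m[0]  = 1.0f;          /* row 0, col 0 */
    m[5]  = 1.0f;          /* row 1, col 1 */
    m[8]  = -vx / vz;      /* row 0, col 2 */
    m[9]  = -vy / vz;      /* row 1, col 2 */
    /* row 2, col 2 stays 0: the shadow is flattened onto z = 0 */
    m[15] = 1.0f;          /* row 3, col 3 */
}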
The above shadow matrix works for light sources at infinity. You might wonder whether this is possible for light
sources that are not at infinity. The answer is yes, but the problem is that the projection transformation is no
longer an affine transformation. (In particular, the light rays are not parallel to each other; they converge at the
light source.) If you think about this a moment, you will realize that this is exactly the issue that we faced with
perspective projections. Indeed, the shadow projection is just an example of a projective transformation, where
the light source is the camera and the ground is the image plane!

Since, in this case, the shadow-projection transformation is a projective transformation, the above transformation
needs to be applied to the OpenGL projection matrix, not the modelview matrix. Given a light source located at
the position ℓ = (ℓ_x, ℓ_y, ℓ_z)^T, the desired shadow-projection matrix to project shadows onto the z = 0 ground
plane is given as follows (recalling that we apply perspective normalization after the transformation).
    [ p′_x ]   [ ℓ_z   0   −ℓ_x    0  ] [ p_x ]   [ ℓ_z p_x − ℓ_x p_z ]   [ (ℓ_z p_x − ℓ_x p_z)/(ℓ_z − p_z) ]
    [ p′_y ] = [  0   ℓ_z  −ℓ_y    0  ] [ p_y ] = [ ℓ_z p_y − ℓ_y p_z ] ≡ [ (ℓ_z p_y − ℓ_y p_z)/(ℓ_z − p_z) ]
    [ p′_z ]   [  0    0     0     0  ] [ p_z ]   [         0         ]   [                0                ]
    [  1   ]   [  0    0    −1    ℓ_z ] [  1  ]   [     ℓ_z − p_z     ]   [                1                ]
(We leave the derivation of this matrix as an exercise. It follows the same process as used above, but remember
that perspective normalization will be applied after applying this matrix.)
Unfortunately, there is a difficulty in applying this directly in OpenGL. The problem is that, since this is a pro-
jection transformation, it needs to be applied in GL_PROJECTION mode. However, the coordinates provided
to this matrix have already been transformed into the camera's view frame. Before applying this transforma-
tion, the light source and plane equations should first be converted into view-frame coordinates. We will omit
discussion of this issue.
Depth conflicts with shadows: There are a couple of aspects of this process that need to be taken with some care.
The first is the problem of hidden surface removal. If the shadow is drawn at exactly z = 0, then there will be
competition in the depth buffer between the ground and the shadow. A quick-and-dirty fix for this is to nudge
the shadow slightly off the ground. For example, store a small positive constant ε > 0 in the third row, last
column of the above matrix. This will force the z-coordinate of p′ to be ε rather than 0, lifting the shadow slightly
above the ground.
Let us work out the direction of the transmitted ray from this. As before, let v denote the normalized view vector,
directed back along the incident ray. Let t denote the unit vector along the transmitted direction, which we wish
to compute (see Fig. 70).
Fig. 70: Refraction.
The orthogonal projection of v onto the normalized normal vector n is

m_i = (v · n) n = (cos θ_i) n.
3
To be completely accurate, the index of refraction depends on the wavelength of light being transmitted. This is what causes white light to
be spread into a spectrum as it passes through a prism, which is called chromatic dispersion. Since we do not model light as an entire spectrum,
but only through a triple of RGB values (which produce the same color visually, but not the same spectrum physically) it is not easy to model this
phenomenon. For simplicity we assume that all wavelengths have the same index of refraction.
Consider the two parallel horizontal vectors w_i and w_t in the figure. We have

w_i = m_i − v.

Since v and t are unit vectors, Snell's law gives us

η_t/η_i = sin θ_i / sin θ_t = (|w_i|/|v|) / (|w_t|/|t|) = |w_i| / |w_t|.

Since w_i and w_t are parallel, we have

w_t = (η_i/η_t) w_i = (η_i/η_t)(m_i − v).
The projection of t onto n is m_t = −(cos θ_t) n, and hence the desired refraction vector is:

t = w_t + m_t = (η_i/η_t)(m_i − v) − (cos θ_t) n = (η_i/η_t)((cos θ_i) n − v) − (cos θ_t) n
  = ( (η_i/η_t) cos θ_i − cos θ_t ) n − (η_i/η_t) v.
We have already computed cos θ_i = (v · n). We can derive cos θ_t from Snell's law and basic trigonometry:

cos θ_t = √(1 − sin²θ_t)
        = √(1 − (η_i/η_t)² sin²θ_i)
        = √(1 − (η_i/η_t)²(1 − cos²θ_i))
        = √(1 − (η_i/η_t)²(1 − (v · n)²)).
What if the term in the square root is negative? This is possible if (η_i/η_t) sin θ_i > 1. In particular, this can only
happen if η_i/η_t > 1, meaning that you are already inside an object with an index of refraction greater than 1.
Notice that when this is the case, Snell's law breaks down, since it is impossible to find a θ_t whose sine is greater
than 1. In this situation, total internal reflection takes place. That is, the light is not refracted at all, but
is reflected back within the object. (By the way, this phenomenon, combined with chromatic dispersion, is one
of the reasons for the existence of rainbows.) When this happens, the refraction reduces to reflection and so we
set t = r_v, the view reflection vector.
In summary, the transmission process is solved as follows (a short sketch in C appears after these steps).

(1) Compute the point where the ray intersects the surface. Let v be the normalized view vector, let n be the
normalized surface normal at this point, and let η_i and η_t be the indices of refraction on the incoming and
outgoing sides, respectively.

(2) Compute the angle of refraction:

    θ_t = arccos √(1 − (η_i/η_t)²(1 − (v · n)²)).

(3) If the quantity under the square root symbol is negative, process this as internal reflection, rather than
transmission.

(4) If the quantity under the square root symbol is nonnegative, compute the transmission vector

    t = ( (η_i/η_t) cos θ_i − cos θ_t ) n − (η_i/η_t) v.

The transmission ray is emitted from the contact point along this direction.
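Here is a minimal C sketch of these steps (type and function names are ours). It returns 0 in the total-internal-reflection case, in which the caller would fall back to the view reflection vector.

#include <math.h>

typedef struct { double x, y, z; } Vec3;

static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

/* Compute the transmission (refraction) direction t from the unit view
 * vector v (pointing back toward the eye), the unit surface normal n, and
 * the indices of refraction eta_i (incoming side) and eta_t (outgoing side).
 * Returns 0 if total internal reflection occurs. */
int refractDir(Vec3 v, Vec3 n, double eta_i, double eta_t, Vec3 *t)
{
    double eta  = eta_i / eta_t;
    double cosI = dot(v, n);
    double k    = 1.0 - eta * eta * (1.0 - cosI * cosI);
    if (k < 0.0) return 0;                 /* total internal reflection */
    double cosT = sqrt(k);
    double a    = eta * cosI - cosT;       /* coefficient of n */
    t->x = a * n.x - eta * v.x;
    t->y = a * n.y - eta * v.y;
    t->z = a * n.z - eta * v.z;
    return 1;
}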
Lecture 20: More on Ray Tracing
Ray Tracing: Recall that ray tracing is a powerful method for synthesizing highly realistic images. Unlike OpenGL,
it implements a global model for image generation, based on tracing the rays of light, mostly working backwards
from the eye to the light sources. Last time we discussed the general principle and considered how reflection
and refraction are implemented. Today, we discuss other aspects of ray tracing.
Ray Representation: Let us consider how rays are represented, how they are generated, and how intersections are determined.
First off, how is a ray represented? An obvious method is to represent it by its origin point p and a directional
vector u. Points on the ray can be described parametrically using a scalar t:

R = { p + t u | t > 0 }.

Notice that our ray is open, in the sense that it does not include its endpoint. This is done because in many
instances (e.g., reflection) we are shooting a ray from the surface of some object. We do not want to consider
the surface itself as an intersection. (As a practical matter, it is good to require that t be larger than some very
small value, e.g., t ≥ 10⁻³. This is done because of floating-point errors.)

In implementing a ray tracer, it is also common to store some additional information as part of a ray object. For
example, you might want to store the value t_0 at which the ray hits its first object (initially, t_0 = ∞) and perhaps
a pointer to the object that it hits.
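Such a ray object might be declared in C as follows; this is only a sketch, and the names (Ray, Object, TMIN) are ours.

#include <float.h>

typedef struct { double x, y, z; } Vec3;

#define TMIN 1e-3   /* smallest allowed value of t, to avoid self-intersection */

typedef struct Object Object;   /* forward declaration of a scene object type */

typedef struct {
    Vec3 p;        /* origin of the ray */
    Vec3 u;        /* (normalized) direction */
    double t0;     /* parameter of the closest hit so far; initialize to DBL_MAX */
    Object *hit;   /* the object hit at t0, or NULL if none */
} Ray;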
Ray Generation: Let us consider the question of how to generate rays. Let us assume that we are given essentially
the same information that we use in gluLookAt and gluPerspective. In particular, let eye denote the eye point,
let c denote the center point at which the camera is looking, and let up denote the up vector for gluLookAt. Let
θ_y = π · fovy/180 denote the y-field of view in radians. Let n_rows and n_cols denote the number of rows and
columns in the final image, and let α = n_cols/n_rows denote the window's aspect ratio.

In gluPerspective we also specified the distance to the near and far clipping planes. This was necessary for
setting up the depth buffer. Since there is no depth buffer in ray tracing, these values are not needed, so to
make our life simple, let us assume that the window is exactly one unit in front of the eye. (The distance is not
important, since the aspect ratio and the field of view really determine everything up to a scale factor.)

The height and width of the view window relative to its center point are

h = 2 tan(θ_y / 2),    w = h · α.
So, the window extends from −h/2 to +h/2 in height and from −w/2 to +w/2 in width. Now, we proceed to
compute the viewing coordinate frame, very much as we did in Lecture 10. The origin of the camera frame is
eye, the location of the eye. The unit vectors for the camera frame are:

e_z = −normalize(c − eye),    e_x = normalize(up × e_z),    e_y = e_z × e_x.
We will follow the (somewhat strange) convention used in .bmp les and assume that rows are indexed from
bottom to top (top to bottom is more common) and columns are indexed from left to right. Every point on the
view window has e
z
coordinate of 1. Now, suppose that we want to shoot a ray for row r and column c, where
0 r < n
rows
and 0 c < n
cols
. Observe that r/n
rows
is in the range from 0 to 1. Multiplying by h maps us
linearly to the interval [0, +h] and then subtracting h/2 yields the nal desired interval [h/2, h/2].
u_r = h (r/n_rows − 1/2)        u_c = w (c/n_cols − 1/2).
The location of the corresponding point on the viewing window is

p(r, c) = eye + u_c e_x + u_r e_y − e_z.
Fig. 71: Ray generation.
Thus, the desired ray R(r, c) (see Fig. 71) has the origin eye and the directional vector

u(r, c) = normalize(p(r, c) − eye).
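The following sketch assembles the above steps into a ray-generation routine (illustrative names; e_z points backwards, away from the viewing direction, as above):

#include <cmath>

struct Vec3 { double x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b)     { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3 add(Vec3 a, Vec3 b)     { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3 scale(Vec3 a, double s) { return { a.x * s, a.y * s, a.z * s }; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}
static Vec3 normalize(Vec3 a) {
    double len = std::sqrt(a.x*a.x + a.y*a.y + a.z*a.z);
    return scale(a, 1.0 / len);
}

struct Ray { Vec3 origin, dir; };

// Build the ray R(r, c) through pixel (row, col).
Ray makeRay(Vec3 eye, Vec3 center, Vec3 up, double fovyDegrees,
            int nRows, int nCols, int row, int col) {
    const double PI = 3.14159265358979323846;
    double thetaY = PI * fovyDegrees / 180.0;        // y field of view in radians
    double aspect = double(nCols) / double(nRows);   // window aspect ratio
    double h = 2.0 * std::tan(thetaY / 2.0);         // window height
    double w = h * aspect;                           // window width

    Vec3 ez = scale(normalize(sub(center, eye)), -1.0);   // e_z = -normalize(c - eye)
    Vec3 ex = normalize(cross(up, ez));                   // e_x
    Vec3 ey = cross(ez, ex);                              // e_y

    double ur = h * (double(row) / nRows - 0.5);     // vertical offset on the window
    double uc = w * (double(col) / nCols - 0.5);     // horizontal offset

    // p(r, c) = eye + uc*e_x + ur*e_y - e_z
    Vec3 p = add(add(add(eye, scale(ex, uc)), scale(ey, ur)), scale(ez, -1.0));
    return Ray{ eye, normalize(sub(p, eye)) };
}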
Rays and Intersections: Given an object in the scene, a ray intersection procedure determines whether the ray intersects the object, and if so, returns the value t > 0 at which the intersection occurs. For a sphere, this amounts to solving a quadratic equation at² + bt + c = 0, whose roots are

t_− = (−b − √(b² − 4ac)) / (2a) = (−b − √(b² − 4c)) / 2
t_+ = (−b + √(b² − 4ac)) / (2a) = (−b + √(b² − 4c)) / 2,

where the second form assumes that the ray's directional vector has unit length, so that a = 1. If t_− > 0 we use t_−; otherwise we use t_+ (provided it is positive). If t_− was used to define the intersection, then we are hitting the object from the outside, and so n is the desired normal. However, if t_+ was used to define the intersection, then we are hitting the object from the inside, and −n should be used instead.
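For illustration, here is a sketch of a ray–sphere intersection test consistent with the roots above, assuming the ray's directional vector is a unit vector (so a = 1) and writing w for the vector from the sphere's center to the ray's origin (names are illustrative):

#include <cmath>

struct Vec3 { double x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Returns the parameter t > eps of the first intersection of the ray p + t*u
// with the given sphere, or -1 if there is none. Assumes |u| = 1.
double intersectSphere(Vec3 p, Vec3 u, Vec3 center, double radius) {
    const double eps = 1e-3;                 // ignore hits at the ray's own origin
    Vec3 w = sub(p, center);
    double b = 2.0 * dot(w, u);
    double c = dot(w, w) - radius * radius;
    double disc = b*b - 4.0*c;               // discriminant (a = 1)
    if (disc < 0.0) return -1.0;             // the ray misses the sphere
    double root = std::sqrt(disc);
    double tMinus = (-b - root) / 2.0;
    double tPlus  = (-b + root) / 2.0;
    if (tMinus > eps) return tMinus;         // hit from the outside: use n
    if (tPlus  > eps) return tPlus;          // hit from the inside: use -n
    return -1.0;                             // sphere is behind the ray
}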
Global Illumination through Photon Mapping: Our description of ray tracing so far has been based on the Phong illumination model. Although ray tracing can handle shadows, it is not really a full-fledged global illumination model because it cannot handle complex inter-object effects with respect to light. Such effects include the following
(see Fig. 73).
Caustics: These result when light is focused through refractive surfaces like glass and water. This causes
variations in light intensity on the surfaces on which the light eventually lands.
Indirect illumination: This occurs when light is reflected from one surface (e.g., a white wall) onto another.
Color bleeding: When indirect illumination occurs with a colored surface, the reflected light is colored. Thus, a white wall that is positioned next to a bright green object will pick up some of the green color.
There are a number of methods for implementing global illumination models. We will discuss one method,
called photon mapping, which works quite well with ray tracing. Photon mapping is particularly powerful
because it can handle both diffuse and non-diffuse (e.g., specular) reflective surfaces and can deal with complex
(curved) geometries. (Some simpler global illumination methods, such as radiosity, which we will not discuss,
suffer from these limitations.)
The basic idea behind photon mapping involves two steps:
Photon tracing: Simulate propagation of photons from light source onto surfaces.
Rendering: Draw the objects using illumination information from the photon trace.
Fig. 73: Photon mapping.
In the first phase, a large number of photons are randomly generated from each light source and propagated into the 3-dimensional scene (see Fig. 73). As each photon hits a surface, it is represented by three quantities:
Location: A position in space where the photon lands on a surface.
Power: The color and brightness of the photon.
Incident direction: The direction from which the photon arrived on the surface.
When a photon lands, it may either stay on this surface, or (with some probability) it may be reflected onto another surface. Such reflection depends on the properties of the incident surface. For example, bright surfaces generate a lot of reflection, while dark surfaces do not. A photon hitting a colored surface is more likely to reflect the color present in the surface. When the photon is reflected, its direction of reflection depends on surface properties (e.g., diffuse reflectors scatter photons uniformly in all directions, while specular reflectors reflect photons nearly along the direction of perfect reflection).
After all the photons have been traced, the rendering phase starts. In order to render a point of some surface, we check how many photons have landed near this surface point. By summing the total contribution of these photons and considering surface properties (such as color and reflective properties), we determine the intensity of the resulting surface patch. For this to work, the number of photons shot into the scene must be large enough that every point has a respectable number of nearby photons.
Because it is not a local illumination method, photon mapping takes more time than simple ray tracing using the Phong model, but the results it produces can be stunningly realistic.
Optional Additional Information: The remaining topics in this lecture are optional and provided for your own in-
terest, in case you want to pursue the topic of ray tracing in greater detail (and even implement your own ray
tracer).
Illumination Equation Revisited: We can combine the familiar Phong illumination model with the reflection and refraction computed above. We assume that we have shot a ray, and it has hit an object at some point p.
Light sources: Let us assume that we have a collection of light sources L_1, L_2, . . .. Each is associated with an RGB vector of intensities (any nonnegative values). Let L_a denote the global RGB intensity of ambient light.
Visibility of Light Sources: The function Vis(p, i) returns 1 if light source i is visible to point p and 0 other-
wise. If there are no transparent objects, then this can be computed by simply shooting a ray from p to the
light source and seeing whether it hits any objects.
When transparent objects are present this is considerably harder, since we need to consider all the refracted light beams that hit the light source. The area of caustics deals with simulating indirect illumination through transparent objects. A simplifying (but unrealistic) assumption is that transparent objects never block illumination light. A bit more realistic is to assume that the transparent object attenuates light according to its ρ_t value.
Material color: We assume that an object's material color is given by C. This is an RGB vector, in which each component is in the interval [0, 1]. We assume that the specular color is the same as the light source, and that the object does not emit light. Let ρ_a, ρ_d, and ρ_s denote the ambient, diffuse, and specular coefficients of illumination, respectively. These coefficients are typically in the interval [0, 1]. Let α denote the specular shininess coefficient.
Vectors: Let n,
h, and
l denote the normalized normal, halfway-vector, and light vectors. See the lecture on the
Phong model for how they are computed.
Attenuation: We assume the existence of general quadratic light attenuation, given by the coefficients a, b, and c, as before. Let d_i denote the distance from the contact point p to the ith light source.
Reflection and refraction: Let ρ_r and ρ_t denote the reflective and transmitted (refracted) coefficients of illumination. If ρ_t ≠ 0, then let η_i and η_t denote the indices of refraction, and let r_v and t denote the normalized view reflection and transmission vectors.
Let the pair (p, v) denote a ray originating at point p and heading in direction v. The complete ray-tracing reflection equation is:

I = ρ_a L_a C + Σ_i Vis(p, i) · ( L_i / (a + b d_i + c d_i²) ) [ ρ_d C max(0, n · l) + ρ_s max(0, n · h)^α ] + ρ_r trace(p, r_v) + ρ_t trace(p, t).
Recall that Vis(p, i) indicates whether the ith light source is visible from p. Note that if ρ_r or ρ_t are equal to 0 (as is often the case), then the corresponding ray-trace call need not be made. Observe that attenuation and lighting are not applied to the results of reflection and refraction. This seems to behave reasonably in most lighting situations, where lights and objects are relatively close to the eye.
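As an illustration of how the bracketed per-light term might be organized in code, here is a self-contained sketch (the function and parameter names are illustrative; the caller would add the ambient term ρ_a L_a C, sum this over all lights, and make the ρ_r and ρ_t trace calls when those coefficients are nonzero):

#include <algorithm>
#include <cmath>

struct RGB { double r, g, b; };
static RGB add(RGB a, RGB b)      { return { a.r + b.r, a.g + b.g, a.b + b.b }; }
static RGB scale(RGB a, double s) { return { a.r * s, a.g * s, a.b * s }; }
static RGB modulate(RGB a, RGB b) { return { a.r * b.r, a.g * b.g, a.b * b.b }; }

// Contribution of one light source to the equation above, given the precomputed
// dot products n.l and n.h, the visibility value Vis(p, i), and the distance d to
// the light. rhoD, rhoS, alpha are the Phong coefficients; a, b, c are the
// attenuation coefficients.
RGB lightContribution(RGB Li, RGB C, double vis, double d,
                      double nDotL, double nDotH,
                      double rhoD, double rhoS, double alpha,
                      double a, double b, double c) {
    double atten = vis / (a + b * d + c * d * d);
    RGB diffuse  = scale(C, rhoD * std::max(0.0, nDotL));          // rho_d C max(0, n.l)
    double spec  = rhoS * std::pow(std::max(0.0, nDotH), alpha);   // rho_s max(0, n.h)^alpha
    return scale(add(modulate(Li, diffuse), scale(Li, spec)), atten);
}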
Numerical Issues: There are some numerical instabilities to beware of when dealing with ray-sphere intersection. If
r is small relative to |w| (which happens when the sphere is far away) then we may lose the effect of r in the
computation of the discriminant. In most applications, this additional precision is not warranted. If you are
interested, consult a textbook on numerical analysis for more accurate calculations.
Lecture 21: Curved Models and Bézier Curves
Geometric Modeling: Geometric modeling is a key element of computer graphics. Up until now, we have considered
only very simple 3-dimensional models, such as triangles and other polygons, 3-dimensional rectangles, planar
surfaces, and simple implicit surfaces such as spheres. Generating more interesting shapes motivates a study of
the methods used for representing objects with complex geometries.
Geometric modeling is fundamentally about representing 3-d objects efficiently, in a manner that makes them easy to design, visualize, and modify. In computer graphics, there is almost no limit to the things that people would like to model, including:
Natural objects: Trees, flowers, rocks, water, fire, smoke, clouds
Humans and animals: Skeletal structure, skin, hair, facial expressions
Architecture: Walls, doors, windows, furniture, pipes, railing
Manufacturing: Automobiles, appliances, weaponry, fabric and clothing
Non-geometric Elements: Lighting, textures, surface materials
Because the things that we would like to model are so diverse, many different shape representation methods
have emerged over the years. These include the following:
Boundary: Represent objects by their 2-dimensional surfaces. Methods tend to fall into one of two categories, implicit (blobs and metaballs) or parametric (Bézier surfaces, B-splines, NURBS).
Volumetric: Represent complex objects through boolean operations on simple 3-dimensional objects. For example, take a rectangular block and subtract a cylindrical hole. This is popular in computer-aided design with machined objects. It is called constructive solid geometry (CSG).
Procedural: Objects are represented by invoking a procedure that generates the objects. This is used for very complex natural objects that are not easily described by mathematical formulas, like mountains, trees, or clouds. Examples include particle systems, fractals, and physically-based models.
Today, we will talk about boundary representations.
Boundary Representations: The most common way to represent a 3-dimensional object is to describe its boundary,
that is, the 2-dimensional skin surrounding the object. These are called boundary representations, or B-reps
for short. Boundary models can be formed of either smooth surfaces or flat (polygonal) surfaces. Polygonal surfaces are most suitable for representing geometric objects with flat sides, such as a cube. However, even smooth objects can be approximated by gluing together a large number of small polygonal objects into a polygonal mesh. This is the approach that OpenGL assumes. Through the use of polygonal mesh models and smooth shading, it is possible to produce the illusion of a smooth surface. Even when algebraic surfaces are used as the underlying representation, in order to render them, it is often necessary to first convert them into a polygonal mesh.
Fig. 74: Polygonal mesh used to represent a curved surface.
There are a number of reasons why polygonal meshes are easier to work with than curved surfaces. For example,
the intersection of two planar surfaces is (ignoring special cases) a line. Generally when intersecting curved
surfaces the curve along which they intersect may be quite hard to represent.
Curved Models: Smooth surface models can be broken down into many different forms, depending on the nature of the defining functions. The most well understood functions are algebraic functions. These are polynomials of their arguments (as opposed, say, to trigonometric functions). The degree of an algebraic function is the highest sum of exponents. For example, f(x, y) = x² + 2x²y − y is an algebraic function of degree 3. The ratio of two polynomial functions is called a rational function. These are important, since perspective projective transformations (because of perspective normalization) map rational functions to rational functions.
OpenGL only supports flat (i.e., polygonal) objects (or equivalently, algebraic functions of degree one). Although polygonal models are fundamental to OpenGL, they are not at all an easy method with which to design and manipulate smooth solid models. Most modeling systems allow the user to define objects at a higher level (e.g., as a curved surface), and then procedures will be provided to break them down into polygonal meshes, for processing by OpenGL.
Implicit representation: In this representation a curve in 2-d or a surface in 3-d is represented as the zeros of a formula f(x, y, z) = 0. For example, the representation of a sphere of radius r about center point c is:

f(x, y, z) = (x − c_x)² + (y − c_y)² + (z − c_z)² − r² = 0.
It is common to place some restrictions on the possible classes of functions, for example, they must be
algebraic or rational functions.
Implicit representations are nice for determining whether a point is inside, outside, or on the surface (just
evaluate f). However, other tasks (e.g., generating a mesh) may not be as easy.
Implicit functions of constant degree are fine for very simple models (e.g., spheres, cylinders, cones), but
they cannot represent very complex objects. There are methods (which we will not discuss) for creating
complex objects from many simple (low-degree) implicit functions. These have colorful names like blobs
and metaballs.
Parametric representation: In this representation the (x, y)-coordinates of a curve in 2-d are given as two functions of one parameter, (x(u), y(u)). Similarly, a two-dimensional surface in 3-d is given as a function of two parameters, (x(u, v), y(u, v), z(u, v)). An example is the parametric representation of a sphere, which we have seen earlier in our discussion of texture mapping. For example, a sphere of radius r at center point c could be expressed parametrically as:

x(θ, φ) = c_x + r sin φ cos θ
y(θ, φ) = c_y + r sin φ sin θ
z(θ, φ) = c_z + r cos φ,

for 0 ≤ θ ≤ 2π and 0 ≤ φ ≤ π. Here φ roughly corresponds to latitude and θ to longitude. Notice that this is not an algebraic representation, since it involves the use of trigonometric functions.
Note that parametric representations can be used for both curves and surfaces in 3-space (depending on
whether 1 or 2 parameters are used).
Which representation is the best? It depends on the application. Implicit representations are nice, for example,
for computing the intersection of a ray with the surface, or determining whether a point lies inside, outside, or
on the surface. On the other hand, parametric representations are nice because they are easy to subdivide into
small patches for rendering, and hence they are popular in graphics. Sometimes (but not always) it is possible
to convert from one representation to another. We will concentrate on parametric representations in this lecture.
Continuity: Consider a parametric curve P(u) = (x(u), y(u), z(u))^T. An important condition that we would like our curves (and surfaces) to satisfy is that they should be as smooth as possible. This is particularly important when two or more curves or surfaces are joined together. We can formalize this mathematically as follows. We would like the curves themselves to be continuous (that is, not making sudden jumps in value). If the first k derivatives (as functions of u) exist and are continuous, we say that the curve has kth order parametric continuity, denoted C^k continuity. Thus, 0th order continuity just means that the curve is continuous, 1st order continuity means that tangent vectors vary continuously, and so on. This is shown in Fig. 75.
Fig. 75: Degrees of continuity.
Note that this definition is dependent on the particular parametric representation used. Since the same curve may be parameterized in different ways, some people suggest that a more appropriate definition in some circumstances is geometric continuity, denoted G^k, which depends solely on the shape of the curve, and not on the parameterization used.
Generally we will want as high a continuity as we can get, but higher continuity generally comes with a higher computational cost. C² continuity is usually an acceptable goal.
Control Point Representation: For a designer who wishes to design a curve or surface, a symbolic representation of a curve as a mathematical formula is not a very easy representation to deal with. A much more natural method to define a curve is to provide a sequence of control points, and to have a system which automatically generates a curve which approximates this sequence. Such a procedure inputs a sequence of points, and outputs a parametric representation of a curve. (This idea can be generalized to surfaces as well, but let's study it first in the simpler context of curves.)
It might seem most natural to have the curve pass through the control points, that is, to interpolate between these points. There exists such an interpolating polynomial, called the Lagrangian interpolating polynomial. However, there are a number of difficulties with this approach. For example, suppose that the designer wants to interpolate a nearly linear set of points. To do so he selects a sequence of points that are very close to lying on a line. However, polynomials tend to wiggle, and as a result, rather than getting a line, we get a wavy curve passing through these points. (See Fig. 76.)
Fig. 76: Interpolation versus approximation.
Bézier Curves and the de Casteljau Algorithm: Let us continue to consider the problem of defining a smooth curve that approximates a sequence of control points, p_0, p_1, . . .. We begin with the simple idea on which these curves will be based. Let us start with the simplest case of two control points. The simplest curve which approximates them is just the line segment p_0 p_1. The function mapping a parameter u to a point on this segment involves a simple affine combination:

p(u) = (1 − u) p_0 + u p_1        for 0 ≤ u ≤ 1.

Observe that this is a weighted average of the points, and for any value of u, the two weighting or blending functions u and (1 − u) are nonnegative and sum to 1. That is, for any value of u the point p(u) is a convex combination of the control points.
Three control points: Linear interpolation is a concept that we have seen many times. The question is how to go from an interpolation process that involves two points to one that involves three, or four, or more control points. Certainly, we could linearly interpolate from p_0 to p_1, and then from p_1 to p_2, and so on, but this would just give us a polygonal curve, which is not very smooth.
A mathematician named Paul de Casteljau, who was working for a French automobile company, came up with a
very simple and ingenious method for generalizing linear interpolations to an arbitrary number of control points
through a process of repeated linear interpolation.
To understand de Casteljau's idea, let us consider the case of three points. We want a smooth curve approximating them. Consider the line segments p_0 p_1 and p_1 p_2. From linear interpolation we know how to interpolate a point on each, say:

p_01(u) = (1 − u) p_0 + u p_1        p_11(u) = (1 − u) p_1 + u p_2.

(See the middle figure in Fig. 77.)
Fig. 77: Repeated interpolation (2, 3, and 4 control points).
Now that we are down to two points, let us apply the above method to interpolate between them:

p(u) = (1 − u) p_01(u) + u p_11(u)
     = (1 − u)((1 − u) p_0 + u p_1) + u((1 − u) p_1 + u p_2)
     = (1 − u)² p_0 + (2u(1 − u)) p_1 + u² p_2.

This is an algebraic parametric curve of degree two. This idea was popularized by Pierre Bézier, an engineer working for a rival French automobile company. In particular, the resulting curve is called the Bézier curve of degree two.
Observe that the function involves a weighted sum of the control points using the following blending functions:

b_02(u) = (1 − u)²        b_12(u) = 2u(1 − u)        b_22(u) = u².

As before, observe that for any value of u the blending functions are all nonnegative and all sum to 1, and hence each point on the curve is a convex combination of the control points.
An example of the resulting curve is shown in Fig. 78 on the left.
Fig. 78: Bézier curves for three and four control points.
Four control points: Let's carry this one step further. Consider four control points p_0, p_1, p_2, and p_3. First use linear interpolation between each pair, yielding the points p_01(u), p_11(u), and p_21(u), as given above. (The notation is rather messy. If you look at the right part of Fig. 77, you can see the pattern much more clearly. Simply draw line segments between each consecutive pair of control points and interpolate along each one.)
Now, we have gone from the initial four control points to three interpolated control points. What now? Just repeat the three-point process! By linearly interpolating between these three points we have

p_02(u) = (1 − u) p_01(u) + u p_11(u)        p_12(u) = (1 − u) p_11(u) + u p_21(u).

Now we are down to just two points. Finally, interpolate these as (1 − u) p_02(u) + u p_12(u). This gives the final point on the curve for this value of u.
Expanding everything yields (trust me):

p(u) = (1 − u)³ p_0 + (3u(1 − u)²) p_1 + (3u²(1 − u)) p_2 + u³ p_3.
This is a polynomial of degree three, called the Bézier curve of degree three. Observe that the formula has the same form as the one above. It involves a blending of the four control points. The blending functions are:

b_03(u) = (1 − u)³        b_13(u) = 3u(1 − u)²        b_23(u) = 3u²(1 − u)        b_33(u) = u³.

It is easy to verify that, for any value of u, these blending functions are all nonnegative and sum to 1. That is, they form a convex combination of the control points.
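In code, this repeated-interpolation process (de Casteljau's algorithm) takes only a few lines; here is a sketch that works for any number of control points (the Point type and lerp helper are illustrative):

#include <vector>

struct Point { double x, y, z; };
static Point lerp(Point a, Point b, double u) {
    return { (1 - u) * a.x + u * b.x, (1 - u) * a.y + u * b.y, (1 - u) * a.z + u * b.z };
}

// Evaluate a Bezier curve at parameter u by repeated linear interpolation
// (de Casteljau's algorithm). Assumes at least one control point.
Point bezier(std::vector<Point> ctrl, double u) {
    for (std::size_t level = ctrl.size() - 1; level >= 1; level--)   // each pass shrinks the list by one
        for (std::size_t i = 0; i < level; i++)
            ctrl[i] = lerp(ctrl[i], ctrl[i + 1], u);                  // one round of interpolation
    return ctrl[0];
}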
Notice that if we write out the coefficients for the blending functions (adding a row for the degree-four functions, which you can derive on your own), we get the following familiar pattern.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
This is just the famous Pascal's triangle. In general, the ith blending function for the degree-k Bézier curve has the general form

b_ik(u) = (k choose i) (1 − u)^(k−i) u^i,        where (k choose i) = k! / (i! (k − i)!).
These polynomial functions are important in mathematics; they are called the Bernstein polynomials and are shown in Fig. 79 over the range u ∈ [0, 1].
Fig. 79: Bézier blending functions (Bernstein polynomials) of degree 3.
Bézier curve properties: Bézier curves have a number of interesting properties. Because each point on a Bézier curve is a convex combination of the control points, the curve lies entirely within the convex hull of the control points. (This is not true of interpolating polynomials, which can wiggle outside of the convex hull.) Observe that all the blending functions are 0 at u = 0 except the one associated with p_0, which is 1, and so the curve starts at p_0 when u = 0. By a symmetric observation, when u = 1 the curve ends at the last point. By evaluating the derivatives at the endpoints, it is also easy to verify that the curve's tangent at u = 0 is collinear with the line segment p_0 p_1. A similar fact holds for the ending tangent and the last line segment.
Recall that computing the derivative of a parametric curve gives you a tangent vector. (You may recall that we used this approach for computing normal vectors.) If you compute the derivative of the curve with respect to u, you will discover to your amazement that the result is itself a Bézier curve. (That is, if you treat each tangent vector as if it were a point, these points would trace out a Bézier curve. Thus, the parameterized tangent vector of a Bézier curve is a Bézier curve.)
Finally, the Bézier curve has the following variation diminishing property. Consider the polyline connecting the control points. Given any line ℓ, the line intersects the Bézier curve no more times than it intersects this polyline. Hence the sort of wiggling that we saw with interpolating polynomials does not occur with Bézier curves.
Lecture 22: Bézier Surfaces and B-splines
Subdividing Bézier curves: Last time we introduced the mathematically elegant Bézier curves. Before going on to discuss surfaces, we need to consider one more issue. In order to render curves or surfaces using a system like OpenGL, which only supports rendering of flat objects, we need to approximate the curve by a number of small linear segments. Typically this is done by computing a sufficiently dense set of points along the curve or surface, and then approximating the curve or surface by a collection of line segments or polygonal patches, respectively.
Bézier curves (and surfaces) lend themselves to a very elegant means of recursively subdividing them into smaller pieces. This is nice, because if we want to render a curve at varying resolutions, we can perform either a high number or a low number of subdivisions. Furthermore, if part of the surface is more important than another (e.g., it is closer to the viewer), we can adaptively subdivide the more important regions of the surface and leave less important regions less refined. This is called adaptive refinement.
Here is a simple subdivision scheme that works for these curves. Let p_0, . . . , p_3 denote the original sequence of control points (this can be adapted to any number of points). Relabel these points as p_00, . . . , p_30. Perform the repeated interpolation construction using the parameter u = 1/2. Label the vertices as shown in the figure below. Now, consider the sequences p_00, p_01, p_02, p_03 and p_03, p_12, p_21, p_30. Each of these sequences defines its own Bézier curve. Amazingly, the concatenation of these two Bézier curves is equal to the original curve. (We will leave the proof of this as an exercise.)
Fig. 80: Bézier subdivision.
Repeating this subdivision allows us to split the curve into as small a set of pieces as we would like, and at all times we are given each subcurve in exactly the same form as the original, represented as a set of four control points. Typically this is done until each of the pieces is sufficiently close to being flat, or is sufficiently small.
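Here is a sketch of this subdivision step for a cubic curve at u = 1/2, following the labeling above (the Point type is illustrative):

struct Point { double x, y, z; };
static Point mid(Point a, Point b) {
    return { (a.x + b.x) / 2, (a.y + b.y) / 2, (a.z + b.z) / 2 };
}

// Subdivide a cubic Bezier curve at u = 1/2 (see Fig. 80). The two output arrays
// are the control points of the left and right halves of the curve.
void subdivide(const Point p[4], Point left[4], Point right[4]) {
    Point p01 = mid(p[0], p[1]);     // first round of interpolation
    Point p11 = mid(p[1], p[2]);
    Point p21 = mid(p[2], p[3]);
    Point p02 = mid(p01, p11);       // second round
    Point p12 = mid(p11, p21);
    Point p03 = mid(p02, p12);       // the point on the curve at u = 1/2
    left[0]  = p[0]; left[1]  = p01; left[2]  = p02; left[3]  = p03;
    right[0] = p03;  right[1] = p12; right[2] = p21; right[3] = p[3];
}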
Bézier Surfaces: Last time we defined Bézier curves. It is an easy matter to extend this notion to Bézier surfaces. Recall that Bézier curves were defined by a process of repeated interpolation. We can extend the notion of interpolation along a line to interpolation along two dimensions. This is called bilinear interpolation. Suppose that we are given four control points p_00, p_01, p_10, and p_11. (Note that the indexing has changed here relative to the previous section.) We use two parameters u and v. We interpolate between p_00 and p_01 using u, between p_10 and p_11 using u, and then interpolate between these two values using v.
p(u, v) = (1 − v)((1 − u) p_00 + u p_01) + v((1 − u) p_10 + u p_11)
        = (1 − v)(1 − u) p_00 + (1 − v)u p_01 + v(1 − u) p_10 + vu p_11.

This is sometimes called a bilinear interpolation, because it is linear in each of the two parameters individually. Note that the shape is not flat, however. It is called a hyperboloid. (See Fig. 81 on the left.)
Recalling that (1 − u) and u are the first-degree Bézier blending functions b_{0,1}(u) and b_{1,1}(u), we see that this
can be written as

p(u, v) = b_01(v) b_01(u) p_00 + b_01(v) b_11(u) p_01 + b_11(v) b_01(u) p_10 + b_11(v) b_11(u) p_11
        = Σ_{i=0}^{1} Σ_{j=0}^{1} b_{i,1}(v) b_{j,1}(u) p_{i,j}.
Fig. 81: Bézier surfaces.
Generalizing this to the next higher degree, say quadratic Bézier surfaces, we have a 3 × 3 array of control points, p_ij, 0 ≤ i, j ≤ 2, and the resulting parametric formula is

p(u, v) = Σ_{i=0}^{2} Σ_{j=0}^{2} b_{i,2}(v) b_{j,2}(u) p_{i,j}.
(See Fig. 81 on the right.) This is called a tensor product construction.
Observe that if we fix the value of v, then as u varies we get a Bézier curve. Similarly, if we fix u and let v vary, then it traces out a Bézier curve. The final surface is this combination of curves. It has the same convex hull and tangent properties that Bézier curves have.
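This structure suggests a simple way to evaluate such a tensor-product surface: run the curve algorithm once along each row of control points (in u), and then once more on the results (in v). Here is a sketch (the bezierCurve routine is the de Casteljau evaluator from the previous lecture's sketch; types and names are illustrative):

#include <vector>

struct Point { double x, y, z; };
static Point lerp(Point a, Point b, double t) {
    return { (1 - t) * a.x + t * b.x, (1 - t) * a.y + t * b.y, (1 - t) * a.z + t * b.z };
}
static Point bezierCurve(std::vector<Point> p, double t) {     // de Casteljau, as before
    for (std::size_t k = p.size() - 1; k >= 1; k--)
        for (std::size_t i = 0; i < k; i++)
            p[i] = lerp(p[i], p[i + 1], t);
    return p[0];
}

// Evaluate the tensor-product Bezier surface p(u, v); ctrl[i][j] plays the role
// of the control point p_{i,j} in the formula above.
Point bezierSurface(const std::vector<std::vector<Point> >& ctrl, double u, double v) {
    std::vector<Point> column;
    for (std::size_t i = 0; i < ctrl.size(); i++)
        column.push_back(bezierCurve(ctrl[i], u));   // a Bezier curve in u for each row i
    return bezierCurve(column, v);                   // then blend those points in v
}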
How are Bézier surfaces rendered in OpenGL? We can generalize the subdivision process for curves in a straightforward manner (using u = v = 1/2). This will result in four sets of control points, where the union of the resulting surface patches is equal to the original surface. Again, the subdivision process may be repeated until each patch is sufficiently close to being flat or is sufficiently small, after which the resulting control points define the vertices of a polygon.
Cubic B-splines: Although Bézier curves are very elegant, they do have some shortcomings. The main problem is that if we want to define a single complex curve with many variations and wiggles, we need to have a large number of control points. But this leads to a high degree polynomial, and hence more complex calculations. The fact that the Bézier blending functions are all nonzero over the entire range u ∈ (0, 1) means that these functions have global support. This means that the movement of even one control point has an effect on the entire curve (although it is most noticeable only in the region of the point). A system that provides for local support would be preferred, where each control point only affects a local portion of the curve.
One solution would be to link together many low degree (e.g., cubic) Bézier curves end to end. Getting the joints to link with C² continuity (recall that this means that the function and its first two derivatives are continuous) is a bit tricky. (We will leave as an exercise the conditions on the control points that would guarantee this.) What we would like is a method of stringing many points together so that we get the best of all worlds: low degree, many control points, and C² (or higher) continuity.
B-splines were developed to address these shortcomings. The idea is that we will still use smooth blending functions multiplied times the control points, but these functions will have the property that they are nonzero only over a small amount of the parameter range. Thus these functions have only local support. Over the nonzero range, they will consist of the concatenation of smooth polynomials. As before, each point on the curve will be given by blending the control points:
p(u) = Σ_{i=0}^{m} B_i(u) p_i,
where B_i(u) denotes the ith blending function. The figure below left gives a crude rendering of B-spline blending functions of order 2. Note that once we know how to construct curves, we can apply the same tensor-product construction to form B-spline surfaces.
Fig. 82: B-spline basis functions.
Note that it is impossible to define a single polynomial that is zero on some range and nonzero on another. So to define the B-spline blending functions we will need to subdivide the parameter space u into a set of intervals, and define a different polynomial over each interval. The result is a piecewise polynomial function. If we join the pieces with sufficiently high continuity, then the resulting spline will have the same continuity. In the figure above right, each interval contains a different polynomial function.
The B-spline blending functions are a generalization of the Bézier blending functions. Let's suppose that we want to generate a curve of degree d. (The standard cubic B-spline will be the case d = 3.) Also let us assume that we have m + 1 data points p_0, . . . , p_m. Rather than work over the interval 0 ≤ u ≤ 1 as we did for Bézier curves, it will be notationally convenient to extend the range of u to a set of intervals:

u_min = u_0 ≤ u_1 ≤ u_2 ≤ . . . ≤ u_n = u_max.
These parameter values are called knot points. (Note that the term point does not refer to a point in space, as with control points. These are just scalar values.) Each of the blending functions will consist of the concatenation of polynomial functions, with one polynomial over each knot interval [u_{i−1}, u_i]. For simplicity you might think of these as being intervals of unit length for the time being, but we will see later that there are advantages to making intervals of different sizes. There will be a relationship between the number of intervals n and the number of points m, which we will consider later.
How do we define the B-spline blending functions? There are two ways to do this. The first is to write down the requirements that the blending functions must be C² continuous at the joint points, and that they satisfy the convex hull property. Together these constraints completely define B-splines. (We leave this as an exercise.) Instead, as with the Bézier blending functions, we will do this by recursively applying linear interpolation to the blending functions of the next lower degree. An elegant recursive expression of the blending function (but somewhat difficult to understand) is given by the Cox-deBoor recursion. Let B_{i,d}(u) denote the ith blending function for a B-spline of degree d.
B_{k,0}(u) = 1 if u_k ≤ u < u_{k+1}, and 0 otherwise.

B_{k,d}(u) = ((u − u_k) / (u_{k+d} − u_k)) B_{k,d−1}(u) + ((u_{k+d+1} − u) / (u_{k+d+1} − u_{k+1})) B_{k+1,d−1}(u).
This is quite hard to comprehend at first sight. However, observe that, as with Bézier curves, the blending function at each degree is expressed as a weighted average of two blending functions of the next lower degree. It can be proved (by induction) that, irrespective of the knot spacing, the blending functions sum to 1.
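A direct transcription of this recurrence into code might look as follows (a sketch; the guard against zero denominators is a standard convention that only matters when knot values repeat, a case the notes do not discuss):

#include <vector>

// Cox-deBoor recursion for the B-spline blending function B_{k,d}(u), given the
// knot vector knots[0..n]. Requires k + d + 1 <= n.
double bsplineBlend(int k, int d, double u, const std::vector<double>& knots) {
    if (d == 0)
        return (knots[k] <= u && u < knots[k + 1]) ? 1.0 : 0.0;
    double left = 0.0, right = 0.0;
    double denomL = knots[k + d] - knots[k];
    double denomR = knots[k + d + 1] - knots[k + 1];
    if (denomL > 0.0)
        left  = (u - knots[k]) / denomL * bsplineBlend(k, d - 1, u, knots);
    if (denomR > 0.0)
        right = (knots[k + d + 1] - u) / denomR * bsplineBlend(k + 1, d - 1, u, knots);
    return left + right;   // weighted average of two functions of the next lower degree
}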
If you grind through the definitions, then you will see that B_{k,0}(u) is a step function that is 1 on the interval [u_k, u_{k+1}). B_{k,1}(u) spans two intervals and is a piecewise linear function that goes from 0 to 1 and then back to 0. B_{k,2}(u) spans three intervals and is a piecewise quadratic that grows from 0 to 1/2, then up to 3/4 in the middle of the second interval, back to 1/2, and back to 0. Finally, B_{k,3}(u) is a cubic that spans four intervals, growing from 0 to 1/6 to 2/3, then back to 1/6 and to 0. Thus, successively higher degrees are successively smoother.
For example, the blending functions for the quadratic B-spline are:

B_{k,2}(u) =
    0                                            if u < u_k
    (1/2)(u − u_k)²                              if u_k ≤ u < u_{k+1}
    −(u − u_{k+1})² + (u − u_{k+1}) + 1/2        if u_{k+1} ≤ u < u_{k+2}
    (1/2)(1 − (u − u_{k+2}))²                    if u_{k+2} ≤ u < u_{k+3}
    0                                            if u_{k+3} ≤ u.
Notice that only the base case of the recursion is defined in a piecewise manner, but all the other functions inherit their piecewise nature from this.
Your eyes may strain to understand this formula, but if you work things out piece by piece, you'll see that the equations are actually fairly reasonable. For example, observe that in B_{k,2} there are three intervals in which the function is nonzero, running from u_k to u_{k+3}. In the first interval, we have a quadratic that grows from 0 to 1/2 (assuming that each interval [u_i, u_{i+1}] is of size 1). In the second interval, we have an inverted parabola that starts at 1/2 (when u = u_{k+1}), grows to −1/4 + 1/2 + 1/2 = 3/4 (when u is midway between u_{k+1} and u_{k+2}), and then shrinks down to 1/2 (when u = u_{k+2}). In the third interval, we have a quadratic that decreases from 1/2 down to 0. (See the function labeled B_{0,2} in Fig. 83.)
Fig. 83: B-spline blending functions.
There is one other important issue that we have not addressed, namely, how to assign knot points to control points. Due to time limitations, we will skip this issue, but please refer to any standard source on B-splines for further information.
Supplemental Topics
Lecture 23: Scan Conversion
Scan Conversion: We turn now to a number of miscellaneous issues involved in the implementation of computer
graphics systems. In our top-down approach we have concentrated so far on the high-level view of computer
graphics. In the next few lectures we will consider how these things are implemented. In particular, we consider
the question of how to map 2-dimensional geometric objects (as might result from projection) to a set of pixels
to be colored. This process is called scan conversion or rasterization. We begin by discussing the simplest of
all rasterization problems, drawing a single line segment.
Let us think of our raster display as an integer grid, in which each pixel is a circle of radius 1/2 centered at
each point of the grid. We wish to illuminate a set of pixels that lie on or close to the line. In particular, we
wish to draw a line segment from q = (q_x, q_y) to r = (r_x, r_y), where the coordinates are integer grid points (typically by a process of rounding). Let us assume further that the slope of the line is between 0 and 1, and that q_x < r_x. This may seem very restrictive, but it is not difficult to map any line drawing problem to satisfy
. This may seem very restrictive, but it is not difcult to map any line drawing problem to satisfy
these conditions. For example, if the absolute value of the slope is greater than 1, then we interchange the roles
of x and y, thus resulting in a line with a reciprocal slope. If the slope is negative, the algorithm is very easy
to modify (by decrementing rather than incrementing). Finally, by swapping the endpoints we can always draw
from left to right.
Bresenham's Algorithm: We will discuss an algorithm, which is called Bresenham's algorithm. It is one of the oldest algorithms known in the field of computer graphics. It is also an excellent example of how one can squeeze every bit of efficiency out of an algorithm. We begin by considering an implicit representation of the line equation. (This is used only for deriving the algorithm, and is not computed explicitly by the algorithm.)

f(x, y) = ax + by + c = 0.
If we let d_x = r_x − q_x and d_y = r_y − q_y, it is easy to see (by substitution) that a = d_y, b = −d_x, and c = −(q_x r_y − r_x q_y). Observe that all of these coefficients are integers. Also observe that f(x, y) > 0 for points that lie below the line and f(x, y) < 0 for points above the line. For reasons that will become apparent later, let us use an equivalent representation by multiplying by 2:

f(x, y) = 2ax + 2by + 2c = 0.
Here is the intuition behind Bresenham's algorithm. For each integer x value, we wish to determine which integer y value is closest to the line. Suppose that we have just finished drawing a pixel (p_x, p_y) and we are interested in figuring out which pixel to draw next. Since the slope is between 0 and 1, it follows that the next pixel to be drawn will either be the pixel to our East (E = (p_x + 1, p_y)) or the pixel to our NorthEast (NE = (p_x + 1, p_y + 1)). Let q denote the exact y-value (a real number) of the line at x = p_x + 1. Let m = p_y + 1/2 denote the y-value midway between E and NE. If q < m then we want to select E next, and otherwise we want to select NE. If q = m then we can pick either, say E. See the figure.
To determine which one to pick, we have a decision variable D, which will be the value of f at the midpoint. Thus

D = f(p_x + 1, p_y + 1/2)
  = 2a(p_x + 1) + 2b(p_y + 1/2) + 2c
  = 2a p_x + 2b p_y + (2a + b + 2c).
Fig. 84: Bresenham's midpoint algorithm.
If D > 0 then m is below the line, and so the NE pixel is closer to the line. On the other hand, if D ≤ 0 then m is above the line, so the E pixel is closer to the line. (Note: We can see now why we doubled f(x, y). This makes D an integer quantity.)
The good news is that D is an integer quantity. The bad news is that it takes at least two multiplications and two additions to compute D (even assuming that we precompute the part of the expression that does not change). One of the clever tricks behind Bresenham's algorithm is to compute D incrementally. Suppose we know the current D value, and we want to determine its next value. The next D value depends on the action we take at this stage.
We go to E next: Then the next midpoint will have coordinates (p_x + 2, p_y + 1/2) and hence the new D value will be

D_new = f(p_x + 2, p_y + 1/2)
      = 2a(p_x + 2) + 2b(p_y + 1/2) + 2c
      = 2a p_x + 2b p_y + (4a + b + 2c)
      = 2a p_x + 2b p_y + (2a + b + 2c) + 2a
      = D + 2a = D + 2d_y.

Thus, the new value of D will just be the current value plus 2d_y.
We go to NE next: Then the next midpoint will have coordinates (p_x + 2, p_y + 1 + 1/2) and hence the new D value will be

D_new = f(p_x + 2, p_y + 1 + 1/2)
      = 2a(p_x + 2) + 2b(p_y + 3/2) + 2c
      = 2a p_x + 2b p_y + (4a + 3b + 2c)
      = 2a p_x + 2b p_y + (2a + b + 2c) + (2a + 2b)
      = D + 2(a + b) = D + 2(d_y − d_x).

Thus the new value of D will just be the current value plus 2(d_y − d_x).
Note that in either case we need to perform only one addition (assuming we precompute the values 2d_y and 2(d_y − d_x)). So the inner loop of the algorithm is quite efficient.
The only thing that remains is to compute the initial value of D. Since we start at (q_x, q_y), the initial midpoint is at (q_x + 1, q_y + 1/2), so the initial value of D is

D_init = f(q_x + 1, q_y + 1/2)
       = 2a(q_x + 1) + 2b(q_y + 1/2) + 2c
       = (2a q_x + 2b q_y + 2c) + (2a + b)
       = 0 + 2a + b        (since (q_x, q_y) is on the line)
       = 2d_y − d_x.
We can now give the complete algorithm. Recall our assumptions that q_x < r_x and that the slope lies between 0 and 1. Notice that the quantities 2d_y and 2(d_y − d_x) appearing in the loop can be precomputed, so each step involves only a comparison and a couple of additions of integer quantities.
Bresenham's midpoint algorithm

void bresenham(IntPoint q, IntPoint r) {
    int dx, dy, D, x, y;
    dx = r.x - q.x;                     // line width and height
    dy = r.y - q.y;
    D = 2*dy - dx;                      // initial decision value
    y = q.y;                            // start at (q.x, q.y)
    for (x = q.x; x <= r.x; x++) {
        writePixel(x, y);
        if (D <= 0) D += 2*dy;          // below midpoint - go to E
        else {                          // above midpoint - go to NE
            D += 2*(dy - dx); y++;
        }
    }
}
Bresenham's algorithm can be modified for drawing other sorts of curves. For example, there is a Bresenham-like algorithm for drawing circular arcs. The generalization of Bresenham's algorithm is called the midpoint algorithm, because of its use of the midpoint between two pixels as the basic discriminator.
Filling Regions: In most instances we do not want to draw just a single curve, and instead want to fill a region. There are two common methods of defining the region to be filled. One is polygon-based, in which the vertices of a polygon are given. We will discuss this later. The other is pixel-based. In this case, a boundary region is defined by a set of pixels, and the task is to fill everything inside the region. We will discuss this latter type of filling for now, because it brings up some interesting issues.
The intuitive idea that we have is that we would like to think of a set of pixels as defining the boundary of some region, just as a closed curve does in the plane. Such a set of pixels should be connected, and like a curve, they should split the infinite grid into two parts, an interior and an exterior. Define the 4-neighbors of any pixel to be the pixels immediately to the north, south, east, and west of this pixel. Define the 8-neighbors to be the union of the 4-neighbors and the 4 closest diagonal pixels. There are two natural ways to define the notion of being connected, depending on which notion of neighbors is used.
4-connected: A set is 4-connected if for any two pixels in the set, there is a path from one to the other, lying entirely in the set and moving from one pixel to one of its 4-neighbors.
8-connected: A set is 8-connected if for any two pixels in the set, there is a path from one to the other, lying entirely in the set and moving from one pixel to one of its 8-neighbors.
Fig. 85: 4-connected (left) and 8-connected (right) sets of pixels.
Observe that a 4-connected set is 8-connected, but not vice versa. Recall from the Jordan curve theorem that a closed curve in the plane subdivides the plane into two connected regions, an interior and an exterior. We have not defined what we mean by a closed curve in this context, but even without this there are some problems. Observe that if a boundary curve is 8-connected, then it is generally not true that it separates the infinite grid into two 8-connected regions, since (as can be seen in the figure) both interior and exterior can be joined to each other by an 8-connected path. There is an interesting way to fix this problem. In particular, if we require that the boundary curve be 8-connected, then we require that the region it defines be 4-connected. Similarly, if we require that the boundary be 4-connected, it is common to assume that the region it defines be 8-connected.
Recursive Flood Filling: Irrespective of how we define connectivity, the algorithmic question we want to consider is how to fill a region. Suppose that we are given a starting pixel p = (p_x, p_y). We wish to visit all pixels in the same connected component (using, say, 4-connectivity), and assign them all the same color. We will assume that all of these pixels initially share some common background color, and we will give them a new region color. The idea is to walk around, and whenever we see a 4-neighbor with the background color, we assign it the region color. The problem is that we may go down dead-ends and may need to backtrack. To handle the backtracking we can keep a stack of unfinished pixels. One way to implement this stack is to use recursion. The method is called flood filling. The resulting procedure is simple to write down, but it is not necessarily the most efficient way to solve the problem. See the book for further consideration of this problem.
Recursive Flood-Fill Algorithm (4-connected)

void floodFill(int x, int y) {
    if (getPixel(x, y) == backgroundColor) {
        setPixel(x, y, regionColor);
        floodFill(x - 1, y);            // apply to 4-neighbors
        floodFill(x + 1, y);
        floodFill(x, y - 1);
        floodFill(x, y + 1);
    }
}
Lecture 24: Scan Conversion of Circles
Midpoint Circle Algorithm: Let us consider how to generalize Bresenham's midpoint line drawing algorithm for the rasterization of a circle. We will make a number of assumptions to simplify the presentation of the algorithm. First, let us assume that the circle is centered at the origin. (If not, then the initial conditions to the following algorithm are changed slightly.) Let R denote the (integer) radius of the circle.
The first observation about circles is that it suffices to consider how to draw the arc in the positive quadrant from π/4 to π/2, since all the other points on the circle can be determined from these by 8-way symmetry.
Fig. 86: 8-way symmetry for circles.
What are the comparable elements of Bresenham's midpoint algorithm for circles? As before, we need an implicit representation of the function. For this we use

F(x, y) = x² + y² − R² = 0.
Note that for points inside the circle (or under the arc) this expression is negative, and for points outside the
circle (or above the arc) it is positive.
Fig. 87: Midpoint algorithm for circles.
Let's assume that we have just finished drawing pixel (x_p, y_p), and we want to select the next pixel to draw (drawing clockwise around the boundary). Since the slope of the circular arc is between 0 and −1, our choice at each step is between the neighbor to the east E and the neighbor to the southeast SE. If the circle passes above the midpoint M between these pixels, then we go to E next; otherwise we go to SE.
Next, we need a decision variable. We take this to be the value of F(M), which is

D = F(M) = F(x_p + 1, y_p − 1/2)
  = (x_p + 1)² + (y_p − 1/2)² − R².
If D < 0 then M is below the arc, and so the E pixel is closer to the line. On the other hand, if D ≥ 0 then M is above the arc, so the SE pixel is closer to the line.
Again, the new value of D will depend on our choice.
We go to E next: Then the next midpoint will have coordinates (x_p + 2, y_p − 1/2) and hence the new D value will be

D_new = F(x_p + 2, y_p − 1/2)
      = (x_p + 2)² + (y_p − 1/2)² − R²
      = (x_p² + 4x_p + 4) + (y_p − 1/2)² − R²
      = (x_p² + 2x_p + 1) + (2x_p + 3) + (y_p − 1/2)² − R²
      = (x_p + 1)² + (2x_p + 3) + (y_p − 1/2)² − R²
      = D + (2x_p + 3).

Thus, the new value of D will just be the current value plus 2x_p + 3.
We go to SE next: Then the next midpoint will have coordinates (x_p + 2, y_p − 1 − 1/2) and hence the new D value will be

D_new = F(x_p + 2, y_p − 3/2)
      = (x_p + 2)² + (y_p − 3/2)² − R²
      = (x_p² + 4x_p + 4) + (y_p² − 3y_p + 9/4) − R²
      = (x_p² + 2x_p + 1) + (2x_p + 3) + (y_p² − y_p + 1/4) + (−2y_p + 8/4) − R²
      = (x_p + 1)² + (y_p − 1/2)² − R² + (2x_p + 3) + (−2y_p + 2)
      = D + (2x_p − 2y_p + 5).

Thus the new value of D will just be the current value plus 2(x_p − y_p) + 5.
The last issue is computing the initial value of D. Since we start at x = 0, y = R, the first midpoint of interest is at x = 1, y = R − 1/2, so the initial value of D is

D_init = F(1, R − 1/2)
       = 1 + (R − 1/2)² − R²
       = 1 + R² − R + 1/4 − R²
       = 5/4 − R.
This is something of a pain, because we have been trying to avoid floating-point arithmetic. However, there is a very clever observation that can be made at this point. We are only interested in testing whether D is positive or negative. Whenever we change the value of D, we do so by an integer increment. Thus, D is always of the form D′ + 1/4, where D′ is an integer.
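Putting the pieces together, the inner loop might look like the following sketch (writePixel8, which plots a point together with its seven symmetric copies, is an assumed helper, not part of the notes). The initial value 1 − R stands in for 5/4 − R: since D changes only by integer increments and only its sign is ever tested, the extra 1/4 can be dropped.

// Midpoint circle algorithm for the octant from (0, R) to the 45-degree point.
void midpointCircle(int R) {
    int x = 0, y = R;
    int D = 1 - R;                      // integer stand-in for D_init = 5/4 - R
    while (x <= y) {
        writePixel8(x, y);              // plot (x, y) and its 7 symmetric copies
        if (D < 0) {                    // midpoint below the arc: go to E
            D += 2*x + 3;
            x++;
        } else {                        // midpoint on or above the arc: go to SE
            D += 2*(x - y) + 5;
            x++; y--;
        }
    }
}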
The quantity

α_a = (y_s − y_0) / (y_1 − y_0)

is the ratio into which the scan line subdivides the edge P_0 P_1. The depth of point P_a can be interpolated by the following affine combination:

z_a = (1 − α_a) z_0 + α_a z_1.

(Is this really an accurate interpolation of the depth information? Remember that the projection transformation maps lines to lines, but depth is mapped nonlinearly. It turns out that this does work, but we'll leave the explanation as an exercise.) We can derive a similar expression for z_b.
Then as we scan along the scan line, for each value of x we have

α = (x − x_a) / (x_b − x_a),

and the depth of the scanned point is just the affine combination

z = (1 − α) z_a + α z_b.
It is more efficient (from the perspective of the number of arithmetic operations) to do this by computing z_a accurately, and then adding a small incremental value as we move to each successive pixel on the line. The scan line traverses x_b − x_a pixels, and over this range, the depth values change over the range z_b − z_a. Thus, the change in depth per pixel is

Δ_z = (z_b − z_a) / (x_b − x_a).

Starting with z_a, we add the value Δ_z to the depth value of each successive pixel as we scan across the row. An analogous trick may be used to interpolate the depth values along the left and right edges.
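As a sketch, the inner loop of this incremental scheme might look as follows (processPixel stands in for whatever per-pixel work is done, e.g., the depth-buffer comparison; it is an assumed helper, and the routine assumes x_b > x_a):

// Incremental depth interpolation across one scan line, from (xa, za) to (xb, zb).
void scanDepths(int xa, int xb, double za, double zb) {
    double dz = (zb - za) / (xb - xa);   // change in depth per pixel
    double z = za;
    for (int x = xa; x <= xb; x++) {
        processPixel(x, z);              // e.g., depth-buffer test and write
        z += dz;                         // one addition per pixel
    }
}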
Lecture 27: Light and Color
Achromatic Light: Light and its perception are important to understand for anyone interested in computer graphics. Before considering color, we begin by considering some issues in the perception of light intensity and the generation of light on most graphics devices. Let us consider color-free, or achromatic, light, that is, gray-scale light. It is characterized by one attribute: intensity, which is a measure of energy, or luminance, which is the intensity that we perceive. Intensity affects brightness, and hence low intensities tend to black and high intensities tend to white. Let us assume for now that each intensity value is specified as a number from 0 to 1, where 0 is black and 1 is white. (Actually intensity has no limits, since it is a measure of energy. However, from a practical perspective, every display device has some maximum intensity that it can display. So think of 1 as the brightest white that your monitor can generate.)
Perceived Brightness: You would think that intensity and luminance are linearly proportional to each other, that
is, twice the intensity is perceived as being twice as bright. However, the human perception of luminance is
nonlinear. For example, suppose we want to generate 10 different intensities, producing a uniform continuous
variation from black to white on a typical CRT display. It would seem logical to use equally spaced intensities:
0.0, 0.1, 0.2, 0.3, . . . , 1.0. However our eye does not perceive these intensities as varying uniformly. The reason
is that the eye is sensitive to ratios of intensities, rather than absolute differences. Thus, 0.2 appears to be twice
as bright as 0.1, but 0.6 only appears to be 20% brighter than 0.5. In other words, the response R of the human
visual system to a light of a given intensity I can be approximated (up to constant factors) by a logarithmic
function
R(I) = log I.
This is called the Weber-Fechner law. (It is not so much a physical law as it is a model of the human visual
system.)
For example, suppose that we want to generate intensities that appear to vary linearly between two intensities I_0 and I_1, as α varies from 0 to 1. Rather than computing an affine (i.e., arithmetic) combination of I_0 and I_1, we should instead compute a geometric combination of these two intensities:

I_α = I_0^(1−α) · I_1^α.

Observe that, as with affine combinations, this varies continuously from I_0 to I_1 as α varies from 0 to 1. The reason for this choice is that the response function varies linearly, that is,

R(I_α) = log(I_0^(1−α) I_1^α) = (1 − α) log I_0 + α log I_1 = (1 − α) R(I_0) + α R(I_1).
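In code, this geometric combination is a one-liner (a sketch; the function name is illustrative):

#include <cmath>

// Intensity that appears to lie a fraction alpha of the way from I0 to I1,
// using a geometric rather than arithmetic combination.
double perceivedLerp(double I0, double I1, double alpha) {
    return std::pow(I0, 1.0 - alpha) * std::pow(I1, alpha);
}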
Gamma Correction: Just to make things more complicated, there is not a linear relation between the voltage supplied to the electron gun of a CRT and the intensity of the resulting phosphor. Thus, the RGB value (0.2, 0.2, 0.2) does not emit twice as much illumination energy as the RGB value (0.1, 0.1, 0.1) when displayed on a typical monitor.
The relationship between voltage and brightness of the phosphors is more closely approximated by the following function:

I = V^γ,

where I denotes the intensity of the pixel and V denotes the voltage on the signal (which is proportional to the RGB values you store in your frame buffer), and γ is a constant that depends on physical properties of the display device. For typical CRT monitors, it ranges from 1.5 to 2.5. (2.5 is typical for PCs and Sun workstations.) The term gamma refers to the nonlinearity of the transfer function.
Users of graphics systems need to correct this in order to get the colors they expect. Gamma correction is the process of altering the pixel values in order to compensate for the monitor's nonlinear response. In a system that does not do gamma correction, the problem is that low voltages produce unnaturally dark intensities compared to high voltages. The result is that dark colors appear unusually dark. In order to correct this effect, modern monitors provide the capability of gamma correction. In order to achieve a desired intensity I, we instead aim to produce a corrected intensity:

I′ = I^(1/γ),

which we display instead of I. Thus, when the gamma effect is taken into account, we will get the desired intensity.
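For example, a sketch of the correction applied per intensity value (gamma would be the display's measured value, e.g., around 2.5 for a typical PC monitor, per the discussion above):

#include <cmath>

// Gamma correction: for a display with I_displayed = V^gamma, store I^(1/gamma)
// so that the displayed result matches the desired intensity I.
double gammaCorrect(double I, double gamma) {
    return std::pow(I, 1.0 / gamma);
}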
Some graphics displays (like SGIs and Macs) provide an automatic (but typically partial) gamma correction. In
most PCs the gamma can be adjusted manually. (However, even with gamma correction, do not be surprised
if the same RGB values produce different colors on different systems.) There are resources on the Web to
determine the gamma value for your monitor.
Light and Color: Light as we perceive it is electromagnetic radiation from a narrow band of the complete spectrum of electromagnetic radiation, called the visible spectrum. The physical nature of light has elements that are like particles (when we discuss photons) and elements that are like waves. Recall that a wave can be described either in terms of its frequency, measured say in cycles per second, or the inverse quantity of wavelength. The electromagnetic spectrum ranges from very low frequency (long wavelength) radio waves (greater than 10 centimeters in wavelength) to microwaves, infrared, visible light, ultraviolet, and x-rays, and up to high frequency (short wavelength) gamma rays (less than 0.01 nm in wavelength). Visible light lies in the range of wavelengths from around 400 to 700 nm, where nm denotes a nanometer, or 10^−9 of a meter.
Physically, the light energy that we perceive as color can be described in terms of a function of wavelength λ, called the spectral distribution function, or simply spectral function, f(λ). As we walk along the wavelength axis (from long to short wavelengths), the associated colors that we perceive vary along the colors of the rainbow: red, orange, yellow, green, blue, indigo, violet. (Remember the Roy G. Biv mnemonic.) Of course, these color names are human interpretations, and not physical divisions.
The Eye and Color Perception: Light and color are complicated in computer graphics for a number of reasons. The first is that the physics of light is very complex. Secondly, our perception of light is a function of our optical systems, which perform numerous unconscious corrections and modifications to the light we see.
The retina of the eye is a light sensitive membrane, which contains two types of light-sensitive receptors, rods
and cones. Cones are color sensitive. There are three different types, which are selectively more sensitive to
red, green, or blue light. There are from 6 to 7 million cones concentrated in the fovea, which corresponds to
the center of your view. The tristimulus theory states that we perceive color as a mixture of these three colors.
Blue cones: peak response around 440 nm with about 2% of light absorbed by these cones.
Green cones: peak response around 545 nm with about 20% of light absorbed by these cones.
Red cones: peak response around 580 nm, with about 19% of light absorbed by these cones.
The different absorption rates come from the fact that we have far fewer blue-sensitive cones in the fovea as compared with red and green. Rods, in contrast, occur in lower density in the fovea, and do not distinguish color. However, they are sensitive to low light and motion, and hence serve a function for vision at night.
Fig. 94: Spectral response curves for cones (adapted from Foley, van Dam, Feiner and Hughes).
It is possible to produce light within a very narrow band of wavelengths using lasers. Note that because of our limited ability to sense light of different colors, there are many different spectra that appear to us to be the same color. These are called metamers. Thus, spectrum and color are not in 1–1 correspondence. Most of the light we see is a mixture of many wavelengths combined at various strengths. For example, shades of gray varying from white to black all correspond to fairly flat spectral functions.
Describing Color: Throughout this semester we have been very lax about defining color carefully. We just spoke of RGB values as if that were enough. However, we never indicated what RGB means, independently from the notion that they are the colors of the phosphors on your display. How would you go about describing color precisely, so that, say, you could unambiguously indicate exactly what shade you wanted in a manner that is independent of the display device? Obviously you could give the spectral function, but that would be overkill (since many spectra correspond to the same color) and it is not clear how you would find this function in the first place.
There are three components to color, which seem to describe color much more predictably than does RGB. These are hue, saturation, and lightness. The hue describes the dominant wavelength of the color in terms of one of the pure colors of the spectrum that we gave earlier. The saturation describes how pure the light is. The red color of a fire engine is highly saturated, whereas pinks and browns are less saturated, involving mixtures with grays. Gray tones (including white and black) are the most unsaturated colors. Of course, lightness indicates the intensity of the color. But although these terms are somewhat more intuitive, they are hardly precise.
The tristimulus theory suggests that we perceive color by a process in which the cones of the three types each send signals to our brain, which sums these responses and produces a color. This suggests that there are three primary spectral distribution functions, R(λ), G(λ), and B(λ), and every saturated color that we perceive can be described as a positive linear combination of these three:

C = rR + gG + bB,   where r, g, b ≥ 0.

Note that R, G and B are functions of the wavelength λ, and so this equation means we weight each of these three functions by the scalars r, g, and b, respectively, and then integrate over the entire visual spectrum. C is the color that we perceive.
Extensive studies with human subjects have shown that it is indeed possible to define saturated colors as a combination of three spectra, but the result has a very strange outcome. Some colors can only be formed by allowing some of the coefficients r, g, or b to be negative. For example, there is a color C such that

C = 0.7R + 0.5G − 0.2B.

We know what it means to form a color by adding light, but we cannot subtract light that is not there. The way that this equation should be interpreted is that we cannot form color C from the primaries, but we can form the color C + 0.2B by combining 0.7R + 0.5G. When we combine colors in this way they are no longer pure, or saturated. Thus such a color C is in some sense super-saturated, since it cannot be formed by a purely additive process.
The CIE Standard: In 1931, a commission was formed to attempt to standardize the science of colorimetry. This commission was called the Commission Internationale de l'Éclairage, or CIE.
The results described above lead to the conclusion that we cannot describe all colors as positive linear combinations of three primary colors. So, the commission came up with a standard for describing colors. They defined three special super-saturated primary colors X, Y, and Z, which do not correspond to any real colors, but they have the property that every real color can be represented as a positive linear combination of these three.
Fig. 95: CIE primary colors (adapted from Hearn and Baker).
The resulting color space is 3-dimensional, and hence is hard to visualize. A common way of drawing the diagram is to consider a single 2-dimensional slice, by normalizing, that is, cutting with the plane X + Y + Z = 1. We can then project away the Z component, yielding the chromaticity coordinates:

x = X / (X + Y + Z)    y = Y / (X + Y + Z)

(and z can be defined similarly). These components describe just the color of a point. Its brightness is a function of the Y component. (Thus, an alternative, but seldom used, method of describing colors is as xyY.)
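For concreteness, here is a minimal sketch of this normalization in C++; the function name is an assumption, not part of any standard API.

Chromaticity Coordinates (sketch)
// Compute the chromaticity coordinates (x, y) from CIE tristimulus values
// (X, Y, Z) by cutting with the plane X + Y + Z = 1 and projecting away Z.
void chromaticity(double X, double Y, double Z, double& x, double& y) {
    double sum = X + Y + Z;
    x = X / sum;
    y = Y / sum;
    // z = Z / sum would be defined similarly; brightness is carried by Y.
}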
If we plot the various colors in these (x, y) coordinates, we obtain the 2-dimensional shark-fin convex shape shown in Fig. 96. Let's explore this figure a little. Around the curved top of the shark fin we see the colors of the spectrum, from the long wavelength red to the short wavelength violet. The top of the fin is green. Roughly in the center of the diagram is white. The point C corresponds nearly to daylight white. As we get near the boundaries of the diagram we get the purest or most saturated colors (or hues). As we move towards C, the colors become less and less saturated.
Fig. 96: CIE chromaticity diagram and color gamut (adapted from Hearn and Baker).
An interesting consequence is that, since the colors generated by your monitor are linear combinations of three different colored phosphors, there exist regions of the CIE color space that your monitor cannot produce. (To see this, find a bright orange sheet of paper, and try to imitate the same saturated color on your monitor.)
The CIE model is useful for providing formal specifications of any color as a 3-element vector. Carefully designed image formats, such as TIFF and PNG, specify colors in terms of the CIE model, so that, in theory, two different devices can perform the necessary corrections to display the colors as true as possible. Typical hardware devices like CRTs, televisions, and printers use other standards that are more convenient for generation purposes. Unfortunately, neither CIE nor these models is particularly intuitive from a user's perspective.
Lecture 28: Halftone Approximation
Halftone Approximation: Not all graphics devices provide a continuous range of intensities. Instead they provide a discrete set of choices. The most extreme case is that of a monochrome display with only two colors, black and white. Inexpensive monitors have look-up tables (LUTs) with only 256 different colors at a time. Also, when images are compressed, e.g., as in the GIF format, it is common to reduce from 24-bit color to 8-bit color. The question is, how can we use a small number of available colors or shades to produce the perception of many colors or shades? This problem is called halftone approximation.
We will consider the problem with respect to the monochrome case, but the generalization to colors is possible, for example by treating the RGB components as separate monochrome subproblems.
Newspapers handle this in reproducing photographs by varying the dot-size. Large black dots for dark areas and
small black dots for white areas. However, on a CRT we do not have this option. The simplest alternative is just
to round the desired intensity to the nearest available gray-scale. However, this produces very poor results for a
monochrome display because all the darker regions of the image are mapped to black and all the lighter regions
are mapped to white.
One approach, called dithering, is based on the idea of grouping pixels into groups, e.g., 3 × 3 or 4 × 4 groups, and assigning the pixels of the group to achieve a certain effect. For example, suppose we want to achieve 5 halftones. We could do this with a 2 × 2 dither matrix (see Fig. 97, and the sketch below).

Fig. 97: Halftone approximation with dither patterns.

This method assumes that our displayed image will be twice as large as the original image, since each pixel is represented by a 2 × 2 array. (Actually, there are ways to adjust dithering so it works with images of the same size, but the visual effects are not as good as the error-diffusion method below.)
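Here is a minimal sketch of this kind of dithering; the particular 2 × 2 matrix entries and the function name are a common convention, not taken from these notes. Each input intensity in [0, 1] is quantized to one of 5 levels and expanded into a 2 × 2 block of black (0) or white (1) pixels.

2 x 2 Dithering (sketch)
int dither2x2[2][2] = { {0, 2},
                        {3, 1} };

void ditherPixel(double intensity, int block[2][2]) {
    // Quantize the intensity to one of 5 levels: 0, 1, 2, 3, or 4 "on" pixels.
    int level = (int)(intensity * 4.0 + 0.5);
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            block[i][j] = (dither2x2[i][j] < level) ? 1 : 0;
}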
If the image and display sizes are the same, the most popular method for halftone approximation is called error
diffusion. Here is the idea. When we approximate the intensity of a pixel, we generate some approximation
error. If we create the same error at every pixel (as can happen with dithering) then the overall image will suffer.
We should keep track of these errors, and use later pixels to correct for them.
Consider, for example, that we are drawing a 1-dimensional image with a constant gray tone of 1/3 on a black and white display. We would round the first pixel to 0 (black), and incur an error of +1/3. The next pixel will have gray tone 1/3, to which we add the previous error of 1/3 to get 2/3. We round this to the next pixel value of 1 (white). The new accumulated error is −1/3. We add this to the next pixel to get 0, which we draw as 0 (black), and the final error is 0. After this the process repeats. Thus, to achieve a 1/3 tone, we generate the pattern 010010010010 . . ., as desired.
We can apply this to 2-dimensional images as well, but we should spread the errors out in both dimensions. Nearby pixels should be given most of the error and further away pixels should be given less. Furthermore, it is advantageous to distribute the errors in a somewhat random way to avoid annoying visual effects (such as diagonal lines or unusual bit patterns). The Floyd-Steinberg method distributes errors as follows. Let (x, y) denote the current pixel.

Right: 7/16 of the error to (x + 1, y).
Below left: 3/16 of the error to (x − 1, y − 1).
Below: 5/16 of the error to (x, y − 1).
Below right: 1/16 of the error to (x + 1, y − 1).
Thus, let S[x][y] denote the shade of pixel (x, y). To draw S[x][y] we round it to the nearest available shade K and set err = S[x][y] − K. Then we compensate by adjusting the surrounding shades, e.g., S[x + 1][y] += (7/16)·err.
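The following is a minimal sketch of this procedure for a 1-bit (black/white) target, assuming the image is stored as a row-major array of floats in [0, 1]; the function and variable names are assumptions. Rows here are processed top to bottom, so the error is pushed to the row below (the mirror image of the y − 1 convention above).

Floyd-Steinberg Error Diffusion (sketch)
#include <vector>

void floydSteinberg(std::vector<float>& img, int width, int height) {
    auto at = [&](int x, int y) -> float& { return img[y * width + x]; };
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            float old = at(x, y);
            float rounded = (old < 0.5f) ? 0.0f : 1.0f;     // nearest available shade
            float err = old - rounded;                      // approximation error
            at(x, y) = rounded;
            if (x + 1 < width)                   at(x + 1, y)     += err * 7.0f / 16.0f;
            if (x - 1 >= 0 && y + 1 < height)    at(x - 1, y + 1) += err * 3.0f / 16.0f;
            if (y + 1 < height)                  at(x, y + 1)     += err * 5.0f / 16.0f;
            if (x + 1 < width && y + 1 < height) at(x + 1, y + 1) += err * 1.0f / 16.0f;
        }
    }
}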
There is no strong mathematical explanation (that I know of) for these magic constants. Experience shows that this produces fairly good results without annoying artifacts. The disadvantages of the Floyd-Steinberg method are that it is a serial algorithm (thus it is not possible to determine the intensity of a single pixel in isolation), and that the error diffusion can sometimes generate ghost features at slight displacements from the original.
The Floyd-Steinberg idea can be generalized to colored images as well. Rather than thinking of shades as simple scalar values, let's think of them as vectors in a 3-dimensional RGB space. First, a set of representative colors is chosen from the image (either from a fixed color palette, or by inspection of the image for common trends). These can be viewed as a set of, say 256, points in this 3-dimensional space. Next each pixel is rounded to the nearest representative color. This is done by defining a distance function in 3-dimensional RGB space and finding the nearest neighbor among the representative points. The difference of the pixel and its representative is a 3-dimensional error vector, which is then propagated to neighboring pixels as in the 2-dimensional case.

Fig. 98: Floyd-Steinberg Algorithm (Source: Poulbère and Bousquet, 1999).
Lecture 29: 3-d Rotation and Quaternions
Rotation and Orientation in 3-Space: One of the trickier problems in 3-d geometry is that of parameterizing rotations and the orientation of frames. We have introduced the notion of orientation before (e.g., clockwise or counterclockwise). Here we mean the term in a somewhat different sense, as a directional position in space. Describing and managing rotations in 3-space is a somewhat more difficult task (at least conceptually), compared with the relative simplicity of rotations in the plane.
Why do we care about rotations? Suppose that you are an animation programmer for a computer graphics studio.
The object that you are animating is to be moved smoothly from one location to another. If the object is in the
same directional orientation before and after, we can just translate from one location to the other. If not, we need
to find a way of interpolating between its two orientations. This usually involves rotations in 3-space. But how
should these rotations be performed so that the animation looks natural? Another example is one in which the
world is stationary, but the camera is moving from one location and viewing situation to another. Again, how
can we move smoothly and naturally from one to the other?
Since smoothly interpolating positions by translation is pretty easy to understand, let us ignore the issue of
position, and just focus on orientations and rotations about the origin. Let F denote the standard coordinate
frame, and consider another orthonormal frame G. We want some way to represent G concisely, relative to F,
and generally to interpolate a motion from F to G (see Fig. 99).
Fig. 99: Moving between frames.
Of course, we could just represent F and G by their three orthonormal basis vectors. But if we were to try to in-
terpolate (linearly) between corresponding pairs of basis vectors, the intermediate vectors would not necessarily
be orthonormal.
We will explore two methods for dealing with rotation, Euler angles and quaternions.
Euler Angles: Leonhard Euler was a famous mathematician who lived in the 18th century. He proved many important theorems in geometry, algebra, and number theory, and he is credited as the inventor of graph theory. Among his many theorems is one that states that the composition of any number of rotations in three-space can be expressed as a single rotation in 3-space about an appropriately chosen vector. Euler also showed that any rotation in 3-space could be broken down into exactly three rotations, one about each of the coordinate axes.
Suppose that you are a pilot, such that the x-axis points to your left, the y-axis points ahead of you, and the z-axis points up (see Fig. 100). Then a rotation about the x-axis, denoted by φ, is called the pitch. A rotation about the y-axis, denoted by θ, is called roll. A rotation about the z-axis, denoted by ψ, is called yaw. Euler's theorem states that any orientation in space can be expressed by composing three such rotations, for an appropriate choice of (φ, θ, ψ).
Fig. 100: Pitch, roll, and yaw.
These angles can be determined for a given frame G, with basis vectors u, v, and w, as follows.
Pitch: Let w′ denote the projection of w onto the yz-coordinate plane. First rotate about the x-axis until the vector w′ coincides with the z-axis (see Fig. 101(a)). Call this angle φ. The original vector w will now lie on the xz-coordinate plane.
Roll: Next, rotate about the y-axis (thus keeping the xz-coordinate plane fixed) until w coincides with the z-axis (see Fig. 101(b)). Call this angle θ. Afterwards, the two vectors u and v (being orthogonal to w) must now lie on the xy-plane.
Yaw: Finally, we rotate about the z-axis until u coincides with the x-axis (see Fig. 101(c)). Call this angle ψ. At this point we have w = z and u = x. Assuming that G is orthonormal and right-handed, it follows that v = y.
Thus, by these three rotations (φ, θ, ψ), one about each of the axes, we can bring G into alignment with F.
Fig. 101: Rotating a frame to coincide with the standard frame.
In summary, we have established Euler's theorem. A change in orientation between any two orthonormal frames can be accomplished with three rotations, one about each of the coordinate axes. Hence, such a transformation can be represented by a triple of three angles, (φ, θ, ψ). These define a general rotation matrix, formed by composing the three basic rotations:

R(φ, θ, ψ) = R_z(ψ) R_y(θ) R_x(φ).

(As usual, recall that, because we post-multiply vectors times matrices, it follows that the x-rotation is performed first, followed by the y-rotation, and then the z-rotation.) Thus, these three angles are the Euler angles for the rotation that aligns G with F.
Interpolating using Euler angles: Now, given two orientations in space, say given by the Euler angles E_1 = (φ_1, θ_1, ψ_1) and E_2 = (φ_2, θ_2, ψ_2), we can interpolate between them, say by taking convex combinations. Given any α ∈ [0, 1], we can define

R(α) = R((1 − α)E_1 + αE_2),

for example (see the sketch below). As α varies from 0 to 1, this will smoothly rotate from one orientation to the other. In practice, this approach works fairly well if the two orientations are very close to each other. It should be noted, however, that the interpolations resulting from Euler angles are unintuitive, and in particular, Euler-angle interpolation does not follow the shortest path between two rotations. (This is remedied by quaternions, which we will discuss later.)
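The following is a minimal sketch of this interpolation; the EulerAngles type and function name are assumptions, not from these notes.

Euler-Angle Interpolation (sketch)
struct EulerAngles { double phi, theta, psi; };   // pitch, roll, yaw

EulerAngles interpolateEuler(EulerAngles E1, EulerAngles E2, double alpha) {
    // Convex combination (1 - alpha) E1 + alpha E2, taken component by component.
    return { (1 - alpha) * E1.phi   + alpha * E2.phi,
             (1 - alpha) * E1.theta + alpha * E2.theta,
             (1 - alpha) * E1.psi   + alpha * E2.psi };
}

// The result would then be fed into R(phi, theta, psi) = Rz(psi) Ry(theta) Rx(phi)
// to build the interpolated rotation matrix.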
Shortcomings of Euler angles: There are some problems with Euler angles. One issue is the fact that this represen-
tation depends on the choice of coordinate system. In the plane, a 30 degree rotation is the same, no matter
what direction the axes are pointing (as long as they are orthonormal and right-handed). However, the result of
an Euler-angle rotation depends very much on the choice of the coordinate frame and on the order in which the
axes are named. (Later, we will see that quaternions do provide such an intrinsic system.)
Another problem with Euler angles is called gimbal lock. Whenever we rotate about one axis, it is possible that we could bring the other two axes into alignment with each other. (This happens, for example, if we rotate about x by 90°.) This causes problems because the other two axes no longer rotate independently of each other, and we effectively lose one degree of freedom. Gimbal lock as induced by one ordering of the axes can be avoided by changing the order in which the rotations are performed. But this is rather messy, and it would be nice to have a system that is free of this problem.
Angular Displacement: Let us next consider an approach to rotation that is invariant under rigid changes of the
coordinate system. This will eventually lead us to the concept of a quaternion.
In contrast to Euler angles, a more intrinsic way to express rotations (about the origin) in 3-space is in terms of two quantities, (θ, u), consisting of an angle θ and an axis of rotation u. Let's consider how we might do this. First consider a vector v to be rotated. Let us assume that u is of unit length.
Our goal is to describe the rotation of a vector v as a function of θ and u. Let R(v) denote this rotated vector (see Fig. 102(a)). In order to derive this, we begin by decomposing v as the sum of its components that are parallel to and orthogonal to u, respectively:

v∥ = (u · v) u
v⊥ = v − v∥ = v − (u · v) u.
Fig. 102: Angular displacement.
Note that, letting w = u × v⊥, we have

w = u × v⊥ = u × (v − v∥) = (u × v) − (u × v∥) = u × v.

The last step follows from the fact that u and v∥ are parallel, so their cross product is zero.
Now, consider the plane spanned by v⊥ and w. Rotating v⊥ by the angle θ within this plane gives

R(v⊥) = (cos θ)v⊥ + (sin θ)w.

From this and the fact that R(v∥) = v∥, we have

R(v) = R(v∥) + R(v⊥)
     = v∥ + (cos θ)v⊥ + (sin θ)w
     = (u · v)u + (cos θ)(v − (u · v)u) + (sin θ)w
     = (cos θ)v + (1 − cos θ)u(u · v) + (sin θ)(u × v).

In summary, we have the following formula expressing the effect of the rotation of vector v by angle θ about a rotation axis u:

R(v) = (cos θ)v + (1 − cos θ)u(u · v) + (sin θ)(u × v).    (1)
This expression is the image of v under the rotation. Notice that, unlike Euler angles, this is expressed entirely
in terms of intrinsic geometric functions (such as dot and cross product), which do not depend on the choice of
coordinate frame. This is a major advantage of this approach over Euler angles.
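A minimal sketch of Eq. (1) in code follows; the Vec3 type and helper operators are assumptions, not from these notes.

Angular Displacement (sketch)
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Rotate v by angle theta (radians) about the unit-length axis u, per Eq. (1).
Vec3 rotate(Vec3 v, double theta, Vec3 u) {
    double c = std::cos(theta), s = std::sin(theta);
    return c * v + ((1.0 - c) * dot(u, v)) * u + s * cross(u, v);
}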
Quaternions: We will now delve into a subject which at first may seem quite unrelated. But keep the above expression in mind, since it will reappear in a most surprising way. This story begins in the early 19th century, when the great mathematician William Rowan Hamilton was searching for a generalization of the complex number system.
Imaginary numbers can be thought of as linear combinations of two basis elements, 1 and i, which satisfy the multiplication rules 1² = 1, i² = −1, and 1·i = i·1 = i. (The interpretation of i = √−1 arises from the second rule.) A complex number a + bi can be thought of as a vector in 2-dimensional space, (a, b). Two important concepts with complex numbers are the modulus, which is defined to be √(a² + b²), and the conjugate, which is defined to be (a, −b). In vector terms, the modulus is just the length of the vector and the conjugate is just a vertical reflection about the x-axis. If a complex number is of modulus 1, then it can be expressed as (cos θ, sin θ). Thus, there is a connection between complex numbers and 2-dimensional rotations. Also, observe that, given such a unit modulus complex number, its conjugate is (cos θ, −sin θ) = (cos(−θ), sin(−θ)). Thus, taking the conjugate is something like negating the associated angle.
Hamilton was wondering whether this idea could be extended to three dimensional space. You might reason that, to go from 2D to 3D, you need to replace the single imaginary quantity i with two imaginary quantities, say i and j. Unfortunately, this idea does not work. After years of work, Hamilton came up with the idea of, rather than using two imaginaries, instead using three imaginaries i, j, and k, which behave as follows:

i² = j² = k² = ijk = −1,    ij = k,  jk = i,  ki = j.

Combining these, it follows that ji = −k, kj = −i and ik = −j. The skew symmetry of multiplication (e.g., ij = −ji) was actually a major leap, since multiplication systems up to that time had been commutative.
Hamilton defined a quaternion to be a generalized complex number of the form

q = q_0 + q_1 i + q_2 j + q_3 k.

Thus, a quaternion can be viewed as a 4-dimensional vector q = (q_0, q_1, q_2, q_3). The first quantity is a scalar, and the last three define a 3-dimensional vector, and so it is a bit more intuitive to express this as q = (s, u), where s = q_0 is a scalar and u = (q_1, q_2, q_3) is a vector in 3-space. We can define the same concepts as we did with complex numbers:

Conjugate: q* = (s, −u).
Modulus: |q| = √(q_0² + q_1² + q_2² + q_3²) = √(s² + (u · u)).
Unit Quaternion: q is said to be a unit quaternion if |q| = 1.
Quaternion multiplication: Consider two quaternions q = (s, u) and p = (t, v):

q = (s, u) = s + u_x i + u_y j + u_z k
p = (t, v) = t + v_x i + v_y j + v_z k.

If we multiply these two together, we'll get lots of cross-product terms, such as (u_x i)(v_y j), but we can simplify these by using Hamilton's rules. That is, (u_x i)(v_y j) = u_x v_y (ij) = u_x v_y k. If we do this, simplify, and collect common terms, we get a very messy formula involving 16 different terms (see the appendix at the end of this lecture). The formula can be expressed somewhat succinctly in the following form:

qp = (st − (u · v), sv + tu + u × v).

Note that the above expression is in the quaternion scalar-vector form. The first term st − (u · v) evaluates to a scalar (recalling that the dot product returns a scalar), and the second term (sv + tu + u × v) is a sum of three vectors, and so is a vector. It can be shown that quaternion multiplication is associative, but not commutative.
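Here is a minimal sketch of this product in code; the Quat and Vec3 types and the function name are assumptions, not from these notes.

Quaternion Multiplication (sketch)
struct Vec3 { double x, y, z; };
struct Quat { double s; Vec3 u; };       // scalar-vector form (s, u)

Quat multiply(Quat q, Quat p) {
    double s = q.s, t = p.s;
    Vec3 u = q.u, v = p.u;
    double d = u.x*v.x + u.y*v.y + u.z*v.z;                                // u . v
    Vec3 c = { u.y*v.z - u.z*v.y, u.z*v.x - u.x*v.z, u.x*v.y - u.y*v.x };  // u x v
    return { s*t - d,
             { s*v.x + t*u.x + c.x, s*v.y + t*u.y + c.y, s*v.z + t*u.z + c.z } };
}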
Quaternion multiplication and 3-dimensional rotation: Before considering rotations, we first define a pure quaternion to be one with a 0 scalar component:

p = (0, v).
Any quaternion of nonzero magnitude has a multiplicative inverse, which is defined to be

q⁻¹ = (1/|q|²) q*.

(To see why this works, try multiplying qq⁻¹, and see what you get.) Observe that if q is a unit quaternion, then it follows that q⁻¹ = q*.
As you might have guessed, our objective will be to show that there is a relation between rotating vectors and multiplying quaternions. In order to apply this insight, we first need to show how to represent rotations as quaternions and 3-dimensional vectors as quaternions. After a bit of experimentation, the following does the trick:

Vector: Given a vector v = (v_x, v_y, v_z) to be rotated, we will represent it by the pure quaternion (0, v).
Rotation: To represent a rotation by angle θ about a unit vector u, you might think we'll use the scalar part to represent θ and the vector part to represent u. Unfortunately, this doesn't quite work. After a bit of experimentation, you will discover that the right way to encode this rotation is with the quaternion q = (cos(θ/2), (sin(θ/2))u). (You might wonder why we use θ/2, rather than θ. The reason, as we shall see below, is that this is what works.)
Rotation Operator: Given a vector v represented by the quaternion p = (0, v) and a rotation represented by a unit quaternion q, we define the rotation operator to be:

R_q(p) = qpq⁻¹ = qpq*.

(The last equality results from the fact that q⁻¹ = q* for a unit quaternion.) Let q = (s, u). Given p = (0, v), by expanding the rotation operator definition and simplifying we obtain:

R_q(p) = (0, (s² − (u · u))v + 2u(u · v) + 2s(u × v)).    (2)

(We leave the derivation as an exercise, but a few nontrivial facts regarding dot products and cross products need to be applied.)
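Continuing the quaternion multiplication sketch from above, the rotation operator can be written as follows; the function names are assumptions.

Quaternion Rotation Operator (sketch)
#include <cmath>

Quat conjugate(Quat q) { return { q.s, { -q.u.x, -q.u.y, -q.u.z } }; }

// q = (cos(theta/2), sin(theta/2) u), for a unit-length axis u.
Quat rotationQuat(double theta, Vec3 u) {
    double c = std::cos(theta / 2), s = std::sin(theta / 2);
    return { c, { s * u.x, s * u.y, s * u.z } };
}

// Apply R_q(p) = q p q* to the vector v, encoded as the pure quaternion (0, v).
Vec3 rotateVector(Vec3 v, Quat q) {
    Quat p = { 0.0, v };
    Quat r = multiply(multiply(q, p), conjugate(q));
    return r.u;                       // the scalar part of r is 0
}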
Let us see if we can express this in a more suggestive form. Since q is of unit magnitude, we can express it as

q = (cos(θ/2), (sin(θ/2)) u),   where ‖u‖ = 1.
Plugging this into the above expression and applying some standard trigonometric identities, we obtain

R_q(p) = (0, (cos²(θ/2) − sin²(θ/2))v + 2(sin²(θ/2))u(u · v) + 2 cos(θ/2) sin(θ/2)(u × v))
       = (0, (cos θ)v + (1 − cos θ)u(u · v) + (sin θ)(u × v)).

Now, recall the angular displacement equation (1) presented earlier in the lecture. The vector part of this quaternion is identical, implying that the quaternion rotation operator achieves the desired rotation.
Example: Consider the 3-d rotation shown in Fig. 103. This rotation can be achieved by performing a rotation about the y-axis by θ = 90 degrees. Thus θ = π/2, and u = (0, 1, 0). Thus the quaternion that encodes this rotation is

q = (cos(θ/2), (sin(θ/2))u) = (cos(π/4), sin(π/4)·(0, 1, 0)) = (1/√2, (0, 1/√2, 0)).
Fig. 103: Rotation example.
Let us consider how the x-unit vector v = (1, 0, 0)^T is transformed under this rotation. To reduce this to a quaternion operation, we encode v as a pure quaternion p = (0, v) = (0, (1, 0, 0)). We then apply the rotation operator, and so by Eq. (2) we have

R_q(p) = (0, (1/2 − 1/2)(1, 0, 0) + 2(0, 1/√2, 0)·0 + (2/√2)((0, 1/√2, 0) × (1, 0, 0)))
       = (0, (0, 0, 0) + (0, 0, 0) + (−1)(0, 0, 1))
       = (0, (0, 0, −1)).

Thus p is mapped to a point on the (negative) z-axis, as expected.
Composing Rotations: We have shown that each unit quaternion corresponds to a rotation in 3-space. This is an elegant representation, but can we manipulate rotations through quaternion operations? The answer is yes. In particular, the product of two unit quaternions is another unit quaternion. Furthermore, the resulting product quaternion corresponds to the composition of the two rotations. In particular, given two unit quaternions q and q′, applying the rotation R_q followed by R_{q′} is equivalent to applying the single rotation R_{q′′}, where q′′ = q′q. That is,

R_{q′} ◦ R_q = R_{q′′},   where q′′ = q′q.
This follows from the associativity of quaternion multiplication, and the fact that (q′q)⁻¹ = q⁻¹q′⁻¹, as shown below.

R_{q′}(R_q(p)) = q′(qpq⁻¹)q′⁻¹
              = (q′q)p(q⁻¹q′⁻¹)
              = (q′q)p(q′q)⁻¹
              = q′′p(q′′)⁻¹
              = R_{q′′}(p).
Lerp, nlerp, and slerp: (No, these are not cartoon characters nor a new drink at 7-Eleven.) An important question is, given two unit quaternions q_0 and q_1, how can we smoothly interpolate between them? We need this in order to perform smooth animations involving rotating objects.
We have already learned about one means of interpolation, called linear interpolation (or lerp for short). Given two points p_0 and p_1, for any α, where 0 ≤ α ≤ 1, we can express a point between p_0 and p_1 as the affine combination

lerp(p_0, p_1; α) = (1 − α)p_0 + αp_1.
As α ranges from 0 to 1, the value of lerp(p_0, p_1; α) varies linearly from p_0 to p_1 (see Fig. 104).
As the name suggests, this interpolates between p_0 and p_1 along a straight line between the two points. In some cases, however, these may be points on a sphere, and rather than have the interpolation pass through the interior of the sphere, we would like the interpolation to follow the shortest path along the surface of the sphere. To do this, we need a different sort of interpolation, a spherical interpolation.
Fig. 104: Lerp, nlerp, and slerp.
To develop the notion of a spherical interpolation, suppose that u_0 and u_1 are two unit vectors, which we can think of as points on a unit sphere. There are two ways to perform a spherical interpolation. The first starts with a linear interpolation. Since such an interpolation passes through the interior of the sphere, in order to get the point to lie on the sphere, we simply normalize it to unit length. The result is called a normalized linear interpolation (or nlerp for short). Here is the formula:

nlerp(u_0, u_1; α) = normalize(lerp(u_0, u_1; α)) = normalize((1 − α)u_0 + αu_1).
Recall that the function normalize(u) simply divides u by its length, u/|u|, thus always producing a unit
vector.
The nlerp produces very reasonable results when the amount of rotation is small. It has one defect, however, in that, if the two points are far apart from each other (e.g., one close to the north pole and one close to the south pole) then the motion is not constant along the sphere. It moves much more rapidly in the middle of the arc than at the ends (see Fig. 104). To fix this, we need a truly spherical approach to interpolation. This is called the spherical interpolation (or slerp for short). To define it, let θ = arccos(u_0 · u_1) denote the angle between the unit vectors u_0 and u_1. For 0 ≤ α ≤ 1, we define
slerp(u_0, u_1; α) = (sin((1 − α)θ) / sin θ) u_0 + (sin(αθ) / sin θ) u_1.

Although it is not obvious why this works, this produces a path along the sphere that has constant velocity from u_0 to u_1, as α varies from 0 to 1 (see Fig. 104).
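A minimal sketch of slerp applied to unit quaternions (stored here as 4-element arrays) follows; the helper names are assumptions, and the sign flip implements the recommendation for quaternions discussed below.

Slerp (sketch)
#include <cmath>

void slerp(const double q0[4], const double q1[4], double alpha, double out[4]) {
    // Dot product of the two unit quaternions; flip q1 if needed so that we
    // interpolate along the shorter arc (q1 and -q1 encode the same rotation).
    double d = q0[0]*q1[0] + q0[1]*q1[1] + q0[2]*q1[2] + q0[3]*q1[3];
    double sign = 1.0;
    if (d < 0.0) { d = -d; sign = -1.0; }

    double w0, w1;
    if (d > 0.9995) {              // nearly parallel: fall back to lerp (sin(theta) ~ 0)
        w0 = 1.0 - alpha; w1 = alpha;
    } else {
        double theta = std::acos(d);
        w0 = std::sin((1.0 - alpha) * theta) / std::sin(theta);
        w1 = std::sin(alpha * theta) / std::sin(theta);
    }
    for (int i = 0; i < 4; i++)
        out[i] = w0 * q0[i] + sign * w1 * q1[i];
}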
Interpolating Quaternions: When interpolating rotations represented as unit quaternions, we can use either nlerp or slerp. The nlerp definition for quaternions is identical to the one given above (it is just applied in 4-dimensional space).
When slerping between quaternions, however, we need to be aware of one issue. In particular, recall that when we encode a rotation by angle θ as a quaternion, the scalar and vector components are multiplied by cos(θ/2) and sin(θ/2), respectively. This implies that, if we have two unit quaternions q_0 and q_1, which involve a relative rotation angle of θ between them, then the 4-dimensional dot product (q_0 · q_1) gives the cosine of θ/2, not of θ. For this reason, it is recommended that, if (q_0 · q_1) < 0 (implying that θ > π), then replace q_1 with −q_1. (Note that q_1 and −q_1 are equivalent rotations.)
Rather than implementing your own quaternion slerp, I would suggest downloading code from the web to do this. There are computationally more efficient versions of slerp, which are not directly based on the above formula.
Matrices and Quaternions: Quaternions provide a very elegant way of representing rotations in 3-space. Returning
to the problem of interpolating smoothly between two orientations, we can see that we can describe the before
and after orientations of any object by two quaternions, q and p. Then, to interpolate smoothly between these
two orientations, we spherically interpolate between q and p in quaternion space.
However, once we have a quaternion representation, we need a way to inform our graphics API (like OpenGL) about the actual transformation. In particular, given a unit quaternion

q = (cos(θ/2), sin(θ/2) u) = (s, (u_x, u_y, u_z)),

what is the corresponding affine transformation (expressed as a rotation matrix)? By simply expanding the definition of R_q(p), it is not hard to show that the following (homogeneous) matrix is equivalent:

[ 1 − 2u_y² − 2u_z²    2u_x u_y − 2s u_z     2u_x u_z + 2s u_y     0 ]
[ 2u_x u_y + 2s u_z    1 − 2u_x² − 2u_z²     2u_y u_z − 2s u_x     0 ]
[ 2u_x u_z − 2s u_y    2u_y u_z + 2s u_x     1 − 2u_x² − 2u_y²     0 ]
[ 0                    0                     0                     1 ]
Thus, given your quaternion interpolant, you apply this rotation by invoking glMultMatrix(), and all sub-
sequently drawn points will be rotated in accordance with the quaternion.
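For concreteness, here is a minimal sketch of this step using the legacy fixed-function OpenGL API; the function name is an assumption, and the header path may differ on your system. Note that glMultMatrixf expects the 16 entries in column-major order.

Applying a Quaternion Rotation in OpenGL (sketch)
#include <GL/gl.h>

void applyQuaternionRotation(float s, float ux, float uy, float uz) {
    // Row-major form of the matrix above.
    float R[4][4] = {
        {1 - 2*uy*uy - 2*uz*uz, 2*ux*uy - 2*s*uz,      2*ux*uz + 2*s*uy,      0},
        {2*ux*uy + 2*s*uz,      1 - 2*ux*ux - 2*uz*uz, 2*uy*uz - 2*s*ux,      0},
        {2*ux*uz - 2*s*uy,      2*uy*uz + 2*s*ux,      1 - 2*ux*ux - 2*uy*uy, 0},
        {0,                     0,                     0,                     1}};
    // Transpose into column-major order for OpenGL and multiply it onto the
    // current matrix stack.
    float m[16];
    for (int col = 0; col < 4; col++)
        for (int row = 0; row < 4; row++)
            m[col * 4 + row] = R[row][col];
    glMultMatrixf(m);
}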
Quaternion Summary: In summary, quaternions are a generalization of the concept of complex numbers, which can
be used to represent rotations in three dimensional space. Unlike Euler angles, quaternions are independent of
the coordinate system. Also, they do not suffer from the problem of gimbal lock. Thus, from a mathematical
perspective, they represent a much cleaner system for representing rotations.
Quaternions can be used to represent the rotation (orientation) of an object in 3-dimensional space.
A rotation by a given angle θ about a unit vector u can be represented by the unit quaternion q = (cos(θ/2), (sin(θ/2))u).
A vector v is represented by the pure quaternion p = (0, v).
The effect of applying this rotation to v is given by R_q(p) = qpq*, where q* is the conjugate of q.
Given two rotation quaternions q_0 and q_1, it is possible to interpolate smoothly between them using either the nlerp (normalized linear interpolation) or slerp (spherical interpolation).
Appendix (Deriving quaternion multiplication): Earlier, we stated that, given two quaternions:

q = (s, u) = s + u_x i + u_y j + u_z k
p = (t, v) = t + v_x i + v_y j + v_z k,

their product is given by

qp = (st − (u · v), sv + tu + u × v).
In this appendix, we derive this fact. Although this is just a rather tedious exercise in algebra, it is nice to see that everything follows from the basic laws that Hamilton discovered for the multiplication of the quaternion imaginaries i, j and k. (And perhaps it explains why Hamilton was so excited when he realized that he got it right!)
First, let us recall that, given two vectors u and v, the definitions of the dot and cross products are:

(u · v) = u_x v_x + u_y v_y + u_z v_z
u × v = (u_y v_z − u_z v_y,  u_z v_x − u_x v_z,  u_x v_y − u_y v_x)^T.
Given this, let's start by multiplying q and p. Multiplication between scalars and imaginaries is commutative (si = is) but multiplication between imaginaries is not (ij ≠ ji), so we need to be careful to preserve the order of the imaginary quantities.
qp = (s + u_x i + u_y j + u_z k)(t + v_x i + v_y j + v_z k)
   = (st + s v_x i + s v_y j + s v_z k) + (u_x t i + u_x v_x i² + u_x v_y ij + u_x v_z ik) +
     (u_y t j + u_y v_x ji + u_y v_y j² + u_y v_z jk) + (u_z t k + u_z v_x ki + u_z v_y kj + u_z v_z k²).
Next, let us apply the multiplication rules for the imaginary quantities. Recall that

i² = j² = k² = −1,    ij = −(ji) = k,  jk = −(kj) = i,  ki = −(ik) = j.
Using these, we can express the product as

qp = (st + s v_x i + s v_y j + s v_z k) + (u_x t i − u_x v_x + u_x v_y k − u_x v_z j) +
     (u_y t j − u_y v_x k − u_y v_y + u_y v_z i) + (u_z t k + u_z v_x j − u_z v_y i − u_z v_z).
Now, let us collect common terms based on i, j, and k.

qp = (st − u_x v_x − u_y v_y − u_z v_z) +
     (s v_x + u_x t + u_y v_z − u_z v_y) i +
     (s v_y + u_y t − u_x v_z + u_z v_x) j +
     (s v_z + u_z t + u_x v_y − u_y v_x) k.
Observe that the right-hand side above is a valid quaternion. The first (scalar) term can be expressed more succinctly as st − (u · v). The other three terms share a common structure. If we think of them as the components of the three-element vector whose components are the i, j, and k terms, respectively, they can be simplified to

s (v_x, v_y, v_z)^T + t (u_x, u_y, u_z)^T + (u_y v_z − u_z v_y,  −u_x v_z + u_z v_x,  u_x v_y − u_y v_x)^T = sv + tu + u × v.
Finally, if we put the scalar and vector parts together, we obtain

qp = (st − (u · v), sv + tu + u × v),

just as desired. (Whew!)
Lecture 30: Physically-Based Modeling
Physical Modeling: Traditionally, computer graphics was just about producing images from geometric models. Over recent years, the field has grown to encompass computational aspects of many other areas. The need for understanding physics grew from a need to produce realistic renderings of physical phenomena, such as moving clouds, flowing water, and breaking glass.
The good news to programmers is that there exist software systems that provide basic physical simulations.
However, in order to use these systems, it is necessary to understand the basic elements of physics. It is also
important to understand a bit about how these systems work, in order to understand what tasks they perform
well, and what tasks they struggle with.
Basic Physics Concepts: Let us begin by discussing the basic elements of physics, which every graphics programmer
needs to know.
Kinematics: This is the study of motion (ignoring forces). For example, it considers questions like: How does
acceleration affect velocity? How does velocity affect position? There are two common models that are
considered in kinematics:
Particles: This involves the motions of point masses. In particular, body rotation is ignored and only
translation is considered. At first this may seem rather restrictive, but many complex phenomena, such as dust and water, can be modeled by simulating the motion of a massive number of individual particles.
Rigid bodies: For objects that are not points, the rotation of the body needs to be considered.
Force: Understanding forces and the effects they have on objects is central to physical modeling. Objects
change their motion only when forces are applied. There are a number of different types of forces (see
Fig. 105):
Fig. 105: Types of forces: contact force, torque, field force (gravity), environmental force (buoyancy).
Contact forces and Torques: Contact forces arise when one or more objects collide with each other, such
as a bowling ball striking a bowling pin. Torques refer to a special designation of contact forces that
induce rotation, such as turning a steering wheel.
Field forces: These are forces like gravity or magnetism, which act without any explicit contact occurring.
Environmental forces: These include forces that are induced by the medium (air or water) in which the
object resides or the surface on which the object is placed. These include friction, buoyancy (in water
or air), and drag and lift for airplanes.
Kinetics: (also called Dynamics) In contrast to kinematics, which considers motion independent of force, kinetics explains how forces affect motion. The study of kinetics can be further decomposed into the nature of the object on which the kinetics is being applied:
Rigid Bodies: This means that a body moves (translates and rotates) as a single unit (e.g., a rock hurtling through the air).
Non-rigid Objects: These are bodies composed of multiple parts that move semi-independently, although the movement of one part may influence the movement of other parts. Examples include jointed assemblies (like the joints and bones making up the human body), mass-spring systems (like the cloth that makes up a fabric), hair and rope, water, and soft plastics.
Physical Properties: Basic physics is about how forces affect the motion of objects. Forces induce acceleration,
acceleration changes velocity, and velocity dictates motion. The manner in which forces affect an object depends
on simple properties of the object. For a rigid body, the following quantities are important.
Mass: This is a scalar quantity, which describes the amount of stuff there is to an object. More practically, mass indicates the degree to which an object resists a change in its velocity. This is sometimes referred to as an object's inertial mass.
Formally, letting B be our body, we can define the mass by integrating the object's density over its volume. Let dV = dx dy dz denote a differential volume element (an infinitesimally small cube at the point (x, y, z)) and let us assume that the object is of unit density. Letting m denote mass, we have

m = ∫_B dV.
Center of mass (or center of gravity): This is the point about which all rotations occur (assuming that the object is not tied down to anything). Formally, it is defined to be the point c = (c_x, c_y, c_z)^T, where

c_x = (1/m) ∫_B x dV,   c_y = (1/m) ∫_B y dV,   c_z = (1/m) ∫_B z dV.
Observe that this is just the continuous equivalent of computing the average x-, y- and z-coordinates of the body's mass. Note that the center of mass need not lie within the object (see Fig. 106). For example, the center of mass of a hoop is the center of the hoop.
Fig. 106: Center of mass and moment of inertia.
Moment of Inertia: This is a scalar quantity, which corresponds to mass, but for torquing motions. In particular, it indicates how much an object resists a change in its angular velocity relative to a given rotation axis. Assuming for simplicity that the rotation is about the z-axis, we define the moment of inertia to be

I = ∫_B (x² + y²) dV.
For example, a hoop has a higher moment of inertia than a spherical lump of equal mass, since most of its
mass is far from its center of mass (see Fig. 106). Given the same torque, the hoop will spin more slowly
than the compact lump. (There is also a more complex physical quantity, called the inertial tensor, which
encodes the moment of inertia for all possible rotation axes, but we will not discuss this.)
Physical State: Physical properties remain constant throughout an object's lifetime. There are also a number of physical properties that vary with time. These are referred to as the object's physical state. The physical state of a particle is described by two values:
Position: This is a point p = (p_x, p_y, p_z)^T that indicates the particle's location in space. (For rigid bodies, it is typically the location of the object's center of mass.)
Velocity: This is the derivative of position with respect to time, expressed as a vector v = (v_x, v_y, v_z)^T.
Note that position and velocity are time dependent. When we want to emphasize position and velocity at a
particular time t, we will write these as p(t) and v(t), respectively.
When dealing with rigid bodies, we also need to consider rotation. We add the following two additional ele-
ments:
Angular position: Angular position, which we will denote by q, indicates the amount of rotation relative to an initial (default) position. It can be expressed in a number of ways (e.g., Euler angles, rotation matrix, or unit quaternion). We will assume that it is represented as a unit quaternion q = (s, u). Recall that a rotation by angle θ about axis u can be expressed as the unit quaternion

q = (cos(θ/2), (sin(θ/2)) u),   where ‖u‖ = 1.
Angular velocity: The angular velocity, denoted by ω, is typically represented as a 3-dimensional vector. The direction indicates the axis of rotation, and the length of the vector is the speed of rotation, given, say, in radians per second.
As above, rotation and angular velocity are functions of time, and so may be expressed as q(t) and ω(t) to emphasize this dependence.
Physical Simulation and Kinematics: Simple physical simulations are performed by a process called integration. Intuitively, this involves updating the state of each physical object in your environment through a small time step. Collisions are first detected and contact and torque forces are computed. These forces result in accelerations that change the linear and angular velocities of objects. Finally, velocities change positions.
Assuming that the current state of a rigid body at time t is given by its position p(t), its velocity v(t), its angular position q(t), and its angular velocity ω(t), the process of integration involves updating these quantities over a small time interval Δt. We can express this as

[p(t), v(t), q(t), ω(t)]  →  [p(t + Δt), v(t + Δt), q(t + Δt), ω(t + Δt)].
How this is done is the subject of kinematics, that is, the study of how quantities such as position, velocity, and acceleration interact. By definition v(t) is the instantaneous change of position over time. This may also be expressed as dp/dt or p′(t). Similarly, the angular velocity ω(t) determines the instantaneous change of the angular position q(t) over time; the corresponding derivative is denoted by dq/dt or q′(t). Assuming that the velocities v(t) and ω(t) have been computed (from the known forces) we can then update the position and angular position as follows for a small time step Δt:

p(t + Δt) ← p(t) + p′(t)Δt = p(t) + v(t)Δt
q(t + Δt) ← q(t) + q′(t)Δt.
Let us discuss rotation in a bit more detail. Let us assume that the angular position is expressed as a quaternion q(t), and the angular velocity is expressed as a vector ω(t) = (ω_x(t), ω_y(t), ω_z(t))^T (as described above). In order to update the angular position, we need to compute the derivative of the quaternion, assuming that the object is rotating with angular velocity ω(t). Consulting a reference on quaternions, we find that, in order to compute the desired quaternion derivative, we first compute the pure quaternion whose vector part is ω(t), that is, let w(t) = (0, ω(t)). The derivative, which we denote by q′(t), is

q′(t) = (1/2) w(t) · q(t),

where the product denotes quaternion multiplication.
Since rotation quaternions are required to be of unit length, we should normalize the result of applying the derivative in order to avoid it drifting away from unit length. Thus, we have the following rule for updating the angular position of a rotating object:

q(t + Δt) ← normalize(q(t) + q′(t)Δt).
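The following is a minimal sketch of one such integration step; the types and field names are assumptions, not from these notes. Quaternions are stored in scalar-vector form (s, x, y, z).

Rigid-Body Integration Step (sketch)
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double s, x, y, z; };

// Quaternion product in scalar-vector form: (s,u)(t,v) = (st - u.v, sv + tu + u x v).
Quat qmul(Quat a, Quat b) {
    return { a.s*b.s - (a.x*b.x + a.y*b.y + a.z*b.z),
             a.s*b.x + b.s*a.x + (a.y*b.z - a.z*b.y),
             a.s*b.y + b.s*a.y + (a.z*b.x - a.x*b.z),
             a.s*b.z + b.s*a.z + (a.x*b.y - a.y*b.x) };
}

void integrate(Vec3& p, const Vec3& v, Quat& q, const Vec3& omega, double dt) {
    // p(t + dt) <- p(t) + v(t) dt
    p = { p.x + v.x*dt, p.y + v.y*dt, p.z + v.z*dt };

    // q'(t) = (1/2) w(t) q(t), where w = (0, omega)
    Quat w = { 0.0, omega.x, omega.y, omega.z };
    Quat qdot = qmul(w, q);
    q = { q.s + 0.5*qdot.s*dt, q.x + 0.5*qdot.x*dt,
          q.y + 0.5*qdot.y*dt, q.z + 0.5*qdot.z*dt };

    // Renormalize so that q remains a unit quaternion.
    double len = std::sqrt(q.s*q.s + q.x*q.x + q.y*q.y + q.z*q.z);
    q = { q.s/len, q.x/len, q.y/len, q.z/len };
}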
Physics for the Programming Project: The programming project involves physics in a number of ways. Each physical event can be viewed as a force, which serves to modify an object's velocity (linear and/or angular).

User Impulses: Through keyboard input, the user can cause objects to move by applying various impulse forces. These forces have the effect of changing the current linear velocity by adding some 3-dimensional vector to the current velocity. Letting v denote the current object velocity and v_i denote the additional impulse velocity, we have:

v ← v + v_i.
Gravity and Friction: The force of gravity decreases the vertical component of the object's linear velocity. Friction and air resistance decrease the magnitude of the linear velocity, without changing its direction. Air resistance can also cause an object's angular velocity to decrease slowly over time. Let Δt denote the elapsed small time interval, let g be a constant related to gravity, and let μ_1 be a small positive constant related to friction and air resistance. We have

v_z ← (1 − g Δt) v_z   (if the object is above the ground)
v ← (1 − μ_1 Δt) v   (friction or air resistance).
Rolling: Assume that we have a spherical body that rolls along the ground. As it rolls (assuming no slippage), it also rotates. Suppose that the body is moving horizontally with linear velocity v, and assume that the up direction is the z-axis. Then the axis of rotation passes through the center of mass and is directed to the object's left (see Fig. 107).
Fig. 107: Rolling rotation.
For a given linear velocity, the body's angular velocity varies inversely with its radius. (That is, a small-radius wheel must spin faster to maintain the same speed as a large-radius wheel.) To achieve this, let r denote the body's radius. The angular velocity ω (in radians per second) is set to

ω ← (1/r)(z × v).

Note that this is only applied when the object is rolling on the ground. If the body ceases to have contact with the ground, its angular velocity should remain constant (or possibly slow due to air resistance).
Collisions: When a collision occurs, we need to consider the impact on the body in terms of both translation and rotation. For simplicity, let us assume that all other objects in the scene are fixed, so that hitting an object is like hitting a wall. All of the force of the impact is directed back to the moving body. Assuming that the moving body is a sphere, whenever a collision occurs, we need to know the body's velocity vector v and the vector u from the center of the body to the point of contact (see Fig. 108(a)). Since the body is a sphere, the vector u is also the normal vector at the point of contact. There are two types of collision response to consider, translational and rotational.

Translational response: To determine the collision response, we decompose the vector v into two components, v = v∥ + v⊥, where v∥ is parallel to u and v⊥ is orthogonal to u:

v∥ = ((u · v) / (u · u)) u   and   v⊥ = v − v∥.

To compute the effect of the collision on the object, the obstacle exerts an impulse on the object in the direction −v∥, so that the reflected velocity has the form v_new = v⊥ − ε v∥, where 0 < ε ≤ 1 (called the coefficient of restitution) is a factor that takes into consideration various physical issues, such as the elasticity of the collision (see Fig. 108(c)).

Fig. 108: Geometry of collisions.
Rotational response: Let u, v∥, and v⊥ be as defined above. Analogously to the rolling case, the collision also induces a new angular velocity, ω ← (1/r)(v⊥ × (u/|u|)).
This assumes that the object's rotation is determined entirely by the collision. More generally, there may be slippage, and the rotation may be some linear combination of this rotation and its rotation prior to the collision.
Updating the Positions: Once the new velocities v and ω have been computed, we can update the position and angular position. To do this, let w be the quaternion (0, ω), and define q′ = (1/2) w · q (using quaternion multiplication). Then we set:

p ← p + Δt · v      q ← normalize(q + q′ Δt).
Rendering: Finally, to render the object we use the following conceptual ordering. First draw the object, next apply the quaternion rotation. You can either use the rotation matrix (from the quaternion lecture) with glMultMatrix or you can extract the rotation angle and rotation axis from the quaternion and apply glRotatef. Finally apply the translation p using glTranslatef. As always, this order will be reversed and nested within a matrix push-pop pair.
Lecture 31: 3-D Modeling: Constructive Solid Geometry
Solid Object Representations: We begin discussion of 3-dimensional object models. There is an important funda-
mental split in the question of how objects are to be represented. Two common choices are between repre-
senting the 2-dimensional boundary of the object, called a boundary representation or B-rep for short, and a
volume-based representation, which is sometimes called CSG for constructive solid geometry. Both have their
advantages and disadvantages.
Volume Based Representations: One of the most popular volume-based representations is constructive solid geometry, or CSG for short. It is widely used in manufacturing applications. One of the most intuitive ways to describe complex objects, especially those arising in manufacturing applications, is as a set of boolean operations (that is, set union, intersection, difference) applied to a basic set of primitive objects. Manufacturing is an important application of computer graphics, and manufactured parts made by various milling and drilling operations can be described most naturally in this way. For example, consider the object shown in the figure below. It can be described as a rectangular block, minus the central rectangular notch, minus two cylindrical holes, and union with the rectangular block on the upper right side.
Fig. 109: Constructive Solid Geometry.
This idea naturally leads to a tree representation of the object, where the leaves of the tree are certain primitive object types (rectangular blocks, cylinders, cones, spheres, etc.) and the internal nodes of the tree are boolean operations, union (X ∪ Y), intersection (X ∩ Y), difference (X − Y), etc. For example, the object above might be described with a tree of the following sort. (In the figure we have used + for union.)
Fig. 110: CSG Tree.
The primitive objects stored in the leaf nodes are represented in terms of a primitive object type (block, cylinder, sphere, etc.) and a set of defining parameters (location, orientation, lengths, radii, etc.) to define the location and shape of the primitive. The nodes of the tree are also labeled by transformation matrices, indicating the transformation to be applied to the object prior to applying the operation. By storing both the transformation and its inverse, as we traverse the tree we can convert coordinates from the world coordinates (at the root of the tree) to the appropriate local coordinate systems in each of the subtrees.
This method is called constructive solid geometry (CSG) and the tree representation is called a CSG tree. One
nice aspect to CSG and this hierarchical representation is that once a complex part has been designed it can
be reused by replicating the tree representing that object. (Or if we share subtrees we get a representation as a
directed acyclic graph or DAG.)
Point membership: CSG trees are examples of unevaluated models. For example, unlike a B-rep representation in
which each individual element of the representation describes a feature that we know is a part of the object,
it is generally impossible to infer from any one part of the CSG tree whether a point is inside, outside, or on
the boundary of the object. As a ridiculous example, consider a CSG tree of a thousand nodes, whose root
operation is the subtraction of a box large enough to enclose the entire object. The resulting object is the empty
set! However, you could not infer this fact from any local information in the data structure.
Consider the simple membership question: Given a point P, does P lie inside, outside, or on the boundary of an object described by a CSG tree? How would you write an algorithm to solve this problem? For simplicity, let us assume that we will ignore the case when the point lies on the boundary (although we will see that this is a tricky issue below).
The idea is to design the program recursively, solving the problem on the subtrees first, and then combining results from the subtrees to determine the result at the parent. We will write a procedure isMember(Point P, CSGnode T), where P is the point, and T is a pointer to a node in the CSG tree. This procedure returns True if the object defined by the subtree rooted at T contains P and False otherwise. If T is an internal node, let T.left and T.right denote the children of T. The algorithm breaks down into the following cases.
Membership Test for CSG Tree
bool isMember(Point P, CSGnode T) {
    if (T.isLeaf)
        return (membership test appropriate to T's type);   // e.g., sphere, block, cylinder
    else if (T.isUnion)
        return isMember(P, T.left) || isMember(P, T.right);
    else if (T.isIntersect)
        return isMember(P, T.left) && isMember(P, T.right);
    else if (T.isDifference)
        return isMember(P, T.left) && !isMember(P, T.right);
}
Note that the semantics of the operations || and && avoid making recursive calls when they are not needed. For example, in the case of union, if P lies in the left subtree, then the right subtree need not be searched.
CSG and Ray Tracing: CSG objects can be handled very naturally in ray tracing. Suppose that R is a ray, and T is a CSG tree. The intersection of the ray with any CSG object can be described as a (possibly empty) sorted set of intervals in the parameter space:

I = ⟨[t_0, t_1], [t_2, t_3], . . .⟩.
(See Fig. 111.) This means that we intersect the object whenever t_0 ≤ t ≤ t_1 and t_2 ≤ t ≤ t_3, and so on. At the leaf level, the set of intervals is either empty (if the ray misses the object) or is a single interval (if it hits).
Now, we evaluate the CSG tree through a post-order traversal, working from the leaves up to the root. Suppose that we are at a union node v and we have the results from the left child I_L and the right child I_R.
We compute the union of these two sets of intervals. This is done by first sorting the endpoints of the intervals.
With each interval endpoint we record whether it is an entry or an exit. Then we traverse this sorted list. We
maintain a depth counter, which is initialized to zero, and is incremented whenever we enter an interval and
decremented whenever we exit an interval. Whenever this count transitions from 0 to 1, we output the endpoint as
the start of a new interval in the union, and whenever the depth count transitions from 1 to 0, we output the
endpoint as the end of an interval. An example is shown in Fig. 111. (A similar procedure applies
for intersection and difference. As an exercise, determine the depth-count transitions that mark the start and end
of each interval.) The resulting set of sorted intervals is then associated with this node. When we arrive at the
root, we select the smallest interval endpoint whose t-value is positive.
Regularized boolean operations: There is a tricky issue in dealing with boolean operations. This goes back to the
same tricky issue that arose in polygon filling: what to do about object boundaries. Consider the intersection
A ∩ B shown in Fig. 112. The result contains a dangling piece that has no width. That is, it is locally
two-dimensional.
Fig. 111: Ray tracing in a CSG Tree.
Fig. 112: (a) A and B, (b) A ∩ B, (c) A ∩* B.
These low-dimensional parts can result from boolean operations, and are usually unwanted. For this reason, it
is common to modify the notion of a boolean operation to perform a regularization step. Given a 3-dimensional
set A, the regularization of A, denoted A*, is defined to be

A* = closure(int(A)).

Note that int(A) does not contain the dangling element, and then taking its closure adds back the boundary.
When performing operations in CSG trees, we assume that the operations are all regularized, meaning that the
resulting objects are regularized after the operation is performed.
A op* B = closure(int(A op B)),

where op is either ∪, ∩, or −. Eliminating these dangling elements tends to complicate CSG algorithms, because
it requires a bit more care in how geometric intersections are represented.
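In the interval representation used for ray tracing above, one simple way to approximate the effect of regularization is to discard degenerate (zero-width) intervals after each boolean operation. This is only a sketch, reusing the Interval type from the earlier sketch; the tolerance eps is a parameter of our choosing, and a robust modeler would handle boundary cases more carefully.

Discarding Degenerate Intervals (sketch)
// Drop zero-width intervals, which correspond to the dangling lower-dimensional
// pieces that regularization removes. The tolerance eps is a hypothetical parameter.
std::vector<Interval> regularize(const std::vector<Interval>& ivs, double eps = 1e-9) {
    std::vector<Interval> out;
    for (const Interval& iv : ivs)
        if (iv.end - iv.start > eps) out.push_back(iv);
    return out;
}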
Lecture 32: Fractals
Fractals: One of the most important aspects of any graphics system is how objects are modeled. Most man-made
(manufactured) objects are fairly simple to describe, largely because the plans for these objects are designed to be
manufacturable. However, objects in nature (e.g., mountainous terrains, plants, and clouds) are often much
more complex. These objects are characterized by a nonsmooth, chaotic behavior. The mathematical area of
fractals was created largely to better understand these complex structures.
One of the early investigations into fractals was a paper written on the length of the coastline of Scotland. The
contention was that the coastline was so jagged that its length seemed to constantly increase as the length of
your measuring device (mile-stick, yard-stick, etc.) got smaller. Eventually, this phenomenon was identified
mathematically by the concept of the fractal dimension. The other phenomenon that characterizes fractals is self
similarity, which means that features of the object seem to reappear in numerous places but with smaller and
smaller size.
In nature, self similarity does not occur exactly, but there is often a type of statistical self similarity, where
features at different levels exhibit similar statistical characteristics, but at different scales.
Iterated Function Systems and Attractors: One of the examples of fractals arising in mathematics involves sets
called attractors. The idea is to consider some function of space and to see where points are mapped under
this function. There are many ways of defining functions of the plane or 3-space. One way that is popular with
mathematicians is to consider the complex plane. Each coordinate (a, b) in this space is associated with the
complex number a + bi, where i = √−1. Define the modulus of a + bi to be √(a² + b²). This is a generalization of
the notion of absolute value for the reals. Observe that the numbers of a given
fixed modulus just form a circle centered around the origin in the complex plane.
Now, consider any complex number z. If we repeatedly square this number,

z → z²,

then the number will tend to fall towards zero if its modulus is less than 1, and it will tend to grow to infinity if its
modulus is greater than 1. Numbers with modulus 1 will stay at modulus 1. In this case, the set of points
with modulus 1 is said to be an attractor of this iterated function system (IFS).
In general, given any iterated function system in the complex plane, the attractor set is a subset of nonzero
points that remain fixed under the mapping. This may also be called the fixed-point set of the system. Note that
it is the set as a whole that is fixed, even though the individual points tend to move around. (See Fig. 113.)
Julia Sets: Suppose we modify the complex function so that instead of simply squaring the point we apply the iterated
function

z → z² + c,

where c is any complex constant. Now as before, under this function, some points will tend toward infinity and others
towards finite numbers. However, there will be a set of points that will tend toward neither. Altogether these
latter points form the attractor of the function system. This is called the Julia set for the point c. An example
for c = 0.62 0.44i is shown in Fig. 114.
A common method for approximately rendering Julia sets is to iterate the function until the modulus of the
number exceeds some prespecified threshold. If the number diverges, then we display one color, and otherwise
we display another color. How many iterations are needed? It really depends on the desired precision. Points that are far
from the boundary of the attractor will diverge quickly. Points that are very close, but just outside the boundary, may
take much longer to diverge. Consequently, the longer you iterate, the more accurate your image will be.
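A minimal escape-time sketch of this rendering test follows; the function name, iteration limit, and escape radius are choices made for this sketch rather than values from these notes.

Escape-Time Test for a Julia Set (sketch)
#include <complex>

// Returns true if z appears to remain bounded under the iteration z -> z^2 + c.
// Larger iteration limits give a more accurate (but slower) picture, as noted above.
bool inJuliaSet(std::complex<double> z, std::complex<double> c,
                int maxIterations = 256, double escapeRadius = 2.0) {
    for (int i = 0; i < maxIterations; ++i) {
        z = z * z + c;                      // the iterated function
        if (std::abs(z) > escapeRadius)     // modulus exceeded the threshold: diverging
            return false;
    }
    return true;
}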
Fig. 113: Attractor set for an iterated function system.
Fig. 114: A Julia Set.
The Mandelbrot Set: For some values of c the Julia set forms a connected set of points in the complex plane. For
others it is not. For each point c in the complex plane, if we color it black when Julia(c) is connected, and color
it white otherwise, we get a picture like the one shown below. This set is called the Mandelbrot set. (See
Fig. 115.)
One way of approximating whether a complex point d is in the Mandelbrot set is to start with z = (0, 0) and
successively iterate the function z → z² + d a large number of times. If after a large number of iterations the
modulus exceeds some threshold, then the point is considered to be outside the Mandelbrot set, and otherwise
modulus exceeds some threshold, then the point is considered to be outside the Mandelbrot set, and otherwise
it is inside the Mandelbrot set. As before, the number of iterations will generally determine the accuracy of the
drawing.
Fig. 115: The Mandelbrot Set.
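Under the approximation just described, the Mandelbrot test is the same escape-time iteration started at z = 0, with the candidate point d playing the role of the constant. A sketch, reusing the inJuliaSet function from the Julia set discussion above:

Escape-Time Test for the Mandelbrot Set (sketch)
// Approximate membership for the complex point d: iterate z -> z^2 + d starting
// from z = 0 and check whether the orbit escapes.
bool inMandelbrotSet(std::complex<double> d, int maxIterations = 256) {
    return inJuliaSet(std::complex<double>(0.0, 0.0), d, maxIterations);
}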
Fractal Dimension: One of the important elements that characterizes fractals is the notion of fractal dimension.
Fractal sets behave strangely in the sense that they do not seem to be 1-, 2-, or 3-dimensional sets, but seem to
have noninteger dimensionality.
What do we mean by the dimension of a set of points in space? Intuitively, we know that a point is zero-
dimensional, a line is one-dimensional, and a plane is two-dimensional, and so on. If you put the object into a
higher dimensional space (e.g. a line in 5-space) it does not change its dimensionality. If you continuously
deform an object (e.g. deform a line into a circle or a plane into a sphere) it does not change its dimensionality.
How do you determine the dimension of an object? There are various methods. Here is one, which is called
fractal dimension. Suppose we have a set in d-dimensional space. Define a d-dimensional ε-ball to be the interior of
a d-dimensional sphere of radius ε. An ε-ball is an open set (it does not contain its boundary), but for the purposes
of defining fractal dimension this will not matter much. In fact, it will simplify matters (without changing the
definitions below) if we think of an ε-ball as a solid d-dimensional hypercube whose side length is 2ε (an
ε-square).
The dimension of an object depends intuitively on how the number of balls it takes to cover the object varies
with ε. First consider the case of a line segment. Suppose that we have covered the line segment with ε-balls,
and found that it takes some number of these balls to cover the segment. Suppose we cut the size of the balls
exactly in half. Now how many balls will it take? It will take roughly twice as many to cover the same segment.
(Note that this does not depend on the dimension in which the line segment resides, just on the line segment itself.)
More generally, if we reduce the ball radius by a factor of 1/a, it will take roughly a times as many balls to
cover the segment.
On the other hand, suppose we have covered a planar region with ε-balls. Now, suppose we cut the radius in
half. How many balls will it take? It will take 4 times as many. In general, if we reduce the ball radius by a factor
of 1/a, it will take roughly a² times as many balls to cover the same planar region. Similarly, one can see that
with a 3-dimensional object, reducing the radius by a factor of 1/2 will require 8 times as many balls, or a³ in general.
This suggests that the nature of a d-dimensional object is that the number of balls of radius ε that are needed to
cover this object grows as (1/ε)^d. To make this formal, given an object A in d-dimensional space, define

N(A, ε) = the smallest number of ε-balls needed to cover A.
It will not be necessary to use the absolute minimum number, as long as we do not use more than a constant factor
times the minimum number. We claim that an object A has dimension d if N(A, ε) grows as C(1/ε)^d, for some
constant C. This applies in the limit, as ε tends to 0. How do we extract this value of d? Observe that if we
compute ln N(A, ε) (any base logarithm will work) we get ln C + d ln(1/ε). As ε tends to zero, the constant
term ln C remains the same, and the d ln(1/ε) term becomes dominant. If we divide this expression by ln(1/ε) we will
extract the d.
Thus we define the fractal dimension of an object to be

d = lim_{ε→0} ln N(A, ε) / ln(1/ε).
Formally, an object is said to be a fractal if it is self-similar (at different scales) and it has a noninteger fractal
dimension.
Now suppose we try to apply this to a fractal object. Consider first the Sierpinski triangle, defined as the limit of
the following process. (See Fig. 116.)
Fig. 116: The Sierpinski triangle.
How many ε-balls does it take to cover this figure? It takes one 1-square to cover it, three (1/2)-balls, nine
(1/4)-balls, and in general 3^k (1/2^k)-balls to cover it. Letting ε = 1/2^k, we find that the fractal dimension of
the Sierpinski triangle is
D = lim_{ε→0} ln N(A, ε) / ln(1/ε)
  = lim_{k→∞} ln N(A, 1/2^k) / ln(1/(1/2^k))
  = lim_{k→∞} ln 3^k / ln 2^k
  = lim_{k→∞} (k ln 3) / (k ln 2)
  = ln 3 / ln 2 ≈ 1.58496 . . . .
Thus, although the Sierpinski triangle resides in 2-dimensional space, it is essentially a 1.58-dimensional ob-
ject with respect to fractal dimension. Although this definition is general, it is sometimes easier to apply the
following formula for fractals made through repeated subdivision. Suppose we form an object by repeatedly
replacing each piece of length x by b nonoverlapping pieces of length x/a each. Then it follows that the
fractal dimension will be

D = ln b / ln a.
As another example, consider the limit of the process shown in Fig. 117. The area of the object does not change,
and it follows that the fractal dimension of the interior is the same as that of a square, which is 2 (since the balls that
cover the square could be rearranged to cover the object, more or less). However, if we consider the boundary,
observe that with each iteration we replace one segment of length x with 4 subsegments, each of length x√2/4.
It follows that the fractal dimension of the boundary is

ln 4 / ln(4/√2) = 1.3333 . . . .

Thus the shape is not a fractal (by our definition), but its boundary is.
Fig. 117: An object with a fractal boundary.
Lecture 33: Ray Tracing: Triangle Intersection
Ray-Triangle Intersection: Suppose that we wish to intersect a ray with a polyhedral object. There are two standard
approaches to this problem. The first works only for convex polyhedra. In this method, we represent a polyhe-
dron as the intersection of a set of halfspaces. In this case, we can easily modify the 2-d line segment clipping
algorithm presented in Lecture 9 to perform clipping against these halfspaces. We will leave this as an exercise.
The other method involves representing the polyhedron by a set of polygonal faces, and intersecting the ray with
these polygons. We will consider this approach here.
There are two tasks which are needed for ray-polygon intersection tests. The first is to extract the equation of
the (infinite) plane that supports the polygon, and determine where the ray intersects this plane. The second step
is to determine whether the intersection occurs within the bounds of the actual polygon. This can be done in a
2-step process. We will consider a slightly different method, which does this all in one step.
Fig. 118: Ray-triangle intersection.
Let us first consider how to extract the plane containing one of these polygons. In general, a plane in 3-space
can be represented by a quadruple of coefficients (a, b, c, d), such that a point P = (p_x, p_y, p_z) lies on the plane
if and only if

a p_x + b p_y + c p_z + d = 0.

Note that the quadruple (a, b, c, d) behaves much like a point represented in homogeneous coordinates, because
any scalar multiple yields the same equation. Thus (a/d, b/d, c/d, 1) would give the same equation (provided
that d ≠ 0).
Given any three (noncollinear) vertices of the polygon, we can compute these coefficients by solving a set of
three linear equations. Such a system will be underdetermined (3 equations and 4 unknowns), but we can find a
unique solution by adding a fourth normalizing equation, e.g., a + b + c + d = 1. We can also represent a plane
by giving a normal vector n and a point on the plane Q. In this case (a, b, c) will just be the coordinates of n,
and we can derive d from the fact that

a q_x + b q_y + c q_z + d = 0.
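As a small sketch of the normal-and-point representation (the type and function names here are placeholders, not from the notes): since a q_x + b q_y + c q_z + d = 0, the offset is d = −(n · Q).

Plane from a Normal Vector and a Point (sketch)
struct Vec3 { double x, y, z; };
double dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

struct Plane { double a, b, c, d; };   // points P with a*p_x + b*p_y + c*p_z + d = 0

// Given the normal n = (a, b, c) and a point Q on the plane, d = -(n . Q).
Plane planeFromNormalAndPoint(const Vec3& n, const Vec3& Q) {
    return { n.x, n.y, n.z, -dot(n, Q) };
}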
To determine the value of t where the ray intersects the plane, we plug the ray's parametric representation
into this equation and simply solve for t. If the ray is represented by P + tu, then we have the equation

a(p_x + t u_x) + b(p_y + t u_y) + c(p_z + t u_z) + d = 0.

Solving for t we have

t = −(a p_x + b p_y + c p_z + d) / (a u_x + b u_y + c u_z).
Note that the denominator is 0 if the ray is parallel to the plane. We may simply assume that the ray does not
intersect the polygon in this case (ignoring the highly unlikely case where the ray hits the polygon along its
edge). Once the intersection value t has been computed, the next step is to determine whether the intersection
point lies within the triangle itself. Let Q_0, Q_1, and Q_2 denote the triangle's vertices. Any point Q′
that lies on this triangle can be described by a convex combination of these points,

Q′ = α_0 Q_0 + α_1 Q_1 + α_2 Q_2,
where α_i ≥ 0 and α_0 + α_1 + α_2 = 1. From the fact that the α_i's sum to 1, we can set α_0 = 1 − α_1 − α_2 and do a little
algebra to get

Q′ = Q_0 + α_1 (Q_1 − Q_0) + α_2 (Q_2 − Q_0),

where α_i ≥ 0 and α_1 + α_2 ≤ 1. Let
w_1 = Q_1 − Q_0,   w_2 = Q_2 − Q_0,

giving us the following:

Q′ = Q_0 + α_1 w_1 + α_2 w_2.
Recall that our ray is given by P + tu for t > 0. We want to know whether there is a point Q′ of this form lying
on the ray, that is, whether

P + tu = Q_0 + α_1 w_1 + α_2 w_2,   or equivalently,   P − Q_0 = −tu + α_1 w_1 + α_2 w_2.
Let w_P = P − Q_0. This is an equation in which t, α_1, and α_2 are unknown (scalar) values, and the other quantities
are all 3-element vectors. Hence this is a system of three equations with three unknowns. We can write this as

( −u   w_1   w_2 ) (t, α_1, α_2)^T = ( w_P ).
To determine t, α_1, and α_2, we need only solve this system of equations. Let M denote the 3 × 3 matrix whose
columns are −u, w_1, and w_2. We can do this by computing the inverse matrix M⁻¹, and then we have

(t, α_1, α_2)^T = M⁻¹ ( w_P ).
There are a number of things that can happen at this point. First, it may be that the matrix is singular (i.e., its
columns are not linearly independent) and no inverse exists. This happens if u, w_1, and w_2 are linearly dependent,
that is, if the ray is parallel to the plane containing the triangle, and in this case we report no intersection.
Otherwise we solve the system for t, α_1, and α_2, and we require t > 0 so that the intersection lies on the ray.
If either α_1 or α_2 is negative then there is no intersection, and if α_1 + α_2 > 1 then there is no intersection.
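Here is a minimal sketch of this procedure, solving the 3 × 3 system by Cramer's rule rather than by explicitly inverting M. The types, names, and the singularity tolerance are choices made for this sketch; a production ray tracer would typically use a tuned formulation such as Möller–Trumbore.

Ray–Triangle Intersection via Cramer's Rule (sketch)
#include <cmath>
#include <optional>

struct Vec3 { double x, y, z; };
Vec3 operator-(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(const Vec3& a, const Vec3& b)     { return a.x*b.x + a.y*b.y + a.z*b.z; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x};
}
// Determinant of the 3x3 matrix whose columns are a, b, c (scalar triple product).
double det3(const Vec3& a, const Vec3& b, const Vec3& c) { return dot(a, cross(b, c)); }

struct Hit { double t, alpha1, alpha2; };

// Solve [ -u  w1  w2 ] (t, alpha1, alpha2)^T = P - Q0 and apply the rejection tests
// described above: singular system (ray parallel to the triangle's plane), t <= 0,
// a negative alpha, or alpha1 + alpha2 > 1.
std::optional<Hit> rayTriangle(const Vec3& P, const Vec3& u,
                               const Vec3& Q0, const Vec3& Q1, const Vec3& Q2) {
    Vec3 w1 = Q1 - Q0, w2 = Q2 - Q0, wP = P - Q0;
    Vec3 negU{-u.x, -u.y, -u.z};
    double D = det3(negU, w1, w2);
    if (std::fabs(D) < 1e-12) return std::nullopt;     // singular: no unique solution
    double t  = det3(wP,   w1, w2) / D;
    double a1 = det3(negU, wP, w2) / D;
    double a2 = det3(negU, w1, wP) / D;
    if (t <= 0 || a1 < 0 || a2 < 0 || a1 + a2 > 1) return std::nullopt;
    return Hit{t, a1, a2};
}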
Normal Vector: In addition to computing the intersection of the ray with the object, it is also desirable to compute
the normal vector at the point of intersection. In the case of the triangle, this can be done by computing the cross
product
n = normalize((Q_1 − Q_0) × (Q_2 − Q_0)) = normalize(w_1 × w_2).
But which direction should we take for the normal, n or −n? This depends on which side of the triangle the
ray arrives from. The normal should be directed opposite to the direction of the ray. Thus, if n · u > 0, then
negate n.
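Continuing the sketch above (and reusing its Vec3, operator-, dot, and cross helpers), the normal computation and orientation test might look like the following.

Triangle Normal, Oriented Against the Ray (sketch)
Vec3 normalize(const Vec3& v) {
    double len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}
Vec3 triangleNormal(const Vec3& Q0, const Vec3& Q1, const Vec3& Q2, const Vec3& u) {
    Vec3 n = normalize(cross(Q1 - Q0, Q2 - Q0));   // normalize(w1 x w2)
    if (dot(n, u) > 0) n = {-n.x, -n.y, -n.z};     // flip so the normal opposes the ray direction
    return n;
}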
Lecture 34: Ray Tracing Bézier Surfaces
Issues in Ray Tracing: Today we consider a number of miscellaneous issues in the ray tracing process.
Ray and Bézier Surface Intersection: Let us consider a more complex but more realistic ray intersection problem,
namely that of intersecting a ray with a Bézier surface. One possible approach would be to derive an implicit
representation of the infinite algebraic surface on which the Bézier patch resides, and then determine whether the
ray hits the portion of this infinite surface corresponding to the patch. This leads to a very complex algebraic
task.
A simpler approach is based on using circle-ray and triangle-ray intersection tests (which we have already
discussed) and the deCasteljau procedure for subdividing Bézier surfaces. The idea is to construct a simple
enclosing shape for the curve, which we will use as a filter to rule out clear misses. Let us describe the process
for a Bézier curve, and we will leave the generalization to surfaces as an exercise.
What enclosing shape shall we use? We could use the convex hull of the control points. (Recall the convex hull
property, which states that a Bézier curve or surface is contained within the convex hull of its control points.)
However, computing convex hulls, especially in 3-space, is a tricky computation.
We will instead apply a simpler test, by finding an enclosing circle for the curve. We do this by first computing
a center point C for the curve. This can be done, for example, by computing the centroid of the control points
(that is, the average of all the control-point coordinates). Alternatively, we could take the midpoint between the
first and last control points. Given the center point C, we then compute the distance between each control point and
C. Let d_max denote the largest such distance. The circle with center C and radius d_max encloses all the control
points, and hence it encloses the convex hull of the control points, and hence it encloses the entire curve. We
test the ray for intersection with this circle. An example is shown in Fig. 119.
Fig. 119: Ray tracing Bézier curves through filtering and subdivision.
If the ray does not hit the circle, then we may safely say that it does not hit the Bézier curve. If the ray does hit the
circle, it still may miss the curve. Here we apply the deCasteljau algorithm to subdivide the Bézier curve into
two Bézier subcurves. Then we apply the ray intersection algorithm recursively to the two subcurves (using
the same circle filter for each). If both return misses, then we miss. If either or both returns a hit, then we take the closer
of the two hits. We need some way to keep this recursive procedure from looping infinitely. To do so, we need
some sort of stopping criterion. (A code sketch of this recursive test is given after the list of criteria below.)
Here are a few possibilities:
Fixed level decomposition: Fix an integer k, and decompose the curve to a depth of k levels (resulting in 2^k
subcurves in all). This is certainly simple, but not a very efficient approach. It does not consider the shape
of the curve or its distance from the viewer.
Decompose until flat: For each subcurve, we can compute some function that measures how flat, that is, how close
to linear, the curve is. For example, this might be done by considering the ratio of the length of the line
segment between the first and last control points to the distance of the furthest control point from this line.
At this point we reduce the ray intersection to a line segment intersection problem.
Decompose to pixel width: We continue to subdivide the curve until each subcurve, when projected back to
the viewing window, overlaps a region of less than one pixel. Clearly it is unnecessary to continue to
subdivide such a curve. This solves the crack problem (since cracks are smaller than a pixel) but may
produce an unnecessarily high subdivision for nearly flat curves. Also notice that this notion of back
projection is easy to implement for rays emanating from the eye, but it is much harder to determine for
reflected or refracted rays.
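The following is a sketch of this filter-and-subdivide test for a 2-dimensional Bézier curve, using the enclosing-circle filter, midpoint deCasteljau subdivision, and the decompose-until-flat stopping criterion. All of the type and function names, as well as the flatness and singularity tolerances, are choices made for this sketch, not part of these notes.

Ray–Bézier Curve Intersection by Filtering and Subdivision (sketch)
#include <algorithm>
#include <cmath>
#include <optional>
#include <vector>

struct Pt  { double x, y; };
struct Ray { Pt origin, dir; };                    // points origin + t*dir, for t > 0
struct Circle { Pt center; double radius; };

// Enclosing circle: centroid of the control points plus the largest distance d_max.
Circle boundingCircle(const std::vector<Pt>& ctrl) {
    Pt c{0.0, 0.0};
    for (const Pt& p : ctrl) { c.x += p.x; c.y += p.y; }
    c.x /= ctrl.size(); c.y /= ctrl.size();
    double r = 0.0;
    for (const Pt& p : ctrl) r = std::max(r, std::hypot(p.x - c.x, p.y - c.y));
    return {c, r};
}

// Conservative filter: does the ray pass within the circle's radius, ahead of its origin?
bool rayHitsCircle(const Ray& ray, const Circle& circ) {
    double dx = circ.center.x - ray.origin.x, dy = circ.center.y - ray.origin.y;
    double len   = std::hypot(ray.dir.x, ray.dir.y);
    double along = (dx * ray.dir.x + dy * ray.dir.y) / len;          // signed distance along the ray
    double perp  = std::fabs(dx * ray.dir.y - dy * ray.dir.x) / len; // distance from the ray's line
    return perp <= circ.radius && along >= -circ.radius;
}

// Flatness: largest distance of a control point from the segment joining the endpoints.
double flatness(const std::vector<Pt>& ctrl) {
    Pt a = ctrl.front(), b = ctrl.back();
    double ux = b.x - a.x, uy = b.y - a.y, len = std::hypot(ux, uy);
    double worst = 0.0;
    for (const Pt& p : ctrl)
        worst = std::max(worst, len > 0 ? std::fabs((p.x - a.x)*uy - (p.y - a.y)*ux) / len
                                        : std::hypot(p.x - a.x, p.y - a.y));
    return worst;
}

// Intersect the ray with the segment ab; returns the ray parameter t of the hit, if any.
std::optional<double> raySegment(const Ray& ray, const Pt& a, const Pt& b) {
    double ex = b.x - a.x, ey = b.y - a.y;
    double det = ex * ray.dir.y - ey * ray.dir.x;
    if (std::fabs(det) < 1e-12) return std::nullopt;                 // parallel
    double ax = a.x - ray.origin.x, ay = a.y - ray.origin.y;
    double t = (ex * ay - ey * ax) / det;                            // parameter along the ray
    double s = (ray.dir.x * ay - ray.dir.y * ax) / det;              // parameter along the segment
    if (t <= 0 || s < 0 || s > 1) return std::nullopt;
    return t;
}

// Midpoint deCasteljau subdivision: control polygons of the two halves of the curve.
void subdivide(const std::vector<Pt>& ctrl, std::vector<Pt>& left, std::vector<Pt>& right) {
    std::vector<Pt> cur = ctrl;
    size_t n = ctrl.size();
    left.resize(n); right.resize(n);
    for (size_t level = 0; level < n; ++level) {
        left[level] = cur.front();
        right[n - 1 - level] = cur.back();
        std::vector<Pt> next;
        for (size_t i = 0; i + 1 < cur.size(); ++i)
            next.push_back({0.5 * (cur[i].x + cur[i+1].x), 0.5 * (cur[i].y + cur[i+1].y)});
        cur = next;
    }
}

// Recursive filter-and-subdivide test; returns the closest positive hit parameter, if any.
std::optional<double> rayBezier(const Ray& ray, const std::vector<Pt>& ctrl, double flatTol = 1e-3) {
    if (!rayHitsCircle(ray, boundingCircle(ctrl))) return std::nullopt;  // clear miss
    if (flatness(ctrl) < flatTol)                                        // flat enough:
        return raySegment(ray, ctrl.front(), ctrl.back());               // treat as a segment
    std::vector<Pt> left, right;
    subdivide(ctrl, left, right);
    auto hitL = rayBezier(ray, left, flatTol);
    auto hitR = rayBezier(ray, right, flatTol);
    if (hitL && hitR) return std::min(*hitL, *hitR);                     // take the closer hit
    return hitL ? hitL : hitR;
}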