COS3712-Summary
Chapter 1
Computer graphics is concerned with all aspects of producing pictures or images using a computer.
The application of computer graphics can be divided into four, possibly overlapping, areas:
1. Display of information: Some examples include:
• Maps are used to display celestial and geographical information.
• Statistical plots are generated to aid the viewer in determining information in a set
of data.
• Medical imaging technologies, such as CT, MRI, ultrasound, and PET.
2. Design: Professions such as engineering and architecture use computer-aided design (CAD)
tools to create interactive technical drawings.
3. Simulation and animation: Graphical flight simulators have proved both to increase safety
and to reduce training expenses. Computer graphics are also used for animation in the
television, motion-picture, and advertising industries. Virtual reality (VR) technology allows
the viewer to act as part of a computer-generated scene.
4. User interfaces: Our interaction with computers has become dominated by a visual
paradigm that includes windows, icons, menus, and a pointing device, such as a mouse.
In a graphics program, we can obtain the measure (what the device returns) of an input device in
three distinct modes:
1. Request mode: The measure of the device is not returned to the program until the device is
triggered. A trigger of a device is a physical input on the device with which the user can
signal the computer.
2. Sample-mode: As soon as a function call that expects device input is encountered, the
measure is returned.
3. Event-mode: Each time that a device is triggered, an event is generated and the device
measure, including the identifier for that device, is placed in an event queue. Periodically the
queue is polled, and for each (if any) event in the queue, the program can look at the event’s
type and then decide what to do.
Both request- and sample-mode input APIs require that the user identify which device is to provide
the input, and are thus not sufficient for modern computing environments.
Raster based graphics system: The image we see on the output device is an array – the raster – of
pixels produced by the graphics system. Each pixel corresponds to a unique location in the image.
Collectively, the pixels are stored in a part of memory called the frame buffer. Resolution refers to
the number of pixels in the frame buffer, and it determines the detail that you can see in the image.
The depth, or precision of the frame buffer, defined as the number of bits used per pixel, determines
properties such as how many colours can be represented on a given system. In full-colour (also
known as true-colour or RGB-colour) systems, there are at least 24 bits per pixel.
The frame buffer is actually a collection of buffers: Colour buffers hold the coloured pixels that are
displayed; depth buffers hold information needed for creating images from three-dimensional data;
and there are other special-purpose buffers, such as accumulation buffers.
Rasterization (or scan conversion) is the process of converting geometric entities to pixel colours and
locations in the frame buffer.
Non-interlaced display: The pixels are displayed row by row, or scan line by scan line, at the refresh
rate; i.e. all the rows are refreshed.
Interlaced display: Odd rows and even rows are refreshed alternately.
Two basic entities must be part of any image-formation process, be it mathematical or physical:
• Object: It exists in space independent of any image-formation process and of any viewer. We
define a synthetic object by specifying the positions in space, called vertices, of various
geometric primitives (points, lines, and polygons) that, when put together, approximate the
object. CAD systems make it easy for a user to build synthetic objects.
• Viewer: It is the viewer that forms the image of our objects. Viewers placed at different
positions, will see different images of the same object.
Projection (image formation): The process by which the specification of an object is combined with
the specification of a viewer to produce a two-dimensional image.
Visible light has wavelengths in the range of 350 to 780 nanometres (nm). Distinct frequencies
within this range are visible as distinct colours. Wavelengths in the middle of the range, around 520
nm, are seen as green; those near 450 nm are seen as blue; and those near 650 nm are seen as red.
Light sources can emit light either as a set of discrete frequencies or continuously. A particular light
source is characterized by the intensity of light that it emits at each frequency and by that light’s
directionality. An ideal point source emits energy from a single location at one or more frequencies
equally in all directions.
Light from the source strikes various surfaces of an object; the details of the interaction between
light and the surfaces of the object determine how much light is reflected, and hence the colour(s) of
the object as perceived by the viewer.
Infinite depth of field: Every point within the field of view is in focus.
[Figure: the synthetic-camera model, showing the centre of projection (COP), a projector, the projection plane, and the clipping window.]
If we are to follow the synthetic-camera model, we need functions in the API to specify the
following:
• Objects: The geometry of an object is usually defined by sets of vertices.
• A viewer: We can define a viewer or camera by specifying a number of parameters,
including: position, orientation, focal length, and the size of the projection plane.
• Light sources: Light sources are defined by their location, strength, colour, and directionality.
• Material properties: Material properties are characteristics, or attributes, of the objects, and
such properties are specified through a series of function calls at the time that each object is
defined.
Wire-frame image: Only the edges of polygons (outline) are rendered using line segments.
The modelling-rendering paradigm: The modelling of the scene is separated from the production of
the image, or the rendering of the scene. Thus, we might implement the modeller and the renderer
with different software and hardware. This paradigm has become popular as a method for
generating computer games and images over the Internet. Models, including the geometric objects,
lights, cameras, and material properties, are placed in a data structure called a scene graph that is
passed to a renderer or game engine.
Display processors: The earliest attempts to build special-purpose graphics systems were concerned
primarily with relieving the general-purpose computer from the task of refreshing the display
continuously. The instructions to generate the image could be assembled once in the host and sent
to the display processor where they were stored in the display processor’s local memory. Thus, the
host is freed for other tasks while the display processor continuously refreshes the display.
The graphics pipeline: We start with a (possibly enormous) set of vertices which defines the
geometry of the scene. We must process all these vertices in a similar manner to form an image in
the frame buffer. There are four major steps in the imaging process:
1. Vertex processing: Each vertex is processed independently. The two major functions of this
block are to carry out coordinate transformations and to compute a colour for each vertex.
Per-vertex lighting calculations can be performed in this box.
2. Clipping and primitive assembly: Sets of vertices are assembled into primitives, such as line
segments and polygons, before clipping can take place. In the synthetic camera model, a
clipping volume represents the field of view of an optical system. The projections of objects
in this volume appear in the image; those that are outside do not (clipped out); and those
that straddle the edges of the clipping volume are partly visible. Clipping must be done on a
primitive-by-primitive basis rather than on a vertex-by-vertex basis. The output of this stage
is a set of primitives whose projections should appear in the image.
3. Rasterization: The primitives that emerge from the clipper are still represented in terms of
their vertices and must be converted to pixels in the frame buffer. The output of the
rasterizer is a set of fragments for each primitive. A fragment can be thought of as a
potential pixel that carries with it information, including its colour, location, and depth.
4. Fragment processing: It takes the fragments generated by the rasterizer and updates the
pixels in the frame buffer. Hidden-surface removal, texture mapping, bump mapping, and
alpha blending can be applied here. Per-fragment lighting calculations can also be
performed in this box.
Retained mode graphics: We compute all the geometric data first and store it in some data
structure. We then display the scene by sending all the stored data to the graphics processor at
once. This approach avoids the overhead of sending small amounts of data to the graphics processor
for each vertex we generate, but at the cost of having to store all the data. Because the data are
stored, we can redisplay the scene by resending the stored data without having to regenerate it.
Current GPUs allow us to store the generated data directly on the GPU, thus avoiding the bottleneck
caused by transferring the data from the CPU to the GPU each time we wish to redisplay the scene.
In WebGL terms:
• A vertex is a position in space; we use two-, three- and four-dimensional spaces in computer
graphics. We use vertices to specify the atomic geometric primitives that are recognized by
our graphics system.
• A point is the simplest geometric primitive, and is usually specified by a single vertex.
Clip coordinate system: Can be visualized as a cube centred at the origin whose diagonal goes from
(-1, -1, -1) to (1, 1, 1). Objects outside this cube will be eliminated, or clipped, and cannot appear on
the display. The vertex shader uses transformations to convert geometric data specified in some
coordinate system to a representation in clip coordinates and outputs this information to the
rasterizer.
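A minimal sketch of such a vertex shader, written in GLSL ES and embedded as a TypeScript string; the attribute and uniform names (aPosition, uModelViewProjection) are assumptions rather than names prescribed by WebGL:

```typescript
// Minimal GLSL ES vertex shader: converts each vertex from the coordinate
// system in which it was specified into clip coordinates for the rasterizer.
const vertexShaderSource: string = `
  attribute vec4 aPosition;            // vertex position in object (model) coordinates
  uniform mat4 uModelViewProjection;   // concatenated model-view and projection matrices

  void main() {
    // gl_Position must hold the vertex position in clip coordinates.
    gl_Position = uModelViewProjection * aPosition;
  }
`;
```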
3. Viewing functions: Allow us to specify various views. WebGL does not provide any viewing
functions, but relies on the use of transformations in the shaders to provide the desired
view.
4. Transformation functions: Allow us to carry out transformations of objects, such as rotation,
translation, and scaling. In WebGL, we carry out transformations by forming transformation
matrices in our applications, and then applying them either in the application or in the
shaders.
5. Input functions: Deals with input devices.
6. Control functions: Enable us to communicate with the window system, to initialize our
programs, and to deal with any errors that take place during the execution of our programs.
7. Query functions: Allow us to obtain information about the operating environment, camera
parameters, values in the frame buffer, etc.
We can think of the entire graphics system as a state machine. Applications provide input that
change the state of the machine or cause the machine to produce a visible output. From the
perspective of the API, graphics functions are of two types: those that specify primitives that flow
through a pipeline inside the state machine and those that either change the state inside the
machine or return state information. One important consequence of the state machine view is that
most parameters are persistent; their values remain unchanged until we explicitly change them.
WebGL functions are in a single library called GL. Shaders are written in the OpenGL Shading
Language (GLSL), which has a separate specification from WebGL, although the functions to interface
the shaders with the application are part of the WebGL API. To interface with the window system
and to get input from external devices into our programs, we need to use some other library (GLX
for the X Window System, wgl for Windows, agl for Macintosh; GLUT (the OpenGL Utility Toolkit) is a simple
cross-platform library). The OpenGL Extension Wrangler (GLEW) library is used with cross-platform
libraries, such as GLUT, to remove operating-system dependencies. WebGL makes heavy use of
defined constants to increase code readability and avoid the use of magic numbers. Functions that
transfer data to the shaders have the following notation:
glSomeFunction*();
where the * can be interpreted as either nt or ntv, where n signifies the number of dimensions (1,
2, 3, 4, or Matrix); t denotes the data type, such as integer (i), float (f), or double (d); and v, if
present, indicates that the variables are specified through a pointer to an array, rather than through
an argument list.
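For example, in the JavaScript form of the WebGL API the same convention appears as follows (a sketch; it assumes a valid rendering context and a uniform location already obtained with gl.getUniformLocation()):

```typescript
// Sketch of the n/t/v naming convention using the WebGL (JavaScript) calls.
function setExampleUniforms(gl: WebGLRenderingContext, loc: WebGLUniformLocation): void {
  gl.uniform3f(loc, 1.0, 0.5, 0.0);                      // 3 components, float, argument list
  gl.uniform3fv(loc, new Float32Array([1.0, 0.5, 0.0])); // 3 components, float, via an array (the "v" form)
  gl.uniformMatrix4fv(loc, false, new Float32Array(16)); // 4 × 4 float matrix, via an array
}
```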
The units used to specify vertex positions in the application program are referred to as vertex
coordinates, object coordinates, or world coordinates, and can be arbitrarily chosen by the
programmer to suit the application. Units on the display device are called window coordinates,
screen coordinates, physical-device coordinates or just device coordinates. At some point, the values
in vertex coordinates must be mapped to window coordinates, but this is automatically done by the
graphics system as part of the rendering process. The user needs to specify only a few parameters.
This allows for device-independent graphics; freeing application programmers from worrying about
the details of input and output devices.
We can separate primitives into two classes: geometric primitives and image, or raster, primitives.
The basic WebGL geometric primitives are specified by sets of vertices. All WebGL geometric
primitives are variants of points, line segments, and triangular polygons (see the sketch after this list).
• Points (GL_POINTS): A point can be displayed as a single pixel or a small group of pixels.
Use glPointSize() to set the current point size (in pixels).
• Lines: Use glLineWidth() to set the current line width (in pixels).
o Line segments (GL_LINES): Successive pairs of vertices are interpreted as the
endpoints of individual line segments.
o Line strip or polyline (GL_LINE_STRIP): Successive vertices are connected.
o Line loop (GL_LINE_LOOP): Successive vertices are connected, and a line segment
is drawn from the final vertex to the first, thus creating a closed path.
• Polygons (triangles): Use glPolygonMode() to tell the renderer to generate only the
edges or just points for the vertices, instead of fill (the default).
o Triangles (GL_TRIANGLES): Each successive group of three vertices specifies a new
triangle.
o Triangle strip (GL_TRIANGLE_STRIP): Each additional vertex is combined with
the previous two vertices to define a new triangle.
o Triangle fan (GL_TRIANGLE_FAN): It is based on one fixed point. The next two
points determine the first triangle, and subsequent triangles are formed from one
new point, the previous point, and the first (fixed) point.
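A sketch of how these primitive types are selected when drawing; in the JavaScript form of the WebGL API the GL_ constants above appear as gl.POINTS, gl.LINE_STRIP, gl.TRIANGLES, and so on. The example assumes the vertex positions have already been loaded into a buffer and linked to the vertex shader; in practice only one of the calls would be used for a given set of vertices:

```typescript
// Interpreting the same n vertices as different WebGL primitive types.
function drawAs(gl: WebGLRenderingContext, n: number): void {
  gl.drawArrays(gl.POINTS, 0, n);          // one point per vertex
  gl.drawArrays(gl.LINES, 0, n);           // successive pairs form independent line segments
  gl.drawArrays(gl.LINE_STRIP, 0, n);      // successive vertices are connected
  gl.drawArrays(gl.TRIANGLES, 0, n);       // each group of three vertices forms a triangle
  gl.drawArrays(gl.TRIANGLE_STRIP, 0, n);  // each new vertex plus the previous two form a triangle
  gl.drawArrays(gl.TRIANGLE_FAN, 0, n);    // each new vertex, the previous vertex, and the first vertex
}
```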
Triangulation is the process of approximating a general geometric object by subdividing it into a set
of triangles. Every set of vertices can be triangulated. Triangulation is a special case of the more
general problem of tessellation, which divides a polygon into a polygonal mesh, not all of which need
be triangles.
Attributes are properties that describe how an object should be rendered. Available attributes
depend on the type of object. For example, line segments can have colour, thickness, and pattern
(solid, dashed, or dotted).
2.5 Colour
Additive colour model: The three primary colours (Red, Green, Blue) add together to give the
perceived colour (CRT monitors and projectors are examples of additive colour systems). With
additive colour, primaries add light to an initially black display, yielding the desired colour.
Subtractive colour model: Here we start with a white surface, such as a sheet of paper. Coloured
pigments remove colour components from light that is striking the surface. If we assume that white
light hits the surface, a particular point will appear red if all components of the incoming light are
absorbed by the surface except for wavelengths in the red part of the spectrum, which is reflected.
In subtractive systems, the primaries are usually the complementary colours: cyan, magenta, and
yellow (CMY). Industrial printers are examples of subtractive colour systems.
Colour cube: We can view a colour as a point in a colour solid. We draw the
solid using a coordinate system corresponding to the three primaries. The
distance along a coordinate axis represents the amount of the
corresponding primary in the colour. We can represent any colour that we
can produce with this set of primaries as a point in the cube.
In an RGB system, each pixel might consist of 24 bits (3 bytes): 1 byte for each of red, green, and blue.
The specification of RGB colours is based on the colour cube. Thus, we specify colour components as
numbers between 0.0 and 1.0, where 1.0 denotes the maximum (or saturated value) of the
corresponding primary and 0.0 denotes a zero value of that primary. RGBA is an extension of the
RGB model, where the fourth colour (A, or alpha) is treated by WebGL as either an opacity or
transparency value. Transparency and opacity are complements of each other: an opaque object lets
no light through, while a transparent object passes all light. Opacity values range from 0.0 (fully
transparent) to 1.0 (fully opaque). Alpha blending is disabled by default.
Indexed colour: Early graphics systems had frame buffers that were limited in depth: for example,
each pixel was only 8 bits deep. Instead of subdividing a pixel’s bits into groups and treating them as
RGB values (which would result in a very restricted set of colours), indexed colour interprets the limited-
depth pixel value as an integer that indexes into a colour-lookup table. The user
program can fill the entries (rows) of the table with the desired colours. A problem with indexed
colours is that when we work with dynamic images that must be shaded, usually we need more
colours than are provided by colour-index mode. Historically, colour-index mode was important
because it required less memory for the frame buffer; however, the cost and density of memory is
no longer an issue.
2.6 Viewing
A fundamental concept that emerges from the synthetic-camera model is that the specification of
the objects in our scene is completely independent of our specification of the camera.
The simplest and WebGL’s default view is the orthographic projection. All projectors are parallel, and
the centre of projection is replaced by a direction of projection. Furthermore, all projectors are
perpendicular (orthogonal) to the projection plane. The orthographic projection takes a point
(𝑥, 𝑦, 𝑧) and projects it into the point (𝑥, 𝑦, 0).
The aspect ratio of a rectangle is the ratio of the rectangle’s width to its height. If the aspect ratio of
the viewing (clipping) rectangle, specified by camera parameters, is not the same as the aspect ratio
of the window, objects appear distorted on the screen.
A viewport is a rectangular area of the display window in which our images are rendered. By default,
it is the entire window, but it can be set to any smaller size in pixels via the function
void glViewport(GLint x, GLint y, GLsizei w, GLsizei h)
where (x, y) is the lower-left corner of the viewport (measured relative to the lower-left corner of
the window) and w and h give the width and height, respectively. For a given window, we can adjust
the height and width of the viewport to match the aspect ratio of the clipping rectangle, thus
preventing any object distortion in the image.
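A sketch of this adjustment, with the viewport anchored at the lower-left corner of the window; the function name and parameters are illustrative assumptions:

```typescript
// Choose the largest viewport with the required aspect ratio (width/height of
// the clipping rectangle) that fits inside a window of the given pixel size.
function setViewportKeepingAspect(gl: WebGLRenderingContext,
                                  windowWidth: number,
                                  windowHeight: number,
                                  aspect: number): void {
  let w = windowWidth;
  let h = Math.round(windowWidth / aspect);
  if (h > windowHeight) {            // too tall: constrain by the window height instead
    h = windowHeight;
    w = Math.round(windowHeight * aspect);
  }
  gl.viewport(0, 0, w, h);           // (x, y) = (0, 0): lower-left corner of the window
}
```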
Events are changes that are detected by the operating system and include such actions as a user
pressing a key on the keyboard, the user clicking a mouse button or moving the mouse. When an
event occurs, it is placed in an event-queue. The event queue can be examined by an application
program or by the operating system. We can associate callback functions with specific types of
events. Event processing gives us interactive control in our programs. With GLUT, we can execute
the function glutMainLoop() to begin an event-processing loop. All our programs must have at
least a display callback function which is invoked when the application program or the operating
system determines that the graphics in a window need to be redrawn.
Include glFlush(); at the end of the display callback function to ensure that all the data are
rendered as soon as possible.
Every application, no matter how simple, must provide both a vertex- and a fragment-shader (there
are no default shaders). Each shader is a complete C-like program with main() as its entry point.
The vertex shader is executed for each vertex that is passed through the pipeline. In general, a
vertex shader will transform the representation of a vertex location from whatever coordinate
system in which it is specified to a representation in clip coordinates for the rasterizer. The fragment
shader is executed for each fragment generated by the rasterizer. At a minimum, each execution of
the fragment shader must output a colour for the fragment.
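A minimal sketch of such a shader pair (GLSL ES, embedded as TypeScript strings); it assumes the vertex positions are already given in clip coordinates and colours every fragment red:

```typescript
const minimalVertexShader: string = `
  attribute vec4 aPosition;
  void main() {
    gl_Position = aPosition;   // positions assumed to be supplied in clip coordinates
  }
`;

const minimalFragmentShader: string = `
  precision mediump float;
  void main() {
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);   // output an opaque red colour for the fragment
  }
`;
```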
Chapter 3
Instead of calling the display callback function directly, rather invoke the glutPostRedisplay()
function which sets an internal flag indicating that the display needs to be redrawn. At the end of
each event-loop iteration, if the flag is set, the display callback is invoked and the flag is unset. This
method prevents the display from being redrawn multiple times in a single pass through the event
loop.
Two types of events are associated with the pointing device (mouse):
• Mouse events occur when one of the mouse buttons is either depressed (mouse down
event) or released (mouse up event).
• Move events are generated when the mouse is moved with one of the buttons depressed. If
the mouse is moved without a button being held down, this event is called a passive move
event.
Reshape events occur when the user resizes the window, usually by dragging a corner of the
window to a new location. This is an example of a window event. Unlike most other callbacks,
there is a default reshape callback that simply changes the viewport to the new window size.
Keyboard events can be generated when the mouse is in the window and one of the keys is
depressed or released.
The idle callback is invoked when there are no other events. A typical use of the idle callback is
to continue to generate graphical primitives through a display function while nothing else is
happening. Another is to produce an animated display.
An application program operates asynchronously from the automatic display of the contents of
the frame buffer, and can cause changes to the frame buffer at any time. Hence, a redisplay of
the frame buffer can occur while its contents are still being altered by the application and the
user will see only a partially drawn display. This distortion can be severe, especially if objects are
constantly moving around in the scene. A common solution is double-buffering. The hardware
has two frame buffers: one, called the front buffer, is the one that is displayed, the other, called
the back buffer, is then available for constructing what we would like to display next. Once the
drawing is complete, we swap the front and back buffers. We then clear the new back buffer and
can start drawing into it. With double-buffering we use glutSwapBuffers() instead of
glFlush()at the end of the display callback function.
GLUT provides pop-up menus that we can use with the mouse to create sophisticated interactive
applications. GLUT also supports hierarchical (cascading) menu entries. Using menus involves
taking a few simple steps:
• Define callback function(s) that specify the actions corresponding to each entry in the menu.
• Create a menu; register its callback; and add menu entries and/or submenus. This step must
be repeated for each submenu, and once for the top-level menu.
• Attach the top-level menu to a particular mouse button.
Because there is an affine transformation that corresponds to each change of frame, there are 4 × 4
matrices that represent the transformation from model coordinates to world coordinates and from
world coordinates to eye coordinates. These transformations usually are concatenated together into
the model-view transformation, which is specified by the model-view matrix.
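A sketch of this concatenation using the gl-matrix library as one possible matrix utility; the transformation values are arbitrary examples:

```typescript
import { mat4, vec3 } from "gl-matrix";

// Build a model matrix (model -> world) and a view matrix (world -> eye), then
// concatenate them into a single model-view matrix.
const model = mat4.create();
mat4.translate(model, model, vec3.fromValues(0, 0, -5));   // place the object in the world
mat4.rotateY(model, model, Math.PI / 6);                   // orient it

const view = mat4.create();
mat4.translate(view, view, vec3.fromValues(0, -2, -10));   // move the scene relative to the fixed camera

// modelView maps model coordinates to eye coordinates: eye = view * model * vertex.
const modelView = mat4.multiply(mat4.create(), view, model);
```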
We describe geometric objects through a set of vertex specifications. The data specifying the
location of the vertices (geometry) can be stored as a simple list or array - the vertex list.
Rotation is an operation that rotates points by a fixed angle about a point or line. In a right-handed
system, when we draw the 𝑥- and 𝑦-axes in the standard way, the positive 𝑧-axis comes out of the
“page”. If we look down the positive 𝑧-axis towards the origin, the positive direction of rotation
(positive angle of rotation) is counter-clockwise. This definition applies to both the 𝑥- and 𝑦-axes as
well.
Rotation and translation are known as rigid-body transformations. No combination of rotations and
translations can alter the shape or volume of an object; they can alter only the object’s location and
orientation.
Scaling is an affine non-rigid-body transformation by which we can make an object bigger or smaller.
Scaling has a fixed point: a point that is unaffected by the transformation. A negative scaling factor
gives us reflection about the fixed point, in the specific scaling direction.
• Uniform scaling: The scaling factor in all directions is identical. The shape of the scaled object
is preserved.
• Non-uniform scaling: The scaling factor of each direction need not be identical. The shape of
the scaled object is distorted.
$$T(\alpha_x, \alpha_y, \alpha_z) = \begin{bmatrix} 1 & 0 & 0 & \alpha_x \\ 0 & 1 & 0 & \alpha_y \\ 0 & 0 & 1 & \alpha_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Suppose that we let 𝑅 denote any of our three rotation matrices, the inverse rotation matrix is:
$$R^{-1}(\theta) = R(-\theta) = R^{T}(\theta)$$
We can construct any desired rotation matrix, with a fixed point at the origin as a product of
individual rotations about the three axes:
$$R = R_z R_y R_x.$$
Using the fact that the transpose of a product is the product of the transposes in the reverse order,
we see that for any rotation matrix,
$$R^{-1} = R^{T}.$$
To rotate an object about an arbitrary fixed point, say 𝐩𝑓, we first translate the object such that 𝐩𝑓
coincides with the origin: 𝑇(−𝐩𝑓); we then apply the rotation: 𝑅(𝜃); and finally move the object
back such that 𝐩𝑓 is again at its original position: 𝑇(𝐩𝑓). Thus, concatenating the matrices together,
we obtain the single matrix:
$$M = T(\mathbf{p}_f)\, R(\theta)\, T(-\mathbf{p}_f).$$
Notice the “reverse” order in which the matrices are multiplied. Matrix multiplication, in general, is
not a commutative operation, thus, the order in which we apply transformations is critical!
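A sketch of this construction with the gl-matrix library; because each call post-multiplies the current matrix, writing the calls in the order T(𝐩𝑓), R(𝜃), T(−𝐩𝑓) produces exactly the product above, with T(−𝐩𝑓) applied to vertices first:

```typescript
import { mat4, vec3 } from "gl-matrix";

// M = T(pf) R(theta) T(-pf): rotation by theta about an axis through the fixed point pf.
const pf = vec3.fromValues(1, 2, 0);      // fixed point (example values)
const theta = Math.PI / 4;                // 45 degrees
const axis = vec3.fromValues(0, 0, 1);    // rotate about the z-direction

const M = mat4.create();                               // start from the identity
mat4.translate(M, M, pf);                              // T(pf)
mat4.rotate(M, M, theta, axis);                        // R(theta)
mat4.translate(M, M, vec3.negate(vec3.create(), pf));  // T(-pf)
```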
Objects are usually defined in their own frames, with the origin at the centre of mass and the sides
aligned with the model frame axes. To place an instance of such an object in a scene, we apply an
affine transformation – the instance transformation – to the prototype to obtain the desired size,
orientation, and location. The instance transformation is constructed in the following order: first, we
scale the object to the desired size; then we orient it with a rotation matrix; finally, we translate it to
the desired location. Hence, the instance transformation is of the form
𝑀 = 𝑇𝑅𝑆.
To rotate an object by an angle 𝜃 about an arbitrary axis, we carry out at most two rotations to align
the axis of rotation with, say the 𝑧-axis; then rotate by 𝜃 about the 𝑧-axis; and finally we undo the
two rotations that did the aligning. Thus, our final rotation matrix will be of the form:
$$R = R_x^{-1} R_y^{-1} R_z(\theta) R_y R_x.$$
In my opinion, the remainder of this section (except for the example) applies to older versions of
WebGL.
Chapter 5: Viewing
5.1 Classical and Computer viewing
Projectors meet at the centre of projection (COP). The COP corresponds to the centre of the lens in
the camera or in the eye, and in a computer-graphics system, it is the origin of the camera frame for
perspective views. The projection surface is a plane, and the projectors are straight lines. If we move
the COP to infinity, the projectors become parallel and the COP can be replaced by a direction of
projection (DOP). Views with a finite COP are called perspective views; views with a COP at infinity
(i.e. a DOP) are called parallel views. The class of projections produced by parallel and perspective
systems is known as planar geometric projections because the projection surface is a plane and the
projectors are lines. Both perspective and parallel projections preserve lines; they do not, in general,
preserve angles.
Classical views:
• Parallel projections:
o Orthographic projection: In all orthographic (or orthogonal) views, the projectors are
perpendicular to the projection plane. In a multi-view orthographic projection, we
make multiple projections, in each case with the projection plane parallel to one of
the principal faces of the object. The importance of this type of view is that it
preserves both distances and angles. It is well suited for working drawings.
o Axonometric projections: In axonometric views, the projectors are still orthogonal to
the projection plane, but the projection plane can have any orientation with respect
to the object.
Isometric view: The projection plane is placed symmetrically with respect to
the three principal faces that meet at a corner of a rectangular object.
Dimetric view: The projection plane is placed symmetrically with respect to
two of the principal faces of a rectangular object.
Trimetric view: The projection plane can have any orientation with respect
to the object (the general case).
Although parallel lines are preserved in the image, angles are not. Axonometric
views are used extensively in architecture and mechanical design.
o Oblique projections: It is the most general parallel view. We obtain an oblique
projection by allowing the projectors to make an arbitrary angle with the projection
plane. Angles in planes parallel to the projection plane are preserved.
• Perspective projections: All perspective views are characterized by diminution of size: the
farther an object is moved from the viewer, the smaller its image becomes. We cannot make
measurements from a perspective view. Hence, perspective views are used by applications
where it is important to achieve natural-looking images. The classical perspective views are
usually known as one- , two-, and three-point perspective. The one, two, and three prefixes
refer to the number of vanishing points (points at which lines of perspective meet).
Hidden-surface removal occurs after the fragment shader. Consequently, although an object might
be blocked from the camera by other objects, even with hidden-surface removal enabled, the
rasterizer will still generate fragments for blocked objects within the clipping volume.
The first method for constructing the view transformation is by concatenating a carefully selected
series of affine transformations. We can think of the camera as being fixed at the origin, pointing
down the negative 𝑧-axis. Thus, we transform (translate, rotate, etc.) the scene relative to the
camera frame. For example, if we want to move farther away from an object located directly in front
of the camera, we move the scene down the negative 𝑧-axis (i.e. translate by a negative 𝑧-value).
In the second approach, we specify the camera frame with respect to the world frame and construct
the matrix, called the view-orientation matrix, that will take us from world coordinates to camera
coordinates. In order to define the camera frame, we require three parameters to be specified:
• View-Reference Point (VRP): Specifies the location of the COP, given in world coordinates.
• View-Plane Normal (VPN): Also known as 𝑛, specifies the normal to the projection plane.
• View-up vector (VUP): Specifies what direction is up from the camera’s perspective. This
vector need not be perpendicular to 𝑛.
We project the VUP vector onto the view plane to obtain the up-direction vector, 𝑣, which is
orthogonal to 𝑛. We then use the cross product (𝑣 × 𝑛) to obtain a third orthogonal direction 𝑢. This
new orthogonal coordinate system usually is referred to as either the viewing-coordinate system or
the 𝑢-𝑣-𝑛 system. With the addition of the VRP, we have the desired camera frame.
The third method, called the look-at function, is similar to our second approach: it differs only in the
way we specify the VPN. We specify a point, 𝐞, called the eye point, which has exactly the same
meaning as the VRP described above. Next, we define a point, 𝐚, called the at point, at which the
camera is pointing. Together, these points determine the VPN (𝑣𝑝𝑛 = 𝐚 − 𝐞). The specification of
VUP and the derivation of the camera frame is the same as above.
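A sketch of the look-at specification using the gl-matrix library; the eye point, at point, and VUP values are arbitrary examples:

```typescript
import { mat4, vec3 } from "gl-matrix";

const eye = vec3.fromValues(0, 5, 10);   // e: the eye point (location of the COP)
const at  = vec3.fromValues(0, 0, 0);    // a: the at point, so the VPN is a - e
const vup = vec3.fromValues(0, 1, 0);    // VUP: need not be perpendicular to the VPN

// view-orientation matrix taking world coordinates to camera (eye) coordinates
const viewMatrix = mat4.lookAt(mat4.create(), eye, at, vup);
```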
The second part of the viewing process, often called the normalization transformation, involves
specifying and applying a specific projection matrix (parallel or perspective).
Projection is a technique that takes the specification of points in three dimensions and maps them to
points on a two-dimensional projection surface. Such a transformation is not invertible, because all
points along a projector map into the same points on the projection surface.
Orthogonal or orthographic projections are a special case of parallel projections, in which the
projectors are perpendicular to the projection plane.
Projection normalization is a technique that converts all projections into simple orthogonal
projections by distorting the objects such that the orthogonal projection of the distorted objects is
the same as the desired projection of the original objects. This is done by applying a matrix called the
normalization matrix, also known as the projection matrix. Conceptually, the normalization matrix
should be defined such that it transforms (distorts) the specified view volume to coincide exactly
with the canonical (default) view volume. Consequently, vertices are transformed such that vertices
within the specified view volume are transformed to vertices within the canonical view volume, and
vertices outside the specified view volume are transformed to vertices outside the canonical view
volume. The canonical view volume is the cube defined by the planes
𝑥 = ±1,
𝑦 = ±1,
𝑧 = ±1.
Two advantages of employing projection normalization are:
• Both perspective and parallel views can be supported by the same pipeline;
• The clipping process is simplified because the sides of the canonical view volume are aligned
with the coordinate axes.
The shape of the viewing volume for an orthogonal projection is a right parallelepiped. Thus, the
projection normalization process for an orthographic projection requires two steps (worked out below):
1. Perform a translation to move the centre of the specified view volume to the centre of the
canonical view volume (the origin).
2. Scale the sides of the specified view volume such that they have a length of 2.
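Written out, with the specified view volume given by 𝑥min ≤ 𝑥 ≤ 𝑥max, 𝑦min ≤ 𝑦 ≤ 𝑦max, 𝑧min ≤ 𝑧 ≤ 𝑧max, these two steps correspond to the normalization matrix below (a sketch; any reflection of the 𝑧-axis required by a particular API convention is ignored):

$$N = S\!\left(\frac{2}{x_{\max}-x_{\min}},\ \frac{2}{y_{\max}-y_{\min}},\ \frac{2}{z_{\max}-z_{\min}}\right) T\!\left(-\frac{x_{\max}+x_{\min}}{2},\ -\frac{y_{\max}+y_{\min}}{2},\ -\frac{z_{\max}+z_{\min}}{2}\right)$$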
A point in space (𝑥, 𝑦, 𝑧) is projected along a projector into the point (𝑥𝑝, 𝑦𝑝, 𝑧𝑝). All projectors pass
through the COP (origin), and, because the projection plane is perpendicular to the 𝑧-axis,
$$z_p = d.$$
Because the camera is pointing in the negative 𝑧-direction, 𝑑 is negative. From the top view shown
in the figure above, we see that two similar triangles are formed. Hence
$$\frac{x_p}{d} = \frac{x}{z} \quad\Rightarrow\quad x_p = \frac{x}{z/d}.$$
Using the side view:
$$\frac{y_p}{d} = \frac{y}{z} \quad\Rightarrow\quad y_p = \frac{y}{z/d}.$$
The division by 𝑧 describes nonuniform foreshortening: The images of objects farther from the
centre of projection are reduced in size (diminution) compared to the images of objects closer to the
COP. Although this perspective transformation preserves lines, it is not affine. It is also irreversible:
we cannot recover a point from its projection.
We can extend our use of homogeneous coordinates to handle projections. When we introduced
homogeneous coordinates, we represented a point in three dimensions (𝑥, 𝑦, 𝑧) by the point
(𝑥, 𝑦, 𝑧, 1) in four dimensions. Suppose that, instead, we replace (𝑥, 𝑦, 𝑧) by the four-dimensional
point
$$\begin{bmatrix} wx \\ wy \\ wz \\ w \end{bmatrix}.$$
As long as 𝑤 ≠ 0, we can recover the three-dimensional point from its four-dimensional
representation by dividing the first three components by 𝑤; a process known as perspective division.
By allowing 𝑤 to change, we can represent a larger class of transformations, including perspective
projections. Consider the matrix
$$M = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1/d & 0 \end{bmatrix}.$$
The matrix 𝑀 transforms the point [𝑥, 𝑦, 𝑧, 1]𝑇 to the point [𝑥, 𝑦, 𝑧, 𝑧/𝑑]𝑇 . By performing perspective
division (i.e. divide the first three components by the fourth), we obtain
$$\begin{bmatrix} x/(z/d) \\ y/(z/d) \\ d \\ 1 \end{bmatrix} = \begin{bmatrix} x_p \\ y_p \\ z_p \\ 1 \end{bmatrix}.$$
Hence, matrix 𝑀 can be used to perform a simple perspective projection. We apply the projection
matrix after the model-view matrix, but remember that we must perform a perspective division at
the end.
Frustum() and Perspective() are two functions that can be used to specify a perspective
projection matrix.
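A sketch using the gl-matrix library, whose mat4.frustum() and mat4.perspective() play the analogous role; all parameter values here are illustrative only:

```typescript
import { mat4 } from "gl-matrix";

// Specify the view volume directly by its sides at the near plane ...
const byFrustum = mat4.frustum(mat4.create(), -1, 1, -1, 1, 2, 20); // left, right, bottom, top, near, far

// ... or by a vertical field of view and the aspect ratio of the clipping window.
const byPerspective = mat4.perspective(
  mat4.create(),
  (60 * Math.PI) / 180,  // vertical field-of-view angle in radians
  16 / 9,                // aspect ratio (width/height)
  2,                     // near distance
  20                     // far distance
);
```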
The z-buffer algorithm is an image-space algorithm that fits in well with the rendering pipeline. As
primitives are rasterized, we keep track of the distance from the COP or the projection plane to the
closest point on each projector that has already been rendered. We update this information as
successive primitives are projected and filled. Ultimately, we display only the closest point on each
projector. The algorithm requires a depth buffer, or z-buffer, to store the necessary depth
information as primitives are rasterized. The z-buffer forms part of the frame buffer and has the
same spatial resolution as the colour buffer.
Major advantages of this algorithm are that its complexity is proportional to the number of
fragments generated by the rasterizer and that it can be implemented with a small number of
additional calculations over what we have to do to project and display polygons without hidden-
surface removal.
Culling: For a convex object, such as a cube, faces whose normals point away from the viewer are
never visible and can be eliminated, or culled, before the rasterization process commences.
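A sketch of enabling both mechanisms in WebGL; it assumes the drawing buffer was created with a depth buffer:

```typescript
function enableHiddenSurfaceRemoval(gl: WebGLRenderingContext): void {
  gl.enable(gl.DEPTH_TEST);                             // per-fragment z-buffer comparison
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);  // clear colour and depth buffers before drawing
  gl.enable(gl.CULL_FACE);                              // discard faces pointing away from the viewer
  gl.cullFace(gl.BACK);                                 // cull back faces (the default)
}
```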
Consider a simple shadow that falls on the surface 𝑦 = 0. Not only is this
shadow a flat polygon, called a shadow polygon, but it is also the projection
of the original polygon onto this surface. More specifically, the shadow
polygon is the projection of the polygon onto the surface with the centre of
projection at the light source. It is possible to compute the vertices of the
shadow polygon by means of a suitable projection matrix.
Interactions between light and materials can be classified into three groups:
1. Specular surfaces appear shiny because most of the light that is reflected or scattered is in a
narrow range of angles close to the angle of reflection. With a perfectly specular surface, an
incoming light ray may be partially absorbed, but all reflected light from a given angle
emerges at a single angle, obeying the rule that the angle of incidence is equal to the angle
of reflection.
2. Diffuse surfaces are characterized by reflected light being scattered in all directions.
Perfectly diffuse surfaces scatter light equally in all directions.
3. Translucent surfaces allow some light to penetrate the surface and to emerge from another
location on the object. This process of refraction characterizes glass and water.
The Phong model supports the three types of material-light interactions: ambient, diffuse, and
specular. For each light source we can have separate ambient, diffuse, and specular components for
each of the three primary colours. Thus we need nine coefficients to characterize these terms at any
point 𝒑. We can place these coefficients in a 3 × 3 illumination matrix for the ith light source.
The intensity of the ambient light, 𝐿a, is the same at every point on the surface. Some of this light is
absorbed and some is reflected. The amount reflected is given by the ambient reflection coefficient,
𝑘a (0 ≤ 𝑘a ≤ 1). Thus, the reflected ambient term is 𝐼a = 𝑘a𝐿a.
A perfectly diffuse reflector scatters the light that it reflects equally in all directions. Hence, such a
surface appears the same to all viewers (i.e. neither 𝒗 nor 𝒓 need be considered). The amount of
light reflected depends both on the material – because some of the incoming light is absorbed – and
on the position of the light source relative to the surface. Diffuse surfaces, sometimes called
Lambertian surfaces, can be modelled mathematically with Lambert’s law. According to Lambert’s
law, we see only the vertical component of the incoming light. Lambert’s law states that
𝑅 𝑑 ∝ cos 𝜃
where 𝜃 is the angle between the normal at the point of interest 𝒏 and the
direction of the light source 𝒍. If both 𝒍 and 𝒏 are unit-length vectors, then
cos 𝜃 = 𝒍 ∙ 𝒏.
If we add in a reflection coefficient 𝑘 d (0 ≤ 𝑘 d ≤ 1) representing the fraction of incoming diffuse
light that is reflected, we have the diffuse reflection term:
𝐼d = 𝑘 d𝐿d (𝒍∙ 𝒏).
Specular reflection adds a highlight that we see reflected from shiny objects. The amount of light that
the viewer sees depends on the angle 𝜙 between 𝒓, the direction of a perfect reflector and 𝒗, the
direction of the viewer. The Phong model uses the equation
$$I_s = k_s L_s \cos^{\alpha} \phi$$
The coefficient 𝑘 𝑠 (0 ≤ 𝑘 𝑠 ≤ 1) is the fraction of the incoming specular light that is reflected. The
exponent α is a shininess coefficient. As α is increased, the reflected light is concentrated in a
narrower region centred on the angle of a perfect reflector. Values in the range 100 to 500
correspond to most metallic surfaces. If 𝒓 and 𝒗 are normalized, then
$$I_s = k_s L_s (\boldsymbol{r} \cdot \boldsymbol{v})^{\alpha}$$
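Putting the three terms together for a single light source gives the expression below (a sketch using the symbols above; the max() clamps, which suppress contributions from lights behind the surface or reflectors facing away from the viewer, are a common convention):

$$I = k_a L_a + k_d L_d \max(\boldsymbol{l} \cdot \boldsymbol{n},\, 0) + k_s L_s \max(\boldsymbol{r} \cdot \boldsymbol{v},\, 0)^{\alpha}$$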
If we use the Phong model with specular reflections, the dot product 𝒓 ∙ 𝒗 should be recalculated at
every point on the surface. An approximation to this involves the unit vector halfway between the
viewer vector and the light-source vector:
$$\boldsymbol{h} = \frac{\boldsymbol{l} + \boldsymbol{v}}{|\boldsymbol{l} + \boldsymbol{v}|}.$$
If we replace 𝒓 ∙ 𝒗 with 𝒏 ∙ 𝒉, we avoid calculating 𝒓. When we use the halfway vector in the
calculation of the specular term, we are using the Blinn-Phong, or modified Phong, lighting model.
At every point (𝑥, 𝑦, 𝑧) on the surface of a sphere centred at the origin, we have that
𝒏 = (𝑥, 𝑦, 𝑧).
To calculate 𝒓, we first normalize both 𝒍 and 𝒏, and then use the following equation
𝒓 = 2(𝒍 ∙ 𝒏)𝒏 − 𝒍.
GLSL provides a function, reflect(), which we can use in our shaders to compute 𝒓.
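A sketch of a per-fragment Blinn-Phong (modified Phong) calculation as a GLSL ES fragment shader, embedded as a TypeScript string; the varying and uniform names are assumptions:

```typescript
const blinnPhongFragmentShader: string = `
  precision mediump float;

  varying vec3 vN;    // interpolated surface normal
  varying vec3 vL;    // direction towards the light source
  varying vec3 vV;    // direction towards the viewer
  uniform vec3 uAmbient, uDiffuse, uSpecular;   // per-term k*L products
  uniform float uShininess;                     // the exponent alpha

  void main() {
    vec3 N = normalize(vN);
    vec3 L = normalize(vL);
    vec3 V = normalize(vV);
    vec3 H = normalize(L + V);                  // halfway vector replaces the reflection vector r

    float kd = max(dot(L, N), 0.0);
    float ks = pow(max(dot(N, H), 0.0), uShininess);

    gl_FragColor = vec4(uAmbient + kd * uDiffuse + ks * uSpecular, 1.0);
  }
`;
```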
Material properties should match up directly with the supported light sources and with the chosen
reflection model. We specify ambient, diffuse, and specular reflectivity coefficients (𝑘 a, 𝑘 d, 𝑘 s) for
each primary colour via three colours using either RGB or RGBA colours. Note that often the diffuse
and specular reflectivity coefficients are the same. For the specular component, we also need to
specify its shininess coefficient.
We also want to allow for scenes in which a light source is within the view volume and thus might be
visible. We can create such effects by including an emissive component that models self-luminous
sources. This term is unaffected by any of the light sources, and it does not affect any other surfaces.
It simply adds a fixed colour to the surface.
We have three choices as to where we do lighting calculations: in the application, in the vertex
shader, or in the fragment shader. But for the sake of efficiency, we will almost always want to do
lighting calculations in the shaders.
Light sources are special types of geometric object and have geometric attributes, such as position,
just like polygons and points. Hence, light sources can be affected by transformations.
Programmable shaders make it possible to not only incorporate more realistic lighting models in real
time but also to create interesting non-photorealistic effects. Two such examples are the use of only
a few colours and emphasizing the edges in objects.
In most applications, textures start out as two-dimensional images that might be formed by
application programs or scanned in from a photograph, but, regardless of their origin, they are
eventually brought into processor memory as arrays. We call the elements of
these arrays texels. We can think of this array as a continuous rectangular
two-dimensional texture pattern 𝑇(𝑠, 𝑡) . The independent variables 𝑠 and 𝑡
are known as texture coordinates and vary over the interval [0, 1].
Texture mapping requires interaction among the application program, the vertex shader, and the
fragment shader. There are three basic steps:
1. We must form a texture image and place it in texture memory on the GPU;
2. We must assign texture coordinates to each fragment;
3. We must apply the texture to each fragment.
Texture objects allow the application program to define objects that consist of the texture array and
the various texture parameters that control its application to surfaces. Many of the complexities of
how we can apply the texture are inside the texture object and thus will allow us to use very simple
fragment shaders. To create a texture object:
1. Get some unused texture identifiers by calling glGenTextures();
2. Bind the texture object (glBindTexture()) to make it the current texture object;
3. Use texture functions to specify the texture image and its parameters, which become part of
the current texture object.
The key element in applying a texture in the fragment shader is the mapping between the location of
a fragment and the corresponding location within the texture image where we will get the texture
colour for that fragment. We specify texture coordinates as a vertex attribute in the application. We
then pass these coordinates to the vertex shader and let the rasterizer interpolate the vertex texture
coordinates to fragment texture coordinates. The key to putting everything together is a variable
called a sampler which most often appears only in a fragment shader. A sampler variable provides
access to a texture object, including all its parameters.
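A sketch of these steps in the JavaScript form of the WebGL API, together with the sampler set-up; it assumes a loaded image, a linked program whose fragment shader declares uniform sampler2D uTexture, and the use of texture unit 0:

```typescript
function setUpTexture(gl: WebGLRenderingContext,
                      program: WebGLProgram,
                      image: HTMLImageElement): void {
  const texture = gl.createTexture();               // step 1: get an unused texture object
  gl.activeTexture(gl.TEXTURE0);                    // work with texture unit 0
  gl.bindTexture(gl.TEXTURE_2D, texture);           // step 2: make it the current texture object
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA,          // step 3: specify the texture image ...
                gl.RGBA, gl.UNSIGNED_BYTE, image);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);  // ... and its parameters

  // The sampler variable in the fragment shader reads from texture unit 0.
  gl.useProgram(program);
  gl.uniform1i(gl.getUniformLocation(program, "uTexture"), 0);
}
```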
Aliasing of textures is a major problem. When we map texture coordinates to the array of texels, we
rarely get a point that corresponds to the centre of a texel. There are two basic strategies:
• Point sampling: We use the value of the texel that is closest to the texture coordinate output
by the rasterizer. This strategy is the one most subject to visible aliasing errors.
• Linear filtering: We use a weighted average of a group of texels in the neighbourhood of the
texel determined by point sampling. This results in smoother texturing.
The size of the pixel that we are trying to colour on the screen may be smaller or larger than one
texel (i.e. the resolution of the texture image does not match the resolution of the area on the
screen to which the texture is mapped). If the texel is larger than one pixel (the resolution of the
texture image is less than that of the area on the screen), we call it magnification; if the texel is
smaller than one pixel (the resolution of the texture image is greater than that of the area on the
screen), it is called minification.
Mipmaps can be used to deal with the minification problem. For objects that project to an area of
screen space that is small compared with the size of the texel array, we do not need the resolution
of the original texel array. WebGL allows us to create a series of texture arrays, called the mipmap
hierarchy, at reduced sizes. WebGL will automatically use the appropriate sized mipmap from the
mipmap hierarchy.
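A sketch of requesting mipmapping for the currently bound texture (WebGL 1.0 requires the texture image to have power-of-two dimensions for this):

```typescript
function useMipmaps(gl: WebGLRenderingContext): void {
  gl.generateMipmap(gl.TEXTURE_2D);   // build the mipmap hierarchy of reduced-size arrays
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_LINEAR); // minification
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);               // magnification
}
```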
The classic approach to solve the second difficulty is to project the environment onto a sphere
centred at the centre of projection. WebGL supports a variation of this method called sphere
mapping. The application program supplies a circular image that is the orthographic projection of
the sphere onto which the environment has been mapped. The advantage of this method is that the
mapping from the reflection vector to two-dimensional texture coordinates on this circle is simple
and can be implemented in either hardware or software. The difficult part is obtaining the required
circular image.
Another approach is to compute six projections, corresponding to the six sides of a cube, using six
virtual cameras located at the centre of the box, each pointing in a different direction. Once we
computed the six images, we can specify a cube map in WebGL with six function calls, one for each
face of a cube centred at the origin. The advantage of this approach is that we can compute the
environment map using the standard projections that are supported by the graphics system.
These techniques are examples of multi-pass rendering (or multi-rendering) techniques, where, in
order to compute a single image, we compute multiple images, each using the rendering pipeline.
The opacity of a surface is a measure of how much light penetrates through that surface. An opacity
of 1 (𝛼 = 1) corresponds to a completely opaque surface that blocks all light incident on it. A surface
with an opacity of 0 is transparent; all light passes through it. The transparency or translucency of a
surface with opacity 𝛼 is given by 1 − 𝛼.
The major difficulty with compositing is that for most choices of the blending factors, the order in
which we render the polygons affects the final image. Consequently, unlike most WebGL programs
where the user does not have to worry about the order in which polygons are rasterized, to get a
desired effect we must now control this order within the application. In applications where handling
of translucency must be done in a consistent and realistic manner, we often must sort the polygons
from front to back within the application. Then depending on the application, we can do a front-to-
back or back-to-front rendering using WebGL’s blending functionality.
A more subtle but visibly apparent problem occurs when we combine opaque and translucent
objects in a scene. In a scene containing both opaque and translucent polygons, any polygon (or part
of a polygon) behind an opaque polygon should not be rendered, but polygons (or parts of polygons)
behind translucent polygons should be composited. If all polygons are rendered with the standard z-
buffer algorithm, compositing will not be performed correctly, particularly if a translucent polygon is
rendered first, and an opaque polygon behind it is rendered later. However, if we make the z-buffer read-
only when rendering translucent polygons, we can prevent the depth information from being
updated when rendering translucent objects. In other words, if the depth information allows a pixel
to be rendered, it is blended (composited) with the pixel already stored there. If the pixel is part of
an opaque polygon, the depth data is updated, but if it is a translucent pixel, the depth data is not
updated.
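A sketch of this two-pass approach; drawOpaque and drawTranslucent are hypothetical application callbacks, and the blending factors shown are the usual source-over choice:

```typescript
function renderScene(gl: WebGLRenderingContext,
                     drawOpaque: () => void,
                     drawTranslucent: () => void): void {
  gl.enable(gl.DEPTH_TEST);
  drawOpaque();                                        // opaque polygons update the depth buffer

  gl.enable(gl.BLEND);
  gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA);  // composite the source over the destination
  gl.depthMask(false);                                 // make the z-buffer read-only
  drawTranslucent();                                   // ideally sorted, e.g. back to front
  gl.depthMask(true);                                  // restore depth writes
  gl.disable(gl.BLEND);
}
```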
One of the major uses of the 𝛼 channel is for antialiasing. When rendering a line, instead of
colouring an entire pixel with the colour of the line if it passes through it, the amount of contribution
of the line to the pixel is stored in the pixel's alpha value. This value is then used to calculate the
intensity of the colour (specified by the RGB values), and avoids the sharp contrasts and steps of
aliasing.
Rather than antialiasing individual lines and polygons, we can antialias the entire scene using a
technique called multisampling. In this mode, every pixel in the frame buffer contains a number of
samples. Each sample is capable of storing a colour, depth, and other values. When a scene is
rendered, it is as if the scene is rendered at an enhanced resolution. However, when the image must
be displayed in the frame buffer, all of the samples for each pixel are combined to produce the final
pixel colour.
At a high level, we can consider the graphics system as a black box whose inputs are the vertices and
states defined in the program – geometric objects, attributes, camera specifications – and whose
output is an array of coloured pixels in the frame buffer. Within this black box, we must do many
tasks, including transformations, clipping, shading, hidden-surface removal, and rasterization of the
primitives that can appear on the display. Every geometric object must be passed through this
system, and we must assign a colour to every pixel in the colour buffer that is displayed.
In the object-oriented approach, we loop over the objects. A pipeline renderer fits this description:
vertices flow through a sequence of modules that transform them, colour them, and determine
whether they are visible. Data (vertices) flow forward through the system. Because we are doing the
same operations on every primitive, the hardware to build an object-based system is fast and
relatively inexpensive. Because each geometric primitive is processed independently, the main
limitation of object-oriented implementations is that they cannot handle most global calculations.
Image-oriented approaches loop over pixels, or rows of pixels called scan-lines, that constitute the
frame buffer. For each pixel, we work backward, trying to determine which geometric primitives can
contribute to its colour. The main disadvantage of this approach is that all the geometric data must
be available at all times during the rendering process.
8.3 Clipping
Clipping is the process of determining which primitives, or parts of primitives, should be eliminated
because they lie outside the viewing volume. Clipping is done before the perspective division that is
necessary if the 𝑤 component of a clipped vertex is not equal to 1. The portions of all primitives that
can possibly be displayed lie within the cube
𝑤 ≥ 𝑥 ≥ −𝑤,
𝑤 ≥ 𝑦 ≥ −𝑤,
𝑤 ≥ 𝑧 ≥ −𝑤.
Note that projection has been carried out only partially at this stage: perspective division and the
final orthographic projection must still be performed.
Cohen-Sutherland clipping: The algorithm starts by extending the sides of the clipping rectangle to
infinity, thus breaking up space into the nine regions shown in the diagram below.
          x = x_min      x = x_max
   1001   |    1000    |    1010
----------+------------+----------   y = y_max
   0001   |    0000    |    0010
----------+------------+----------   y = y_min
   0101   |    0100    |    0110
Each region is assigned a unique 4-bit binary number called an outcode, 𝑏0 𝑏1 𝑏2 𝑏3, as follows.
Suppose that (𝑥, 𝑦) is a point in the region; then
$$b_0 = \begin{cases} 1 & \text{if } y > y_{\max} \\ 0 & \text{otherwise} \end{cases}$$
Likewise, 𝑏1 is 1 if 𝑦 < 𝑦min, and 𝑏2 and 𝑏3 are determined by the relationship between 𝑥 and the
left and right sides of the clipping window. The resulting codes are indicated in the diagram above.
For each endpoint of a line segment, we first compute the endpoint’s outcode. Consider a line
segment whose outcodes are given by 𝑜1 = 𝑜𝑢𝑡𝑐𝑜𝑑𝑒(𝑥1 ,𝑦1 ) and 𝑜2 = 𝑜𝑢𝑡𝑐𝑜𝑑𝑒(𝑥2 , 𝑦2 ). There are
four cases to consider:
1. (𝑜1 = 𝑜2 = 0). Both endpoints are inside the clipping window, as is true for segment AB in
the figure below. The entire line segment can be sent on to be rasterized.
2. (𝑜1 ≠ 0, 𝑜2 = 0; or vice versa). One endpoint is inside the clipping window; one is outside
(see segment CD in the figure below). The line segment must be shortened. The nonzero
outcode indicates which edge or edges of the window are crossed by the segment. One or
two intersections must be computed. Note that after one intersection is computed, we can
compute the outcode of the point of intersection to determine whether another intersection
calculation is required.
3. (𝑜1 & 𝑜2 ≠ 0). The bitwise AND of the outcodes is nonzero: both endpoints are on the same
outside side of the window, so the entire line segment lies outside the clipping window and can
be discarded.
4. (𝑜1 & 𝑜2 = 0, with 𝑜1 ≠ 0 and 𝑜2 ≠ 0). Both endpoints are outside the window, but on different
sides, so the outcodes alone do not contain enough information to accept or reject the segment.
We must compute an intersection with one of the extended sides, shorten the segment, and
apply the algorithm again.
Thus, with this algorithm we do intersection calculations only when they are needed, as in the
second case, or where the outcodes did not contain enough information, as in the fourth case. The
Cohen-Sutherland algorithm works best when there are many line segments but few are actually
displayed: most segments then lie fully outside one or two of the extended sides of the clipping
rectangle and can be eliminated on the basis of their outcodes alone.
This algorithm can be extended to three dimensions. The main disadvantage of the algorithm is that
it must be used recursively.
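A minimal, iterative C sketch of the four-case decision (an illustration under the conventions above; it reuses outcode() and the OUT_* bits from the previous sketch, and the repeated shortening is expressed as a loop rather than recursion):

    #include <stdbool.h>

    /* Clips the segment (x1,y1)-(x2,y2) against the window in place.
       Returns false if the segment is rejected entirely. */
    bool clip_segment(float *x1, float *y1, float *x2, float *y2,
                      float xmin, float xmax, float ymin, float ymax)
    {
        int o1 = outcode(*x1, *y1, xmin, xmax, ymin, ymax);
        int o2 = outcode(*x2, *y2, xmin, xmax, ymin, ymax);

        while (true) {
            if ((o1 | o2) == 0) return true;    /* case 1: trivially accept */
            if ((o1 & o2) != 0) return false;   /* case 3: trivially reject */

            /* cases 2 and 4: shorten the segment at one crossed edge, then retry */
            int o = o1 ? o1 : o2;               /* an endpoint that lies outside */
            float x, y;
            if (o & OUT_TOP)         { x = *x1 + (*x2 - *x1) * (ymax - *y1) / (*y2 - *y1); y = ymax; }
            else if (o & OUT_BOTTOM) { x = *x1 + (*x2 - *x1) * (ymin - *y1) / (*y2 - *y1); y = ymin; }
            else if (o & OUT_RIGHT)  { y = *y1 + (*y2 - *y1) * (xmax - *x1) / (*x2 - *x1); x = xmax; }
            else                     { y = *y1 + (*y2 - *y1) * (xmin - *x1) / (*x2 - *x1); x = xmin; }

            if (o == o1) { *x1 = x; *y1 = y; o1 = outcode(*x1, *y1, xmin, xmax, ymin, ymax); }
            else         { *x2 = x; *y2 = y; o2 = outcode(*x2, *y2, xmin, xmax, ymin, ymax); }
        }
    }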
Liang-Barsky Clipping: Suppose that we have a line segment defined by the two endpoints
𝑝1 = [𝑥1 , 𝑦1 ]𝑇 and 𝑝2 = [ 𝑥2 ,𝑦2 ]𝑇 . We can parametrically express this line in either matrix form:
𝑝(𝛼) = (1 − 𝛼) 𝑝1 + 𝛼𝑝2 ,
or as two scalar equations:
𝑥(𝛼) = (1 − 𝛼)𝑥1 + 𝛼𝑥 2 ,
𝑦(𝛼) = (1 − 𝛼) 𝑦1 + 𝛼𝑦2 .
As the parameter 𝛼 varies from 0 to 1, we move along the segment from 𝑝1 to 𝑝2; negative values of
𝛼 yield points on the line on the other side of 𝑝1 from 𝑝2; values of 𝛼 > 1 give points on the line
past 𝑝2. Consider a line segment and the line of which it is part, as shown in the figure below.
As long as the line is not parallel to a side of the window (if it is, we can handle that situation with
ease), there are four points where the line intersects the extended sides of the window. These
points correspond to the four values of the parameter: 𝛼1 (bottom), 𝛼2 (left), 𝛼3 (top) and 𝛼4 (right).
We can order these values and determine which correspond to intersections (if any) that we need
for clipping. For the first example, 0 < 𝛼1 < 𝛼2 < 𝛼3 < 𝛼4 < 1. Hence, all four intersections are
inside the original line segment, with the two innermost (𝛼2 and 𝛼3 ) determining the clipped line
segment. The case in the second example also has the four intersections between the endpoints of
the line segment, but notice that the order for this case is 0 < 𝛼1 < 𝛼3 < 𝛼2 < 𝛼4 < 1. The line
intersects both the top and the bottom of the window before it intersects either the left or the right;
thus, the entire line segment must be rejected. Other cases of the ordering of the points of
intersection can be argued in a similar way.
This approach is more efficient than the Cohen-Sutherland algorithm because it avoids the multiple
shortening of line segments and the related re-executions of the clipping algorithm.
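A C sketch of one common parametric formulation (an assumption, not the text's exact procedure): rather than sorting all four 𝛼 values, it tracks the largest "entering" and smallest "leaving" parameter, which yields the same accept/reject decision:

    #include <stdbool.h>

    /* Clips the segment in place; returns false if it lies outside the window. */
    bool liang_barsky(float *x1, float *y1, float *x2, float *y2,
                      float xmin, float xmax, float ymin, float ymax)
    {
        float dx = *x2 - *x1, dy = *y2 - *y1;
        float p[4] = { -dx, dx, -dy, dy };                          /* left, right, bottom, top */
        float q[4] = { *x1 - xmin, xmax - *x1, *y1 - ymin, ymax - *y1 };
        float a_in = 0.0f, a_out = 1.0f;

        for (int k = 0; k < 4; k++) {
            if (p[k] == 0.0f) {                 /* segment parallel to this side */
                if (q[k] < 0.0f) return false;  /* and completely outside of it  */
            } else {
                float a = q[k] / p[k];
                if (p[k] < 0.0f) { if (a > a_in)  a_in  = a; }      /* entering intersection */
                else             { if (a < a_out) a_out = a; }      /* leaving intersection  */
            }
        }
        if (a_in > a_out) return false;         /* segment misses the window */

        /* keep only the [a_in, a_out] part of the segment */
        *x2 = *x1 + a_out * dx;  *y2 = *y1 + a_out * dy;
        *x1 = *x1 + a_in  * dx;  *y1 = *y1 + a_in  * dy;
        return true;
    }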
8.8 Rasterization
Pixels have attributes that are colours in the colour buffer.
Fragments are potentially pixels. Each fragment has a colour attribute and a location in screen
coordinates that corresponds to a location in the colour buffer. Fragments also carry depth
information that can be used for hidden-surface removal.
The DDA algorithm (Digital Differential Analyser): Suppose that we have a line segment defined by
the endpoints ( 𝑥1 ,𝑦1 ) and ( 𝑥2 ,𝑦2 ). Because we are working in a colour buffer, we assume that
these are all integer values. The slope of this line is given by
𝑚 = (𝑦2 − 𝑦1) / (𝑥2 − 𝑥1) = ∆𝑦 / ∆𝑥.
This algorithm is based on writing a pixel for each value of ix in write_pixel as 𝑥 goes from 𝑥1
to 𝑥2. For any change in 𝑥 equal to ∆𝑥, the corresponding change in 𝑦 must be
∆𝑦 = 𝑚∆𝑥.
As we move from 𝑥1 to 𝑥 2, we increase 𝑥 by 1 in each iteration; thus, we must increase 𝑦 by
∆𝑦 = 𝑚.
This algorithm in pseudo code is:
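A minimal C-style sketch, assuming a hypothetical write_pixel(x, y) routine, 𝑥1 < 𝑥2, and a slope 0 ≤ 𝑚 ≤ 1:

    /* Hypothetical routine (assumed, not defined in the text) that writes the
       current line colour into the colour buffer at pixel (x, y). */
    void write_pixel(int x, int y);

    /* DDA line drawing for x1 < x2 and slope 0 <= m <= 1. */
    void dda_line(int x1, int y1, int x2, int y2)
    {
        float m = (float)(y2 - y1) / (float)(x2 - x1);
        float y = (float)y1;
        for (int ix = x1; ix <= x2; ix++) {
            write_pixel(ix, (int)(y + 0.5f));   /* round y to the nearest scan line */
            y += m;                             /* a unit step in x changes y by m  */
        }
    }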
For large slopes, the separation between fragments can be large, generating an unacceptable
approximation to the line segment. To alleviate this problem, for slopes greater than 1, we swap the
roles of 𝑥 and 𝑦.
Flat simple polygons have well-defined interiors. If they are also convex, they are guaranteed to be
rendered correctly by WebGL.
Inside-outside testing: Conceptually, the process of filling the inside of a polygon with a colour or
pattern is equivalent to deciding which points in the plane of the polygon are interior (inside) points.
• The crossing (or odd-even) test is the most widely used test for making inside-outside
decisions. Suppose that p is a point inside a polygon. Any ray emanating from p and going off
to infinity must cross an odd number of edges. Any ray emanating from a point outside the
polygon and entering the polygon crosses an even number of edges before reaching infinity.
Usually, we replace rays through points with scan-lines, and we count the crossings of
polygon edges to determine inside and outside (see the sketch after this list).
• The winding test considers the polygon as a knot being wrapped around a point or a line. We
start by traversing the edges of the polygon from any starting vertex and going around the
edge in a particular direction until we reach the starting point. Next we consider an arbitrary
point. The winding number for this point is the number of times it is encircled by the edges
of the polygon. We count clockwise encirclements as positive and counter-clockwise
encirclements as negative. A point is inside the polygon if its winding number is not zero.
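A minimal C sketch of the crossing (even-odd) test (an illustration; the vertex arrays vx and vy and their layout are assumptions, not from the text):

    #include <stdbool.h>

    /* Even-odd test: cast a ray to the right of (px, py) and count how many
       polygon edges it crosses. The polygon's n vertices are given in the
       parallel arrays vx[] and vy[]. */
    bool inside_even_odd(const float vx[], const float vy[], int n,
                         float px, float py)
    {
        bool inside = false;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            /* does edge (j, i) straddle the horizontal line y = py ... */
            if ((vy[i] > py) != (vy[j] > py)) {
                /* ... and does it cross that line to the right of the point? */
                float xc = vx[j] + (vx[i] - vx[j]) * (py - vy[j]) / (vy[i] - vy[j]);
                if (px < xc) inside = !inside;   /* odd number of crossings => inside */
            }
        }
        return inside;
    }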
Since WebGL can only render triangles, to render a more complex (non-flat or concave) polygon we
can apply a tessellation algorithm to this polygon to subdivide it into triangles. A good tessellation
should not produce triangles that are long and thin; it should, if possible, produce sets of triangles
that can use supported features, such as triangle strips and triangle fans.
Flood fill: This algorithm works directly with pixels in the frame buffer. We first rasterize a polygon’s
edges into the frame buffer using Bresenham’s algorithm. If we can find an initial point (𝑥, 𝑦) that lies
inside these edges (called a seed point), we can look at its neighbouring pixels recursively, colouring
them with the fill colour only if they are not coloured already. We can obtain a number of variants of
flood fill by removing the recursion. One way to do so is to work one scan-line at a time (called scan-
line fill).
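A minimal C sketch of recursive, 4-connected flood fill (an illustration; the plain frame[][] array standing in for the frame buffer is an assumption):

    #define WIDTH  256
    #define HEIGHT 256

    /* A plain 2D array of colour values standing in for the frame buffer. */
    static int frame[HEIGHT][WIDTH];

    /* Recursive 4-connected flood fill from the seed point (x, y): a pixel is
       recoloured only if it still holds old_colour, so edge pixels (already
       rasterized in another colour) and filled pixels stop the recursion. */
    void flood_fill(int x, int y, int old_colour, int fill_colour)
    {
        if (x < 0 || x >= WIDTH || y < 0 || y >= HEIGHT) return;
        if (frame[y][x] != old_colour) return;

        frame[y][x] = fill_colour;
        flood_fill(x + 1, y, old_colour, fill_colour);
        flood_fill(x - 1, y, old_colour, fill_colour);
        flood_fill(x, y + 1, old_colour, fill_colour);
        flood_fill(x, y - 1, old_colour, fill_colour);
    }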
Culling: For situations where we cannot see back faces, such as scenes composed of convex
polyhedra, we can reduce the work required for hidden-surface removal by eliminating all back-facing
polygons before we apply any other hidden-surface-removal algorithms. A polygon is facing forward
if and only if 𝑛 ∙ 𝑣 ≥ 0, where 𝑣 is in the direction of the viewer and 𝑛 is the normal to the front face
of the polygon. Usually, culling is performed after the transformation to normalized device
coordinates (perspective division).
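A minimal C sketch of the front-facing test (an illustration; the vec3 type and the choice of 𝑣 are assumptions):

    #include <stdbool.h>

    typedef struct { float x, y, z; } vec3;

    /* A polygon is front-facing when n . v >= 0, where n is the normal to its
       front face and v points toward the viewer; otherwise it can be culled. */
    bool is_front_facing(vec3 n, vec3 v)
    {
        return n.x * v.x + n.y * v.y + n.z * v.z >= 0.0f;
    }

In normalized device coordinates, 𝑣 is commonly taken along the z-axis, so the test reduces to checking the sign of a single component of 𝑛.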
The z-buffer is a buffer which usually has the same spatial resolution as the colour buffer, and before
each scene rendering, each of its elements is initialized to a depth corresponding to the maximum
distance away from the centre of projection. At any time during rasterization and fragment
processing, each location in the z-buffer contains the distance along the ray corresponding to the
location of the closest polygon found so far. Rasterization is done polygon by polygon using some
rasterization algorithm. For each fragment on the polygon corresponding to the intersection of the
polygon with a ray through a pixel, we compute the depth from the centre of projection. The
method compares this depth to the value in the z-buffer corresponding to this fragment. If this
depth is greater than the depth in the z-buffer, this fragment is discarded. If the depth is less than
the depth in the z-buffer, we update the depth in the z-buffer and place the shade computed for this
fragment at the corresponding location in the colour buffer. The z-buffer algorithm is the most
widely used hidden-surface-removal algorithm. It has the advantages of being easy to implement, in
either hardware or software, and of being compatible with pipeline architectures, where it can
execute at the speed at which fragments are passing through the pipeline.
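A minimal C sketch of the per-fragment depth test (an illustration; the plain zbuffer[][] and colour_buffer[][] arrays are assumptions, and the z-buffer must be initialized to the maximum depth before each frame, as described above):

    #define WIDTH  256
    #define HEIGHT 256

    /* Plain arrays standing in for the z-buffer and the colour buffer. */
    static float zbuffer[HEIGHT][WIDTH];
    static int   colour_buffer[HEIGHT][WIDTH];

    /* Per-fragment depth test: keep this fragment only if it is closer to the
       centre of projection than anything rasterized at (x, y) so far. */
    void process_fragment(int x, int y, float depth, int shade)
    {
        if (depth < zbuffer[y][x]) {
            zbuffer[y][x]       = depth;    /* new closest depth at this pixel */
            colour_buffer[y][x] = shade;    /* and the shade that goes with it */
        }
        /* otherwise the fragment is hidden and is discarded */
    }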
Suppose that we have already computed the z-extents of each polygon. The next step of depth sort
is to order all the polygons by how far away from the viewer their maximum z-value is. If no two
polygons’ z-extents overlap, we can paint the polygons back to front and we are done. However, if
the z-extents of two polygons overlap, we still may be able to find an order to paint (render) the
polygons individually and yield the correct image. The depth-sort algorithm runs a number of
increasingly more difficult tests, attempting to find such an ordering. Two troublesome situations
remain: If three or more polygons overlap cyclically, or if a polygon can pierce another polygon,
there is no correct order for painting without having to subdivide some polygons. The main idea
behind this class of algorithms is that if one object obscures part of another then the first object is
painted after the object that it obscures. The painter’s algorithm is an example of such a method.
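A minimal C sketch of the simple case of depth sort, in which no z-extents overlap (an illustration; the polygon type, the assumption that a larger z-value means farther from the viewer, and draw_polygon() are all hypothetical):

    #include <stdlib.h>

    typedef struct { float z_max; /* farthest z-value; vertex data omitted */ } polygon;

    /* Comparator for qsort: farthest polygon first (larger z = farther here). */
    static int farther_first(const void *a, const void *b)
    {
        float za = ((const polygon *)a)->z_max;
        float zb = ((const polygon *)b)->z_max;
        return (za < zb) - (za > zb);
    }

    /* Paint back to front after ordering by maximum z-value. */
    void paint_back_to_front(polygon *polys, int n)
    {
        qsort(polys, n, sizeof(polygon), farther_first);
        for (int i = 0; i < n; i++) {
            /* draw_polygon(&polys[i]);   hypothetical rasterization call */
        }
    }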
8.12 Antialiasing
Rasterized line segments and edges of polygons can appear jagged. Aliasing errors are caused by
three related problems with the discrete nature of the frame buffer:
1. If we have an 𝑛 × 𝑚 frame buffer, the number of pixels is fixed, and we can generate only
certain patterns to approximate a line segment.
2. Pixel locations are fixed on a uniform grid; regardless of where we would like to place pixels,
we cannot place them at other than evenly spaced locations.
3. Pixels have a fixed size and shape.
The scan-conversion algorithm forces us, for lines of slope less than 1, to choose exactly one pixel
value for each value of 𝑥. If, instead, we shade each box by the percentage of the ideal line that
crosses it, we get a smoother looking rendering. This technique is known as antialiasing by area
averaging. If polygons share a pixel, and each polygon has a different colour, the colour assigned to
the pixel is the one associated with the polygon closest to the viewer. We could obtain a much more
accurate image if we could assign a colour based on an area-weighted average of the colours of
these polygons. Such algorithms can be implemented with fragment shaders on hardware with
floating-point frame buffers. These techniques collectively address spatial-domain aliasing.