Text Book For Computer Graphics
LESSON – 1: INTRODUCTION TO COMPUTER GRAPHICS
CONTENTS
1.1 Aims and Objectives
The aim of this lesson is to learn the introduction, history and various applications of
computer graphics.
The objectives of this lesson are to make the student aware of the following concepts:
a. History of Computer Graphics
b. Applications of Computer Graphics
c. Graphical User Interface
1.2 Introduction
In the 1950s, output was via teletypes, line printers, and the cathode ray tube (CRT).
Using dark and light characters, a picture could be reproduced. The 1960s saw the beginnings
of modern interactive graphics, with vector graphics and interactive output. One
of the worst problems was the cost and inaccessibility of machines. In the early 1970s,
output started to appear on raster displays, although graphics capability was still fairly
chunky. In the 1980s, displays offered built-in raster graphics, with bitmapped images made
of pixels. Personal computer costs decreased drastically, and the trackball and mouse became
the standard interactive devices. In the 1990s, with the introduction of VGA and SVGA,
personal computers could easily display photo-realistic images and movies. 3D image
rendering became the main advance, and it stimulated cinematic graphics applications.
Table 1 gives a general history of computer graphics.
Table 1: General History of Computer Graphics
Year Inventions, discovery and findings
1950 Ben Laposky created the first graphic images: oscilloscope traces generated by an electronic
(analog) machine. The images were produced by manipulating electron beams and
recording them onto high-speed film.
1951 1) UNIVAC-I: the first general purpose commercial computer, crude hardcopy devices,
and line printer pictures.
2) MIT – Whirlwind computer, the first to display real time video, and capable of
displaying real time text and graphic on a large oscilloscope screen.
1960 William Fetter coins the term “computer graphics” to describe new design methods.
1961 Steve Russell developed Spacewar!, the first video/computer game
1963 1) Douglas Engelbart developed the first mouse
2) Ivan Sutherland developed Sketchpad, an interactive CG system, a man-machine
graphical communication system with pop-up menus, constraint-based drawing,
hierarchical modeling, and utilized lightpen for interaction. He formulated the ideas of
using primitives, lines, polygons, arcs, etc. and constraints on them; he developed the
dragging, rubberbanding and transforming algorithms; he introduced data structures
for storing picture components. He is considered the founder of computer graphics.
1964 William Fetter developed the first computer model of a human figure
1965 Jack Bresenham designed his line-drawing algorithm
1968 1) Tektronix – a special CRT, the direct-view storage tube, with keyboard and mouse, a
simple computer interface for $15,000, which made graphics affordable
2) Ivan Sutherland developed first head-mounted display
1969 John Warnock – area subdivision algorithm, hidden-surface algorithms
Bell Labs – first framebuffer containing 3 bits per pixel
1972 Nolan Kay Bushnell – Pong, video arcade game
1973 John Whitney Jr. and Gary Demos – “Westworld”, first film with computer graphics
1974 Edwin Catmull – texture mapping and Z-buffer hidden-surface algorithm
James Blinn – curved surfaces, refinement of texture mapping
Phong Bui-Tuong – specular highlighting
1975 Martin Newell – famous CG teapot, using Bezier patches
Benoit Mandelbrot – fractal/fractional dimension
1976 James Blinn – environment mapping and bump mapping
1977 Steve Wozniak -- Apple II, color graphics personal computer
1979 Roy Trubshaw and Richard Bartle – MUD, a multi-user dungeon/Zork
1982 Steven Lisberger – “Tron”, first Disney movie which makes extensive use of 3-D graphics
Tom Brigham – “Morphing”, first film sequence in which a female character deforms
and transforms herself into the shape of a lynx.
John Walker and Dan Drake – AutoCAD
1983 Jaron Lanier – “DataGlove”, a glove-based virtual reality input device
1984 Wavefront Technologies – Polhemus, first 3D graphics software
1985 Pixar Animation Studios – “Luxo Jr.” (1986); “Tin Toy” (1988)
NES – Nintendo home game system
1987 IBM – VGA, Video Graphics Array introduced
1989 Video Electronics Standards Association (VESA) – SVGA, Super VGA formed
1990 Hanrahan and Lawson – RenderMan
1991 Disney and Pixar – “Beauty and the Beast”; CGI was widely used, and the RenderMan system
provided fast, accurate and high-quality digital effects.
1992 Silicon Graphics – OpenGL specification
1993 University of Illinois -- Mosaic, first graphic Web browser
Steven Spielberg – “Jurassic Park”, a successful CG fiction film.
1995 Buena Vista Pictures – “Toy Story”, first full-length, computer-generated feature film
1999 NVIDIA Corporation – GeForce 256; GeForce3 (2001)
2003 ID Software – Doom3 graphics engine
1.4 Applications of Computer Graphics
Computer-aided design (CAD) is the use of a wide range of computer-based tools that
assist engineers, architects and other design professionals in their design activities. It is the
main geometry-authoring tool within the Product Lifecycle Management process and
involves both software and sometimes special-purpose hardware. Current packages range
from 2D vector-based drafting systems to 3D solid and surface modellers.
CAD is used to design, develop and optimize products, which can be goods used
by end consumers or intermediate goods used in other products. CAD is also extensively
used in the design of tools and machinery used in the manufacture of components, and in
the drafting and design of all types of buildings, from small residential types (houses) to
the largest commercial and industrial structures (hospitals and factories).
The capabilities of modern CAD systems include:
(a) Wireframe geometry creation
(b) 3D parametric feature-based modelling and solid modelling
(c) Freeform surface modelling
(d) Automated design of assemblies, which are collections of parts and/or other assemblies
(e) Creation of engineering drawings from the solid models
(f) Reuse of design components
(g) Ease of modification of the design model and production of multiple versions
(h) Automatic generation of standard components of the design
(i) Validation/verification of designs against specifications and design rules
(j) Simulation of designs without building a physical prototype
(k) Output of engineering documentation, such as manufacturing drawings and Bills of Materials reflecting the BOM required to build the product
(l) Import/export routines to exchange data with other software packages
(m) Output of design data directly to manufacturing facilities
(n) Output directly to a rapid prototyping or rapid manufacture machine for industrial prototypes
(o) Maintenance of libraries of parts and assemblies
(p) Calculation of mass properties of parts and assemblies
(q) Aids to visualization with shading, rotation, hidden-line removal, etc.
(r) Bi-directional parametric association (modification of any feature is reflected in all information relying on that feature: drawings, mass properties, assemblies, etc., and vice versa)
(s) Kinematics, interference and clearance checking of assemblies
(t) Sheet metal design
(u) Hose/cable routing
(v) Electrical component packaging
(w) Inclusion of programming code in a model to control and relate desired attributes of the model
(x) Programmable design studies and optimization
(y) Sophisticated visual analysis routines for draft, curvature, and curvature continuity
Originally, software for CAD systems was developed with computer languages
such as Fortran, but with the advancement of object-oriented programming methods this
has radically changed. Typical modern parametric feature-based modellers and freeform
surface systems are built around a number of key C programming language modules with
their own APIs.
Today most CAD computer workstations are Windows based PCs; some CAD
systems also run on hardware running with one of the Unix operating systems and a few
with Linux. Some CAD systems such as NX provide multiplatform support including
Windows, LINUX, UNIX and Mac OSX.
Figures: CAD of a jet engine; CAD and rapid prototyping; parachute modeling and simulation
Figures: virtual 3-D interiors (virtual environment); CAM in the jewelry industry
Since the age of the Industrial Revolution, the manufacturing process has
undergone many dramatic changes. One of the most dramatic of these changes is the
introduction of Computer Aided Manufacturing (CAM), a system of using computer
technology to assist the manufacturing process.
Through the use of CAM, a factory can become highly automated, through
systems such as real-time control and robotics. A CAM system usually seeks to control
the production process through varying degrees of automation. Because each of the many
manufacturing processes in a CAM system is computer controlled, a high degree of
precision can be achieved that is not possible with a human interface.
The CAM system, for example, sets the toolpath and executes precision machine
operations based on the imported design. Some CAM systems bring in additional
automation by also keeping track of materials and automating the ordering process, as
well as tasks such as tool replacement.
Robotic arms and machines are commonly used in factories, but these do still
require human workers. The nature of those workers' jobs changes, however. The repetitive
tasks are delegated to machines; the human workers' job descriptions then move more
towards set-up, quality control, using CAD systems to create the initial designs, and
machine maintenance.
1.4.3 Entertainment
One of the main goals of today's special effects producers and animators is to
create images with the highest levels of photorealism. Volume graphics is a key technology
for providing full immersion in upcoming virtual worlds, e.g. movies or computer games.
Real-world phenomena can be realized best with true physics-based models, and volume
graphics is the tool to generate, visualize and even feel these models. Movies like Star
Wars Episode I, Titanic and The Fifth Element have already started employing true physics-based
effects.
Figures: entertainment; games
Medical content creation has become more and more important in entertainment
and education in recent years. For instance, virtual anatomical atlases on CD-ROM and
DVD have been built on the basis of the NIH Visible Human Project data set, and
different kinds of simulation and training software have been built using volume rendering
techniques. Volume Graphics' products like the VGStudio software are dedicated to
use in the field of medical content creation. VGStudio provides powerful tools to
manipulate and edit volume data. An easy-to-use keyframer tool allows the user to generate
animations, e.g. flights through any kind of volume data. In addition, VGStudio provides
very high image quality and strong performance even on a PC.
1.4.5 Advertisement
Voxel data can be used to visualize the most fascinating and complex facts in the
world. The visualization of the human body and medical content creation is an example.
Voxel data sets like CT or MRI scans or the exciting Visible Human data show all the
finest details up to the gross structures of the human anatomy. Images rendered by
Volume Graphics 3D graphics software are already used for US TV productions as well
as for advertising. Volume Graphics cooperates with companies specialized in video and
TV productions as well as with advertising agencies.
1.4.6 Visualization
Visualization today has ever-expanding applications in science, engineering,
product visualization, all forms of education, interactive multimedia, medicine, etc.
Computer graphics is central to most visualization applications, and its invention
may be the most important development in visualization. The development of animation
also helped advance visualization.
Graphic images and models are proving not only useful, but crucial in many
contemporary fields dealing with complex data. Only by graphically combining millions
of discrete data items, for example, can meteorologists track weather systems, including
hurricanes that may threaten thousands of lives. Theoretical physicists depend on images
to think about events like collisions of cosmic strings at 75 percent of the speed of light,
and chaos theorists require pictures to find order within apparent disorder. Computer-
aided design systems are critical to the design and manufacture of an extensive range of
contemporary products, from silicon chips to automobiles, in fields ranging from space
technology to clothing design.
Computer systems, on which we all increasingly depend, are also becoming more
and more visually oriented. Graphical user interfaces are the emerging standard, and
graphic tools are the heart of contemporary systems analysis, identifying and preventing
critical errors and omissions that might otherwise not be evident until the system is in
daily use. Graphic computer-aided systems engineering (CASE) tools are now used to
build other computer systems. Recent research indicates that visual computer
programming produces better comprehension and accuracy than do traditional
programming languages based on words, and commercial visual programming packages
are now on the market.
Medical research and practice offer many examples of the use of graphic tools
and images. Conceptualizing the deoxyribonucleic acid (DNA) double helix permitted
dramatic advances in genetic research years before the structure could actually be seen.
Computerized imaging systems like computerized tomography (CT) and magnetic
resonance imaging (MRI) have produced dramatic improvements in the diagnosis and
treatment of serious illness, and a project compiling a three-dimensional cross-section of
the human body provides a new approach to the study of anatomy. X-rays, venerable
medical imaging tools, are now being combined with expert systems to help physicians
identify other cases similar to those they are handling, suggesting additional diagnostic
and treatment information relevant to patients.
1.5 Graphical user interface
A graphical user interface (GUI) is a type of user interface which allows people to
interact with a computer and computer-controlled devices. It employs graphical icons,
visual indicators or special graphical elements called "widgets", along with text, labels or
text navigation, to represent the information and actions available to a user. The actions
are usually performed through direct manipulation of the graphical elements.
Some graphical user interfaces are designed for the rigorous requirements of vertical
markets. These are known as "application specific graphical user interfaces." Examples of
application specific graphical user interfaces:
Touch screen point of sale software used by wait staff in busy restaurants
Self-service checkouts used in some retail stores.
ATMs
Airline self-ticketing and check-in
Information kiosks in public spaces like train stations and museums
Monitor/control screens in embedded industrial applications which employ a real
time operating system (RTOS).
The latest cell phones and handheld game systems also employ application specific
touch screen graphical user interfaces. Cars have graphical user interfaces in them as well,
for example in GPS navigation systems, touch screen multimedia centers, and even on the
dashboards of newer cars.
Screenshot showing the 'cube' plugin of Compiz on Ubuntu
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) The need for Computer Graphics in the modern world
b) The use of Computer Graphics in the modern world
1.9 Model answers to “Check your Progress”
1.10 References
LESSON – 2: OVERVIEW OF COMPUTER GRAPHICS
CONTENTS
The aim of this lesson is to learn the concepts of computer display, random scan and
raster scan systems.
The objectives of this lesson are to make the student aware of the following concepts
a) Display systems
b) Cathode ray tube
c) Random Scan
d) Raster Scan and
e) Display processor
2.2 Introduction
An interactive graphics system typically includes the following components:
output: A display system presenting rapidly variable (not just hard-copy)
graphical output;
input: Some input device(s), e.g. keyboard + mouse. These may provide
graphical input:
o A mouse provides graphical input the computer echoes as a graphical
cursor on the display.
o A keyboard typically provides graphical input located at a separate text
cursor position.
There may be other I/O devices, e.g. a scanner and/or printer, microphone(s)
and/or speakers.
A display device such as a CRT (cathode ray tube), liquid crystal display, etc.
o Most have a screen which presents a 2D image;
o Stereoscopic displays show distinct 2D images to each eye (head-mounted
/ special glasses);
o Displays with true 3D images are available.
A display processor controlling the display according to digital instructions about
what to display.
Memory for these instructions or image data, possibly part of a computer's
ordinary RAM.
The CRT or cathode ray tube, is the picture tube of a monitor. The back of the
tube has a negatively charged cathode. The electron gun shoots electrons down the tube
and onto a charged screen. The screen is coated with a pattern of dots that glow when
struck by the electron stream. Each cluster of three dots, one of each color, is one pixel.
The image on the monitor screen is usually made up of at least tens of
thousands of such tiny dots glowing on command from the computer. The closer together
the pixels are, the sharper the image on screen. The distance between pixels on a
computer monitor screen is called its dot pitch and is measured in millimeters. Most
monitors have a dot pitch of 0.28 mm or less.
There are two electromagnets around the collar of the tube which deflect the
electron beam. The beam scans across the top of the monitor from left to right, is then
blanked and moved back to the left-hand side slightly below the previous trace (on the
next scan line), scans across the second line and so on until the bottom right of the screen
is reached. The beam is again blanked, and moved back to the top left to start again. This
process draws a complete picture, typically 50 to 100 times a second. The number of
times in one second that the electron gun redraws the entire image is called the refresh
rate and is measured in hertz (cycles per second). It is common, particularly in lower-
priced equipment, for all the odd-numbered lines of an image to be traced, and then all
the even-numbered lines; the circuitry of such an interlaced display need be capable of
only half the speed of a non-interlaced display. An interlaced display, particularly at a
relatively low refresh rate, can appear to some observers to flicker, and may cause
eyestrain and nausea.
CRT computer monitor
Liquid crystal display (LCD). LCDs are the most popular display device for new
computers in the Western world.
Cathode ray tube (CRT)
o Vector displays, as used on the Vectrex, many scientific and radar
applications, and several early arcade machines (notably Asteroids). These
were always implemented using CRT displays due to the requirement for a
deflection system, though they can be emulated on any raster-based display.
o Television receivers were used by most early personal and home
computers, connecting composite video to the television set using a
modulator. Image quality was reduced by the additional steps of
composite video → modulator → TV tuner → composite video.
Plasma display
Surface-conduction electron-emitter display (SED)
Video projector - implemented using LCD, CRT, or other technologies. Recent
consumer-level video projectors are almost exclusively LCD based.
Organic light-emitting diode (OLED) display
Refresh rate. The number of times in a second that a display is illuminated.
Power consumption, measured in watts (W).
Aspect ratio, which is the horizontal size compared to the vertical size, e.g. 4:3 is
the standard aspect ratio, so that a screen with a width of 1024 pixels will have a
height of 768 pixels. A widescreen display can have an aspect ratio of 16:9, which
means a display that is 1024 pixels wide will have a height of 576 pixels.
Display resolution. The number of distinct pixels in each dimension that can be
displayed.
A fraction of all LCD monitors are produced with "dead pixels"; driven by the desire
to increase profit margins, most manufacturers sell monitors with dead
pixels. Almost all manufacturers have clauses in their warranties which state that monitors
with fewer than some number of dead pixels are not broken and will not be replaced. A
dead pixel usually has its green, red, and/or blue subpixels individually stuck
always on or always off. Like image persistence, this can sometimes be partially or fully
reversed by using the same method listed below; however, the chance of success is far
lower than with a "stuck" pixel.
Screen burn-in, where a static image left on the screen for a long time embeds the
image into the phosphor that coats the screen, is an issue with CRT and Plasma computer
monitors and televisions. The result of phosphor burn-in is "ghostly" images of the
static object visible even when the screen has changed, or is even off. This effect usually
fades after a period of time. LCD monitors, while lacking phosphor screens and thus
immune to phosphor burn-in, have a similar condition known as image persistence, where
the pixels of the LCD monitor "remember" a particular color and become "stuck" and
unable to change. Unlike phosphor burn-in, however, image persistence can sometimes
be reversed partially or completely. This is accomplished by rapidly displaying varying
colors to "wake up" the stuck pixels. Screensavers using moving images prevent both of
these conditions from happening by constantly changing the display. Newer monitors are
more resistant to burn-in, but it can still occur if static images are left displayed for long
periods of time.
Many monitors accept an analog input signal, but some more recent models (mostly
LCD screens) support digital input signals. It is a common misconception that all
computer monitors are digital. For several years, televisions, composite monitors, and
computer displays have been significantly different. However, as TVs have become more
versatile, the distinction has blurred.
Some users use more than one monitor. The displays can operate in multiple
modes. One of the most common spreads the entire desktop over all of the monitors,
which thus act as one big desktop. The X Window System refers to this as Xinerama.
Two Apple flat-screen monitors used as dual display
Random scan displays, often termed vector displays, came first and are still used
in some applications. Here the electron gun of a CRT illuminates points and/or
straight lines in any order. The display processor repeatedly reads a variable
'display file' defining a sequence of X,Y coordinate pairs and brightness or colour
values, and converts these to voltages controlling the electron gun.
Raster scan displays, also known as bit-mapped or raster displays, work
differently: their whole display area is updated many times a second from
image data held in raster memory. The rest of this lesson concerns hardware
and software aspects of raster displays.
2.5 Raster Scan
2.5.1 Rasters
A Raster
Pixel positions have X,Y coordinates. Usually Y points down. This may reflect
early use to display text to western readers. Also when considering 3D, right-handed
coordinates imply Z represents depth.
Monochrome displays are of two types. Bi-level displays have 1-bit pixels and have
been green or orange as well as black-and-white. Greyscale displays usually have 8 to 16
bit pixel values encoding brightness.
Non-monochrome displays also have different types. True-colour displays have pixel
values divided into three component intensities, usually red, green and blue, often of 8
bits each. This used to be very costly. Alternatively the pixel values may index into a
fixed or variable colour map defining a limited colour palette. Pseudo-colour displays
with 8-bit pixels indexing a variable colour map of 256 colours have been common.
Pixmap: A pixmap is storage for a whole raster of pixel values. Usually a contiguous
area of memory, comprising one row (or column) of pixels after another.
Bitmap: Technically a bitmap is a pixmap with 1 bit per pixel, i.e. boolean colour values,
e.g. for use in a black-and-white display. But 'bitmap' is often misused to mean any
pixmap - please try to avoid this!
Pixrect: A pixrect is any 'rectangular area' within a pixmap. A pixrect thus typically
refers to a series of equal-sized fragments of the memory within a pixmap, one for each
row (or column) of pixels.
Frame Buffer: In a bit-mapped display, the display processor refreshes the screen 25 or
more times per second, a line at a time, from a pixmap termed its frame buffer. In each
refresh cycle, each pixel's colour value is 'copied' from the frame buffer to the screen.
Frame buffers are often special two-ported memory devices ('video memory') with one
port for writing and another for concurrent reading. Alternatively they can be part of the
ordinary fast RAM of a computer, which allows them to be extensively reconfigured by
software.
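As a concrete illustration of addressing a pixmap held in ordinary RAM, here is a minimal C sketch. The Pixmap structure and the writePixel name are hypothetical, and an 8-bit pixel value is assumed (a greyscale level or a colour-map index, depending on the display type described above).

```c
#include <stdint.h>

/* Row-major pixmap: pixel (x, y) lives at index y * width + x. */
typedef struct {
    int width, height;
    uint8_t *pixels;        /* width * height bytes, one row after another */
} Pixmap;

/* Write one pixel value, ignoring coordinates outside the raster. */
static void writePixel(Pixmap *fb, int x, int y, uint8_t value)
{
    if (x >= 0 && x < fb->width && y >= 0 && y < fb->height)
        fb->pixels[y * fb->width + x] = value;
}
```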
Additional raster memory may exist 'alongside' that for colour values. For example there
may be an 'alpha channel' (transparency values) a z-buffer (depth values for hidden object
removal), or an a-buffer (combining both ideas). The final section of these notes will
return to this area, especially use of a z-buffer.
2.5.4 Key Attributes of Raster Displays
Major attributes that vary between different raster displays include the following:
'Colour': bi-level, greyscale, pseudo-colour, true colour: see 'pixel values' above;
Size: usually measured on the diagonal: inches or degrees;
Aspect ratio: now usually 5:4 or 4:3 (625-line TV: 4:3; HDTV: 5:3);
Resolution: e.g. 1024×1280 (pixels). Multiplying these numbers together we can
say e.g. 'a 1.25 Mega-pixel display'. Avoid terms such as low/medium/high
resolution which may change over time.
Pixel shape: now usually square; other rectangular shapes have been used.
Brightness, sharpness, contrast: possibly varying significantly with respect to
view angle.
Speed, interlacing: now usually 50 Hz or more and flicker-free to most humans;
Computational features, as discussed below...
Since the 1970s, raster display systems have evolved to offer increasingly powerful
facilities, often packaged in optional graphics accelerator boards or chips. These facilities
have typically consisted of hardware implementation or acceleration of computations
which would otherwise be coded in software, such as line and polygon drawing, area
filling, bit-block transfer (blitting), and z-buffered hidden-surface removal.
It is useful for graphics software developers to be aware of such features and how
they can be accessed, and to have insight into their cost in terms of time taken as a
function of length or area.
Some display architectures do not require a bitmap frame buffer to be utilized before
displaying windowed data on a screen. Each horizontal strip may be as thin as 1 pixel,
which allows for the formation of windows of irregular shapes, such as circles.
In this lesson we have learnt about random scan, raster scan, and the display
processor.
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
Discuss about raster memory
Discuss about the key attributes of raster displays
2.11 References
LESSON – 3: GRAPHICS SOFTWARE STANDARDS
CONTENTS
The aim of this lesson is to learn the concept of graphics software standards.
The objectives of this lesson are to make the student aware of the following concepts
a) Graphics Kernel System
b) PHIGS
c) OpenGL
3.2 Introduction
Several related graphics standards are defined alongside GKS:
CGI - the computer graphics interface - which is the low-level interface between
GKS and the hardware.
CGM - the computer graphics metafile - which is defined as the means of
communicating between different software packages.
3D-GKS - the three-dimensional extension of GKS.
PHIGS - the Programmers Hierarchical Interactive Graphics System - another
three-dimensional standard (based on the old SIGGRAPH core).
3.3 Graphical Kernel System
1. Polyline. Draws one or more straight lines through the coordinates supplied.
2. Polymarker. Draws a symbol at each of the coordinates supplied. The software
allows the choice of one of five symmetric symbols, namely: . + * o x
3. Text. This allows a text string to be output in a number of ways, starting at the
coordinate given.
4. Fill-area. This allows a polygon to be drawn and filled, using the coordinates
given. Possible types of fill include hollow, solid and a variety of hatching and
patterns.
5. Cell-array. This allows a pattern to be defined and output in the rectangle
defined by the coordinates given. This is discussed in the section "Patterns &
Pictures".
6. Generalised Drawing Primitive (GDP). This allows the provision of a variety of
other facilities. Most systems include software for arcs of circles or ellipses and
the drawing of a smooth curve through a set of points (I have called this
"polysmooth" elsewhere in this text).
3.4 PHIGS
PHIGS is aimed at applications with complicated hierarchical data structures, for example: mechanical
CAD, molecular modelling, simulation and process control.
3.5 OpenGL
3.8 Points for Discussion
3.10 References
1 Chapter 1 of William M. Newman, Robert F. Sproull, “Principles of Interactive
Computer Graphics”, Tata-McGraw Hill, 2000
2 Chapter 2 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”,
Pearson Education, 2007
3 Chapter 1, 2, 17 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
4 Chapter 1, 2, 4, 7 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – Principles and Practice”, Addison-Wesley, 1997
LESSON – 4: GRAPHICS INPUT DEVICES
CONTENTS
4.1 Aims and Objectives
4.2 Introduction
4.3 Keyboard
4.4 Mouse
4.5 Data gloves
4.6 Graphics Tablets
4.7 Scanner
4.8 Joy Stick
4.9 Light Pen
4.10 Let us Sum Up
4.11 Lesson-end Activities
4.12 Points for Discussion
4.13 Model answers to “Check your Progress”
4.14 References
4.1 Aims and Objectives
The aim of this lesson is to learn about some of the important input devices
needed for computer graphics.
The objectives of this lesson are to make the student aware of some of the important
input devices.
4.2 Introduction
In the following subsection we will learn about the following input devices
a) Keyboard
b) Mouse
c) Data gloves
d) Graphics Tablet
e) Scanner
f) Joystick
g) Light Pen
4.3 Keyboard
Keyboard keys
Roughly 50% of all keyboard keys produce letters, numbers or signs (characters).
Other keys can produce actions when pressed, and other actions are available by the
simultaneous pressing of more than one action key.
Multimedia keyboard
A foldable keyboard
4.4 Mouse
The name mouse, coined at the Stanford Research Institute, derives from the
resemblance of early models (which had a cord attached to the rear part of the device,
suggesting the idea of a tail) to the common eponymous rodent.
A contemporary computer mouse The first computer mouse, held by
inventor Douglas Engelbart
4.5 Data gloves
A glove equipped with sensors that sense the movements of the hand and
interfaces those movements with a computer. Data gloves are commonly used in virtual
reality environments where the user sees an image of the data glove and can manipulate
the movements of the virtual environment using the glove
4.7 Scanner
A scanner is a device that analyzes images, printed text, or handwriting, or an
object (such as an ornament) and converts it to a digital image. Most scanners today are
variations of the desktop (or flatbed) scanner. The flatbed scanner is the most common in
offices. Hand-held scanners, where the device is moved by hand, were briefly popular
but are now not used due to the difficulty of obtaining a high-quality image. Both these
types of scanners use charge-coupled device (CCD) or Contact Image Sensor (CIS) as the
image sensor, whereas older drum scanners use a photomultiplier tube as the image
sensor.
Another category of scanner is the digital camera scanner, which is based on the
concept of reprographic cameras. Due to increasing resolution and new features such
as anti-shake, digital cameras have become an attractive alternative to regular scanners. While
they still have disadvantages compared to traditional scanners, digital cameras offer
unmatched advantages in speed and portability.
Desktop scanner, with the lid raised Scan of the jade rhinoceros
4.8 Joystick
Joysticks are often used to control video games, and usually have one or more
push-buttons whose state can also be read by the computer. The term joystick has become
a synonym for game controllers that can be connected to the computer since the computer
defines the input as a "joystick input".
Apart from controlling games, joysticks are also used for controlling machines
such as aircraft, cranes, trucks, powered wheelchairs and some zero turning radius lawn
mowers. More recently miniature joysticks have been adopted as navigational devices for
smaller electronic equipment such as mobile phones.
There has been a recent and very significant drop in joystick popularity in the
gaming industry. This is primarily due to the shrinkage of the flight simulator genre, and
the almost complete disappearance of space-based simulators.
Joysticks can be used within first-person shooter games, but are significantly less
accurate than a mouse and keyboard. This is one of the fundamental reasons why multiplayer
console games are not compatible with PC versions of the same game. A handful of
recent games, including Halo 2 and Shadowrun, have allowed console-PC matchings, but
have significantly handicapped PC users by requiring them to use the auto-aim feature.
Joystick elements: 1. Stick 2. Base 3. Trigger 4. Extra buttons 5. Autofire switch 6.
Throttle 7. Hat Switch (POV Hat) 8. Suction Cup
4.9 Light Pen
A light pen is a computer input device in the form of a light-sensitive wand used
in conjunction with the computer's CRT monitor. It allows the user to point to displayed
objects, or draw on the screen, in a similar way to a touch screen but with greater
positional accuracy. A light pen can work with any CRT-based monitor, but not with
LCD screens, projectors and other display devices.
A light pen is fairly simple to implement. The light pen works by sensing the
sudden small change in brightness of a point on the screen when the electron gun
refreshes that spot. By noting exactly where the scanning has reached at that moment, the
X,Y position of the pen can be resolved. This is usually achieved by the light pen causing
an interrupt, at which point the scan position can be read from a special register, or
computed from a counter or timer. The pen position is updated on every refresh of the
screen.
The light pen became moderately popular during the early 1980s. It was notable
for its use in the Fairlight CMI, and the BBC Micro. Even some consumer products were
given Light pens. For example, the Toshiba DX-900 VHS HiFi/PCM Digital VCR came
with one. However, due to the fact that the user was required to hold his or her arm in
front of the screen for long periods of time, the light pen fell out of use as a general
purpose input device.
The first light pen was used around 1957 on the Lincoln TX-0 computer at the
MIT Lincoln Laboratory. Contestants on the game show Jeopardy! use a light pen to
write down their answers and wagers for the Final Jeopardy! round. Light pens are used
country-wide in Belgium for voting.
In this lesson we have learnt about various input devices needed for computer
graphics
4.14 References
1 Chapter 11 of William M. Newman, Robert F. Sproull, “Principles of Interactive
Computer Graphics”, Tata-McGraw Hill, 2000
2 Chapter 2 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”,
Pearson Education, 2007
3 Chapter 2 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
4 Chapter 8 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer Graphics –
Principles and Practice”, Addison-Wesley, 1997
LESSON – 5: OUTPUT PRIMITIVES
CONTENTS
5.1 Aims and Objectives
The objectives of this lesson are to make the student aware of the following concepts
a) points and lines
b) Rasterization
c) DDA and Bresenham’s algorithm
d) Properties of circle and ellipse
e) Pixel addressing
5.2 Introduction
The basic elements constituting a graphic are called output primitives. Each
output primitive has an associated set of attributes, such as line width and line color for
lines. The programming technique is to set values for the output primitives and then call a
basic function that will draw the desired primitive using the current settings for the
attributes. Various graphics systems have different graphics primitives. For example
GKS defines five output primitives namely, polyline (for drawing contiguous line
segments), polymarker (for marking coordinate positions with various symmetric text
symbols), text (for plotting text at various angles and sizes), fill area (for plotting
polygonal areas with solid or hatch fill), cell array (for plotting portable raster images).
At the same time, GRPH1 has the output primitives Polyline, Polymarker, Text, and
Tone, and has other secondary primitives besides these, namely Line and Arrow.
A low-level procedure for plotting a point on the screen at (x, y) with intensity I
can be given as
setPixel(x,y,I)
A line is drawn by calculating the intermediate positions between the two end
points and displaying the pixels at those positions.
5.4 Rasterization
Rasterization determines which pixels best represent a primitive, for example a line given
by two endpoints, or geometric objects such as circles where you are given a center and
radius. In this lesson I will cover:
The digital differential analyzer (DDA) which introduces the basic concepts for
rasterization.
Bresenham's algorithm which improves on the DDA.
In this algorithm, the line is sampled at unit intervals in one coordinate, and the
corresponding values nearest to the line path are found for the other coordinate. For a
line with positive slope less than one, dx > dy (where dx = x2 - x1 and dy = y2 - y1).
Hence we sample at unit x intervals and compute each successive y value as

    yk+1 = yk + m

so that at each xi the pixel plotted is (xi, Round(yi)).

For lines with positive slope greater than one, dy > dx. Hence we sample at unit
y intervals and compute each successive x value as

    xk+1 = xk + 1/m

Since the slope, m, can be any real number, the calculated value must be rounded
to the nearest integer.
For a line with negative slope, if the absolute value of the slope is less than one,
we make a unit increment in the x direction and calculate y values as

    yk+1 = yk + m

For a line with negative slope, if the absolute value of the slope is greater than
one, we make a unit decrement in the y direction and calculate x values as

    xk+1 = xk - 1/m

[Note: for all four cases above it is assumed that the first point is on the left and the
second point is on the right.]
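A minimal C sketch of the DDA procedure described above is given below. The setPixel routine is assumed to be supplied by the underlying graphics package, and the function name ddaLine is illustrative.

```c
#include <math.h>
#include <stdlib.h>

/* Assumed to be provided by the graphics package (see setPixel above). */
extern void setPixel(int x, int y, int intensity);

/* DDA line from (x1, y1) to (x2, y2): step along the dominant axis in unit
   intervals and round the other coordinate to the nearest pixel. */
void ddaLine(int x1, int y1, int x2, int y2)
{
    int dx = x2 - x1, dy = y2 - y1;
    int steps = abs(dx) > abs(dy) ? abs(dx) : abs(dy);

    if (steps == 0) {                   /* degenerate case: both endpoints equal */
        setPixel(x1, y1, 1);
        return;
    }

    double xInc = (double)dx / steps;   /* per-step increment in x */
    double yInc = (double)dy / steps;   /* per-step increment in y */
    double x = x1, y = y1;

    for (int k = 0; k <= steps; k++) {
        setPixel((int)floor(x + 0.5), (int)floor(y + 0.5), 1);
        x += xInc;
        y += yInc;
    }
}
```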
In this method, developed by Jack Bresenham, we look at just the center of the
pixels. We determine d1 and d2 which is the "error", i.e., the difference from the "true
line."
Steps in the Bresenham algorithm:
Now the y coordinate on the mathematical line at pixel position xi+1 is calculated as
y = m(xi+1) + b
d1 = y - yi = m(xi +1) + b - yi
d2 = (yi +1) - y = yi +1 -m(xi +1) - b
Then
d1 - d2 = 2m(xi +1) - 2yi + 2b - 1
Note: pi < 0 if yi pixel is closer, pi >= 0 if yi+1 pixel is closer. Therefore we only need to
know the sign of pi .
pi = 2 * dy * xi - 2 * dx * yi + 2 * dy + dx * (2 * b - 1) (1)
Let C = 2 * dy + dx * (2 * b - 1)
pi+1 - pi = 2 * dy * (xi+1 - xi) - 2 * dx * ( yi+1 - yi)
If pi < 0, plot the pixel (xi+1, yi) and the next decision parameter is pi+1 = pi + 2dy;
otherwise plot the pixel (xi+1, yi+1) and the next decision parameter is pi+1 = pi + 2dy - 2dx.
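The decision-parameter update above translates directly into code. The following is a hedged C sketch for the first octant only (slope between 0 and 1, left endpoint first); setPixel is again assumed to exist, and the starting value p0 = 2dy - dx follows from substituting the first point into the expression for pi.

```c
extern void setPixel(int x, int y, int intensity);   /* assumed, as above */

/* Bresenham line for the first octant (0 <= slope <= 1, x1 < x2),
   using only integer arithmetic on the decision parameter p. */
void bresenhamLine(int x1, int y1, int x2, int y2)
{
    int dx = x2 - x1, dy = y2 - y1;
    int p = 2 * dy - dx;                /* initial decision parameter p0 */
    int x = x1, y = y1;

    setPixel(x, y, 1);
    while (x < x2) {
        x++;
        if (p < 0) {
            p += 2 * dy;                /* yi pixel is closer: keep y      */
        } else {
            y++;
            p += 2 * dy - 2 * dx;       /* yi+1 pixel is closer: step y up */
        }
        setPixel(x, y, 1);
    }
}
```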
5.7 Properties of Circles
The set of points on the circumference of a circle are all at equal distance r from
the centre (xc, yc), and this relation is given by the Pythagorean theorem as

    (x - xc)^2 + (y - yc)^2 = r^2

The points on the circumference of the circle can be calculated by unit increments
in the x direction from xc - r to xc + r, and the corresponding y values can be obtained as

    y = yc ± sqrt( r^2 - (xc - x)^2 )

The major problem here is that the spacing between the points will not be the same.
It can be adjusted by interchanging x and y whenever the absolute value of the slope of
the circle is greater than 1.
The unequal spacing can be eliminated by using polar coordinates:

    x = xc + r cos(θ)
    y = yc + r sin(θ)

The major problem with the above two methods is the computational time. The
computational time can be reduced by considering the symmetry of circles. The shape of
the circle is similar in each quadrant. Thinking one step further shows that there is
symmetry between octants too.
Midpoint circle algorithm
To simplify the function evaluation that takes place on each iteration of our circle-
drawing algorithm, we can use the midpoint circle algorithm, based on the circle function

    f(x, y) = x^2 + y^2 - r^2

If the point is inside the circle then f(x,y) < 0, if it is outside then f(x,y) > 0, and if
the point is on the circumference of the circle then f(x,y) = 0.
Thus the circle function is the decision parameter in the midpoint algorithm.
Assume that we have just plotted (xk, yk); we have to decide whether the point
(xk+1, yk) or (xk+1, yk - 1) is nearer to the circle. We consider the midpoint between
these two candidate points and define the decision parameter as

    pk = f(xk + 1, yk - 1/2)
       = (xk + 1)^2 + (yk - 1/2)^2 - r^2

Similarly

    pk+1 = f(xk+1 + 1, yk+1 - 1/2)
         = (xk+1 + 1)^2 + (yk+1 - 1/2)^2 - r^2

Subtracting the two expressions gives the recurrence

    pk+1 = pk + 2(xk + 1) + (yk+1^2 - yk^2) - (yk+1 - yk) + 1

The initial decision parameter is obtained by evaluating the circle function at the
starting position (0, r):

    p0 = f(1, r - 1/2)
       = 1 + (r - 1/2)^2 - r^2
       = 5/4 - r
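Rounding p0 = 5/4 - r to the integer form 1 - r (a standard simplification, since only the sign of pk matters) gives the C sketch below; plot8 exploits the eight-way symmetry mentioned earlier, and setPixel is assumed to be provided elsewhere.

```c
extern void setPixel(int x, int y, int intensity);   /* assumed, as above */

/* Plot the 8 symmetric points of (x, y) relative to the centre (xc, yc). */
static void plot8(int xc, int yc, int x, int y)
{
    setPixel(xc + x, yc + y, 1);  setPixel(xc - x, yc + y, 1);
    setPixel(xc + x, yc - y, 1);  setPixel(xc - x, yc - y, 1);
    setPixel(xc + y, yc + x, 1);  setPixel(xc - y, yc + x, 1);
    setPixel(xc + y, yc - x, 1);  setPixel(xc - y, yc - x, 1);
}

/* Midpoint circle: compute one octant, mirror the rest. */
void midpointCircle(int xc, int yc, int r)
{
    int x = 0, y = r;
    int p = 1 - r;                     /* integer form of p0 = 5/4 - r */

    plot8(xc, yc, x, y);
    while (x < y) {
        x++;
        if (p < 0) {
            p += 2 * x + 1;            /* midpoint inside: keep y       */
        } else {
            y--;
            p += 2 * x + 1 - 2 * y;    /* midpoint outside: step y down */
        }
        plot8(xc, yc, x, y);
    }
}
```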
An ellipse is defined as the set of points such that the sum of the distances from
two fixed positions (foci) is the same for all points. If the distances to the two foci from
any point P = (x, y) on the ellipse are labeled d1 and d2, then the general equation of an
ellipse can be stated as

    d1 + d2 = constant

Let the focal coordinates be F1 = (x1, y1) and F2 = (x2, y2). Then by substituting the
values of d1 and d2 we get

    sqrt( (x - x1)^2 + (y - y1)^2 ) + sqrt( (x - x2)^2 + (y - y2)^2 ) = constant

which can be rearranged into the general second-degree form

    A x^2 + B y^2 + C xy + D x + E y + F = 0

If the major and the minor axes are aligned with the x-axis and y-axis, then the
equation of the ellipse can be given by

    ( (x - xc) / rx )^2 + ( (y - yc) / ry )^2 = 1

where rx and ry are the semi-major and semi-minor axes respectively. In parametric form,

    x = xc + rx cos(θ)
    y = yc + ry sin(θ)
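A straightforward, if inefficient, way to plot such an ellipse is to step the parametric form directly. The sketch below assumes a fixed angular step and the same setPixel routine as before; both the step size and the function name are illustrative choices.

```c
#include <math.h>

extern void setPixel(int x, int y, int intensity);   /* assumed, as above */

/* Parametric ellipse plot: step the angle and plot
   (xc + rx*cos(t), yc + ry*sin(t)) rounded to the nearest pixel. */
void parametricEllipse(int xc, int yc, double rx, double ry)
{
    const double TWO_PI = 6.28318530717958648;
    const double step = 0.01;                 /* radians per sample (arbitrary) */

    for (double t = 0.0; t < TWO_PI; t += step) {
        int x = (int)floor(xc + rx * cos(t) + 0.5);
        int y = (int)floor(yc + ry * sin(t) + 0.5);
        setPixel(x, y, 1);
    }
}
```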
The Pixel Addressing feature controls the number of pixels that are read from the
Region of interest (ROI). Pixel Addressing is controlled by two parameters – a Pixel
Addressing mode and a value. The mode of Pixel Addressing can be decimate (0),
averaging (1), binning (2) or resampling (3).
With a Pixel Addressing value of 1, the Pixel Addressing mode has no effect and
all pixels in the ROI will be returned. For Pixel Addressing values greater than 1, the
number of pixels will be reduced by the square of the value. For example, a Pixel
Addressing value of 2 will result in ¼ of the pixels.
The Pixel Addressing mode determines how the number of pixels is reduced. The
Pixel Addressing value can be considered as the size of a block of pixels made up of 2x2
groups. For example, a Pixel Addressing value of 3 will reduce a 6 x 6 block of pixels to
a 2 x 2 block – a reduction of 4/36 or 1/9.
The decimate mode will drop all the pixels in the block except for the top-
left group of four. At the highest Pixel Addressing value of 6, a 12 x 12 block of pixels is
reduced to 2 x 2. At this level of reduction detail in the scene can be lost and color
artifacts introduced.
The averaging mode will average pixels with similar color within the block,
resulting in a 2x2 Bayer pattern. This allows details in the blocks to be detected and
reduces the effects of the color artifacts.
The binning mode will sum pixels with similar color within the block reducing
the block to a 2x2 Bayer pattern. Unlike binning with CCD sensors, this summation
occurs after the image is digitized so no increase in sensitivity will be noticed but a dark
image will appear brighter.
The resampling mode uses a different approach involving the conversion of the
Bayer pattern in the blocks to RGB pixels. With a Pixel Addressing value of 1,
resampling has no effect. With a Pixel Addressing mode of 2 or more, resampling will
convert the block of 10-bit pixels to one 30-bit RGB pixel by averaging the red, green
and blue channels. Setting the video format to YUV422 mode will result in the best
image quality while resampling. Resampling will create images with the highest quality
and the least artifacts.
Pixel Addressing will reduce the amount of data coming from the camera.
However, only the Decimate mode will permit an increase in the frame rate. Averaging,
binning and resampling modes will have the same frame rate as if the Pixel Addressing
value was 1 (no decimation). Pixel Addressing works in the same fashion with color or
monochrome sensors. For example, the pixel addressing controls and parameters for a
camera are shown in the following tables.
Controls

    Camera        Auto   Manual   One-time Auto   Off   CiD
    All cameras   No     Yes      No              Yes   Yes

Parameters
5.10 Let us Sum Up
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) Discuss about midpoint circle algorithm
b) Discuss the advantages of Bresenham's algorithm over the DDA
5.14 References
UNIT – II
LESSON – 6: TWO DIMENSIONAL TRANSFORMATION
CONTENTS
6.1 Aim and Objectives
The aim of this lesson is to learn the concept of two dimensional transformations.
The objectives of this lesson are to make the student aware of the following concepts
a) translation
b) rotation
c) scaling
d) shear
e) Homogenous coordinates systems
6.2 Introduction
translations
scaling
rotation
shearing
For example the three points and three edges of the triangle given here are
p1=(1,0), p2=(1.5,2), p3=(2,0), e1={p1,p2}, e2={p2,p3}, and e3={p3,p1}.
We can also write points in vector/matrix notation, as a column vector p = [x  y]^T.
6.4 Translation
Assume you are given a point at (x,y)=(2,1). Where will the point be if you move
it 3 units to the right and 1 unit up?
How was this obtained? This is obtained by (x',y') = (x+3,y+1). That is, to move a point
by some amount dx to the right and dy up, you must add dx to the x-coordinate and add
dy to the y-coordinate.
For example to move the green triangle, represented by 3 points given below, to the red
triangle we need dx = 3 and dy = -5.
    q = p + t = | x | + | tx | = | x + tx |
                | y |   | ty |   | y + ty |
6.5 Scaling
Suppose we want to double the size of a 2-D object. What do we mean by double?
Double in size, width only, height only, along some line only? When we talk about
scaling we usually mean some amount of scaling along each dimension. That is, we must
specify how much to change the size along each dimension. Below we see a triangle and
a house that have been doubled in both width and height (note, the area is more than
doubled).
The scaling for the x dimension does not have to be the same as the y dimension. If these
are different, then the object is distorted. What is the scaling in each dimension of the
pictures below?
And if we double the size, where is the resulting object? In the pictures above, the
scaled object is always shifted to the right. This is because it is scaled with respect to the
origin. That is, the point at the origin is left fixed. Thus scaling by more than 1 moves the
object away from the origin and scaling of less than 1 moves the object toward the origin.
This is because of how basic scaling is done. The above objects have been scaled
simply by multiplying each of its points by the appropriate scaling factor. For example,
the point p = (1.5, 2) has been scaled by 2 along x and 0.5 along y. Thus, the new point is (3, 1).
Scaling transformations are represented by matrices. For example, the above scaling of 2
and 0.5 is represented as a matrix:
    scale matrix :  S = | sx  0  | = | 2  0   |
                        | 0   sy |   | 0  0.5 |

    new point :     q = | sx  0  | | x | = | x * sx |
                        | 0   sy | | y |   | y * sy |
What do we do if we want to scale the objects about their center as shown below?
Let the fixed point (xf, yf) be the center of the object; then the equations for scaling with
respect to (xf, yf) are given by

    x' = xf + (x - xf) * sx
    y' = yf + (y - yf) * sy
6.6 Rotation
Consider rotation of a point (x, y) with respect to the origin in the anti-clockwise
direction. Let (x', y') be the new point after rotation, and let the angular displacement
(i.e. the angle of rotation) be θ, as shown in the figure.
Let r be the distance of the point from the origin, and let φ be the angle between the
x-axis and the line joining the point (x, y) to the origin, so that

    x = r cos(φ),   y = r sin(φ)                         (a)
    x' = r cos(φ + θ),   y' = r sin(φ + θ)               (b)

Expanding (b) with the angle-sum identities and substituting (a), we get the equations for
rotating a point with respect to the origin as follows:

    x' = x cos(θ) - y sin(θ)
    y' = x sin(θ) + y cos(θ)

or, in matrix form,

    | x' |   | cos θ   -sin θ | | x |
    | y' | = | sin θ    cos θ | | y |
Now suppose we want to rotate an object with respect to some fixed point (xf,yf)
as shown in the following figure. Then what will be the equation for rotation for a point
with respect to the fixed point (xf,yf).
The equation for rotation of a point with respect to a fixed point (xf, yf) can be given as

    x' = xf + (x - xf) cos(θ) - (y - yf) sin(θ)
    y' = yf + (x - xf) sin(θ) + (y - yf) cos(θ)
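These two equations can be packaged into a small helper routine. The sketch below is one possible C formulation; the function name and the in-place parameter style are illustrative choices.

```c
#include <math.h>

/* Rotate the point (*x, *y) by angle theta (radians, anti-clockwise)
   about the fixed point (xf, yf), applying the equations above in place. */
void rotateAboutPoint(double xf, double yf, double theta, double *x, double *y)
{
    double dx = *x - xf, dy = *y - yf;
    double c = cos(theta), s = sin(theta);

    *x = xf + dx * c - dy * s;
    *y = yf + dx * s + dy * c;
}
```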
6.7 Shear
A transformation that distorts the shape of an object such that the transformed
shape appears as if the object were composed of internal layers that had been caused to
slide over each other is called a shear.
An x-direction shear shifts x values in proportion to y:

    x' = x + shx * y
    y' = y

A y-direction shear shifts y values in proportion to x:

    x' = x
    y' = y + shy * x

In matrix form,

    | x' |   | 1   shx | | x |             | x' |   | 1    0 | | x |
    | y' | = | 0    1  | | y |     and     | y' | = | shy  1 | | y |
We saw that the basic scaling and rotating transformations are always with respect
to the origin. To scale or rotate about a particular point (the fixed point) we must first
translate the object so that the fixed point is at the origin. We then perform the scaling or
rotation, and then the inverse of the original translation to move the fixed point back to its
original position. For example, if we want to scale the triangle by 2 in each direction
about the point fp = (1.5, 1), we first translate all the points of the triangle by T = (-1.5, -1),
scale by 2 (S), and then translate back by -T = (1.5, 1). Mathematically this looks like

    q = | 2  0 | ( | x | + | -1.5 | ) + | 1.5 |
        | 0  2 | ( | y |   | -1.0 | )   | 1.0 |
Order Matters!
Notice the order in which these transformations are performed. The first
(rightmost) transformation is T and the last (leftmost) is -T. If you apply these
transformations in a different order then you will get very different results. For example,
what happens when you first apply T followed by -T followed by S? Here T and -T
cancel each other out and you are simply left with S
Sometimes (but be careful) order does not matter. For example, if you apply multiple 2D
rotations, order makes no difference:
R1 R2 = R2 R1
Using homogeneous coordinates, every transformation can be written as a matrix
multiplication. The translation

    q = | x + dx |
        | y + dy |

is now written as

        | x' |           | 1  0  dx | | x |
    q = | y' |  =  T p = | 0  1  dy | | y |
        | 1  |           | 0  0  1  | | 1 |
Now, we can write the scaling about a fixed point as simply a matrix multiplication:
q = (-T) S T p = A p,
where A = (-T) S T
The matrix A can be calculated once and then applied to all the points in the
object. This is much more efficient than our previous representation. It is also easier to
identify the transformations and their order when everything is in the form of matrix
multiplication.
In homogeneous form the scaling and rotation matrices become

        | sx  0   0 |              | cos θ  -sin θ   0 |
    S = | 0   sy  0 |    and   R = | sin θ   cos θ   0 |
        | 0   0   1 |              | 0       0       1 |
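To make the matrix formulation concrete, here is a hedged C sketch that composes A = T(fp) S T(-fp) for the earlier example of scaling by 2 about the fixed point (1.5, 1). The Mat3 type and helper names are hypothetical, not part of any particular graphics library.

```c
#include <string.h>

typedef struct { double m[3][3]; } Mat3;   /* 3x3 homogeneous matrix */

static Mat3 identity(void)
{
    Mat3 r; memset(&r, 0, sizeof r);
    r.m[0][0] = r.m[1][1] = r.m[2][2] = 1.0;
    return r;
}

static Mat3 translate(double dx, double dy)
{
    Mat3 r = identity();
    r.m[0][2] = dx; r.m[1][2] = dy;
    return r;
}

static Mat3 scale(double sx, double sy)
{
    Mat3 r = identity();
    r.m[0][0] = sx; r.m[1][1] = sy;
    return r;
}

static Mat3 mul(Mat3 a, Mat3 b)             /* matrix product a.b */
{
    Mat3 r; memset(&r, 0, sizeof r);
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            for (int k = 0; k < 3; k++)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

static void apply(Mat3 a, double *x, double *y)  /* q = A p, with p = (x, y, 1) */
{
    double nx = a.m[0][0] * *x + a.m[0][1] * *y + a.m[0][2];
    double ny = a.m[1][0] * *x + a.m[1][1] * *y + a.m[1][2];
    *x = nx; *y = ny;
}

/* Example: scale by 2 about the fixed point (1.5, 1). */
void scaleAboutFixedPoint(double *x, double *y)
{
    Mat3 A = mul(translate(1.5, 1.0), mul(scale(2.0, 2.0), translate(-1.5, -1.0)));
    apply(A, x, y);
}
```

As noted above, the composite matrix A needs to be computed only once and can then be applied to every point of the object.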
6.12 Points for Discussion
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) Define two dimensional translation
b) Discuss about the rotation with respect to a fixed point
6.14 References
LESSON – 7: TWO DIMENSIONAL VIEWING AND LINE
CLIPPING
CONTENTS
7.1 Aim and Objectives
7.2 Introduction
7.3 Line Clipping
7.3.1 Clipping Individual Points
7.3.2 Simultaneous Equations
7.3.3 Cohen-Sutherland Line Clipping
7.3.4 Liang-Barsky Line Clipping
7.4 Viewing
7.4.1 Window
7.4.2 Viewport
7.4.3 Window To Viewport Transformation
7.4.4 Viewport to Physical Device Transformation
7.5 Let us Sum Up
7.6 Lesson-end Activities
7.7 Points for Discussion
7.8 Model answers to “Check your Progress”
7.9 References
7.1 Aims and Objectives
The aim of this lesson is to learn the concept of two dimensional viewing and line
clipping.
The objectives of this lesson are to make the student aware of the following
concepts
a) Window
b) Viewport
c) Window to Viewport Transformation
d) Line clipping
7.2 Introduction
Clipping refers to the removal of part of a scene. Internal clipping removes parts of a
picture outside a given region; external clipping removes parts inside a region. We'll
explore internal clipping, but external clipping can almost always be accomplished as a
by-product.
There is also the question of what primitive types can we clip? We will consider line
clipping and polygon clipping. A line clipping algorithm takes as input the two endpoints of a
line segment and returns one (or more) line segments. A polygon clipper takes as input
the vertices of a polygon and returns one (or more) polygons.
This section treats clipping of lines against rectangles. Although there are
specialized algorithms for rectangle and polygon clipping, it is important to note that
other graphic primitives can be clipped by repeated application of the line clipper.
7.3.1 Clipping Individual Points
Before we discuss clipping lines, let's look at the simpler problem of clipping
individual points.
If the x coordinate boundaries of the clipping rectangle are Xmin and Xmax, and
the y coordinate boundaries are Ymin and Ymax, then the following inequalities must be
satisfied for a point at (X, Y) to be inside the clipping rectangle:

    Xmin <= X <= Xmax   and   Ymin <= Y <= Ymax

If any of the four inequalities does not hold, the point is outside the clipping rectangle.
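These four comparisons translate directly into a small predicate; the sketch below is a straightforward C version with illustrative parameter names.

```c
/* Returns 1 when (x, y) lies inside (or on the edge of) the clipping
   rectangle, 0 otherwise. */
int pointInsideClipRect(double x, double y,
                        double xmin, double xmax,
                        double ymin, double ymax)
{
    return x >= xmin && x <= xmax && y >= ymin && y <= ymax;
}
```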
To clip a line, we need to consider only its endpoints, not its infinitely many
interior points. If both endpoints of a line lie inside the clip rectangle, the entire line lies
inside the clip rectangle and can be trivially accepted. If one endpoint lies inside and one
outside, the line intersects the clip rectangle and we must compute the intersection point.
If both endpoints are outside the clip rectangle, the line may or may not intersect with the
clip rectangle, and we need to perform further calculations to determine whether there are
any intersections.
7.3.3 Cohen-Sutherland Line Clipping
The Cohen-Sutherland algorithm extends the four window edges to divide the plane into
9 regions; see figure 1 below. These 9 regions can be uniquely identified using a 4 bit code, often
called an outcode. We'll use the order: left, right, bottom, top (LRBT) for these four bits.
Left (first) bit is set to 1 when p lies to left of window
Right (second) bit is set to 1 when p lies to right of window
Bottom (third) bit is set to 1 when p lies below window
Top (fourth) bit set is set to 1 when p lies above window
The LRBT (Left, Right, Bottom, Top) order is somewhat arbitrary, but once an order is
chosen we must stick with it. Note that points on the clipping window edge are
considered inside (the bits are left at 0).
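One way to compute the outcode in C is sketched below. The choice of which bit represents which edge (here LEFT as the high bit of the LRBT nibble) is an assumption made for illustration, and the trivial accept/reject tests follow from it.

```c
/* Cohen-Sutherland outcode in LRBT bit order (left, right, bottom, top). */
enum { OC_LEFT = 8, OC_RIGHT = 4, OC_BOTTOM = 2, OC_TOP = 1 };

int outcode(double x, double y,
            double xmin, double xmax, double ymin, double ymax)
{
    int code = 0;
    if (x < xmin)      code |= OC_LEFT;
    else if (x > xmax) code |= OC_RIGHT;
    if (y < ymin)      code |= OC_BOTTOM;
    else if (y > ymax) code |= OC_TOP;
    return code;          /* 0 means the point is inside the window */
}

/* Trivial accept: both endpoint outcodes are 0.
   Trivial reject: the bitwise AND of the two outcodes is non-zero. */
```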
Given a line segment with end points p0 and p1, here's the basic flow
of the Cohen-Sutherland algorithm: compute the outcode of each endpoint; if both
outcodes are zero, trivially accept the segment; if the bitwise AND of the two outcodes is
non-zero, trivially reject it; otherwise the case is indeterminate and the segment must be
clipped against a window edge before the test is repeated.
Let's explore the indeterminate case more closely. First, one of the two end-points must be
outside the clipping window, so at least one outcode has a bit set.
As an example, pretend the right bit is set, so we want to compute the intersection with the
right clipping window edge; also, pretend we've already done the homogeneous divide, so
the right edge is x = 1, and we need to find y. The y value of the intersection is found by
substituting x = 1 into the line equation (from p0 to p1), giving

    y = y0 + (y1 - y0) * (1 - x0) / (x1 - x0)
Liang and Barsky have created an algorithm that uses floating-point arithmetic
but finds the appropriate end points with at most four computations. This algorithm uses
the parametric equations for a line and solves four inequalities to find the range of the
parameter for which the line is in the viewport.
Let the line segment from P(x1, y1) to Q(x2, y2) be the line which we want to study. The
parametric equation of the line segment gives x-values and y-values for every point in terms
of a parameter t that ranges from 0 to 1. The equations are

    x = x1 + (x2 - x1) t
and
    y = y1 + (y2 - y1) t
We can see that when t = 0, the point computed is P(x1,y1); and when t = 1, the point
computed is Q(x2,y2).
Algorithm
1. Set tmin = 0 and tmax = 1.
2. Calculate the values of tL, tR, tT, and tB (the t-values where the line crosses the left, right, top and bottom edges).
   o if a t-value is less than 0 or greater than 1, ignore it and go to the next edge
   o otherwise classify the t-value as an entering or exiting value (using the inner
     product to classify)
   o if t is an entering value set tmin = max(tmin, t); if t is an exiting value set tmax = min(tmax, t)
3. If tmin < tmax then draw a line from (x1 + dx*tmin, y1 + dy*tmin) to (x1 +
   dx*tmax, y1 + dy*tmax)
4. If the line crosses over the window, (x1 + dx*tmin, y1 + dy*tmin)
   and (x1 + dx*tmax, y1 + dy*tmax) are the intersections between the line and the window edges.
As an example, we consider whether each t-value is entering or exiting by using the inner product with the corresponding edge normal.
(Q-P) = (15+5,9-3) = (20,6)
At left edge (Q-P)nL = (20,6)(-10,0) = -200 < 0 entering so we set tmin = 1/4
At right edge (Q-P)nR = (20,6)(10,0) = 200 > 0 exiting so we set tmax = 3/4
The next step we consider if tvalue is entering or exiting by using inner product.
At top edge (Q-P)nT = (10,12)(0,10) = 120 > 0 exiting so we set tmax = 8/12
At left edge (Q-P)nL = (10,12)(-10,0) = -100 < 0 entering so we set tmin = 8/10
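A C sketch of the whole procedure is given below. It uses the common p/q form of Liang-Barsky, in which the sign of p[i] plays the same role as the inner-product classification above (negative means entering, positive means exiting); the window limits and the function name are illustrative assumptions.

#include <stdio.h>

/* Clip P(x1,y1)-Q(x2,y2) against the window; returns 1 if a piece is drawn. */
int liang_barsky(double x1, double y1, double x2, double y2,
                 double xmin, double ymin, double xmax, double ymax)
{
    double dx = x2 - x1, dy = y2 - y1;
    double tmin = 0.0, tmax = 1.0;              /* step 1 */
    double p[4], q[4];
    int i;

    p[0] = -dx;  q[0] = x1 - xmin;              /* left edge   */
    p[1] =  dx;  q[1] = xmax - x1;              /* right edge  */
    p[2] = -dy;  q[2] = y1 - ymin;              /* bottom edge */
    p[3] =  dy;  q[3] = ymax - y1;              /* top edge    */

    for (i = 0; i < 4; i++) {                   /* step 2 */
        if (p[i] == 0.0) {
            if (q[i] < 0.0) return 0;           /* parallel to and outside this edge */
        } else {
            double t = q[i] / p[i];
            if (p[i] < 0.0) {                   /* entering value */
                if (t > tmin) tmin = t;
            } else {                            /* exiting value  */
                if (t < tmax) tmax = t;
            }
        }
    }
    if (tmin > tmax) return 0;                  /* step 3: nothing is visible */
    printf("draw (%g,%g)-(%g,%g)\n",
           x1 + dx * tmin, y1 + dy * tmin,
           x1 + dx * tmax, y1 + dy * tmax);
    return 1;
}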
7.4 Viewing
When we define an image in some world coordinate system, to display that image
we must map the image to the physical output device. This is a two stage process. For 3
dimensional images we must first determine the 3D camera viewpoint, called the View
Reference Point (VRP) and orientation. Then we project from 3D to 2D, since our display
device is 2 dimensional. Next, we must map the 2D representation to the physical device.
We will first discuss the concept of a Window on the world (WDC), and then a Viewport
(in NDC), and finally the mapping WDC to NDC to PDC.
7.4.1 Window
We can use the window to change the apparent size and/or location of objects in
the image. Changing the window affects all of the objects in the image. These effects are
called "Zooming" and "Panning".
a) Zooming
Now increase the window size and the
house appears smaller, i.e., you have
zoomed out:
So we can change the apparent size of an image, in this case a house, by changing the
window size.
b) Panning
A. Set_window(-40, +20,-15,+15)
B. Set_window(-20,+40,-15,+15)
Moving all objects in the scene by changing the window is called "panning".
7.4.2 Viewport
The user may want to create images on different parts of the screen so we define a
viewport in Normalized Device Coordinates (NDC). Using NDC also allows for output
device independence. Later we will map from NDC to Physical Device Coordinates
(PDC).
Normalized Device Coordinates: Let the entire display
surface have coordinate values 0.0 <= x,y <= 1.0
Command: Set_viewport2(Xvmin,Xvmax,Yvmin,Yvmax)
Examples
7.4.3 2D Window To Viewport Transformation
The 2D viewing transformation performs the mapping from the window (WDC)
to the viewport (NDC) and to the physical output device (PDC). Usually all objects are
clipped to the window before the viewing transformation is performed.
We can see from above that to maintain relative position we must have the
following relationship:
(Xw - Xwmin) / (Xwmax - Xwmin) = (Xv - Xvmin) / (Xvmax - Xvmin)
(Yw - Ywmin) / (Ywmax - Ywmin) = (Yv - Yvmin) / (Yvmax - Yvmin)
Solving for the viewport coordinates gives
Xv = (Xvmax - Xvmin) * (Xw - Xwmin) / (Xwmax - Xwmin) + Xvmin = Sx * (Xw - Xwmin) + Xvmin
Yv = (Yvmax - Yvmin) * (Yw - Ywmin) / (Ywmax - Ywmin) + Yvmin = Sy * (Yw - Ywmin) + Yvmin
where
Sx = (Xvmax - Xvmin) / (Xwmax - Xwmin) and Sy = (Yvmax - Yvmin) / (Ywmax - Ywmin)
Note that Sx and Sy are "scaling" factors.
Xp = Xv * Xnum
Yp = Yv * Ynum
Note: Remember the aspect ratio problem, e.g., for CGA mode 6 (640 x 200) =>
2.4 horizontal pixels = 1 vertical pixel.
Also have problem with 0, 0 being upper left rather than lower left so actual equation
used is:
Xp = Xv * Xnum
Yp = Ynum - Yv * Ynum
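A minimal C sketch of this mapping from the window, through the viewport in NDC, to device pixels; the routine name, the parameter order and the assumption that (0,0) is the top left pixel are illustrative only.

/* Map a world-coordinate point to device pixels: window -> NDC -> PDC. */
void world_to_device(double xw, double yw,
                     double xwmin, double xwmax, double ywmin, double ywmax,
                     double xvmin, double xvmax, double yvmin, double yvmax,
                     int xnum, int ynum, int *xp, int *yp)
{
    double sx = (xvmax - xvmin) / (xwmax - xwmin);   /* scaling factors          */
    double sy = (yvmax - yvmin) / (ywmax - ywmin);

    double xv = sx * (xw - xwmin) + xvmin;           /* window -> viewport (NDC) */
    double yv = sy * (yw - ywmin) + yvmin;

    *xp = (int)(xv * xnum);                          /* NDC -> pixels             */
    *yp = (int)(ynum - yv * ynum);                   /* flip y: (0,0) is top left */
}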
As a check, if Xw = Xwmin then Xv = Xvmin, and if Xw = Xwmax then Xv = Xvmax; the same holds for the y direction.
7.5 Let us Sum Up
In this lesson we have learnt about two dimensional viewing and line clipping.
7.9 References
LESSON – 8: GUI AND INTERACTIVE INPUT METHODS
CONTENTS
8.1 Aims and Objectives
8.2 Introduction
8.3 Modes of Input
8.3.1 Request Mode
8.3.2 Sample Mode
8.3.3 Event Mode
8.4 Classes of Logical Input
8.4.1 Locator
8.4.2 Pick
8.4.3 Choice
8.4.4 Valuator
8.4.5 String
8.4.6 Stroke
8.5 Software Techniques
8.5.1 Locating
8.5.2 Modular Constraints
8.5.3 Directional Constraints
8.5.4 Gravity Field Effect
8.5.5 Scales and Guidelines
8.5.6 Rubber-banding
8.5.7 Menus
8.5.8 Dragging
8.5.9 Inking-in
8.6 Let us Sum Up
8.7 Lesson-end Activities
8.8 Points for Discussion
8.9 Model answers to “Check your Progress”
8.10 References
8.1 Aims and Objectives
The aim of this lesson is to learn the concept of GUI and interactive methods of
graphics.
The objectives of this lesson are to make the student aware of the following concepts
a) Modes of input
b) Classes of logical input
c) And Software techniques.
8.2 Introduction
Most of the programs written today are interactive to some extent. The days when
programs were punched on cards and left in a tray to be collected and run by the
computer operators, who then returned cards and printout to the users' pigeonholes
several hours later, are now past. `Batch-processing', as this rather slow and tedious
process was called, may be a very efficient use of machine time but it is very wasteful of
programmers' time, and as the cost of hardware falls and that of personnel rises, so
installations move from batch to interactive use. Interactive use generally results in a less
efficient use of the mainframe computer, but gives the programmer a much faster
response time, and so speeds up the development of software.
If you are not sharing a mainframe, but using your own microcomputer, then for
most of the time the speed is limited by the human response time, not that of the
computer. When you come to graphics programs, there are some additional modes of
graphics input in addition to the normal interactive input you have used before. For
example, GKS has three modes of interactive input and six classes of logical input. These
are described here, since they are typical of the type of reasoning required to write such
programs.
8.3 Modes of Input
8.3.1 Request Mode
This is the mode you will find most familiar. The program issues a request for
data from a device and then waits until it has been transferred. It might do this by using a
`Read' statement to transfer characters from the keyboard, in which case the program will
pause in its execution and wait until the data has been typed on the keyboard and the
return key pressed to indicate the end of the input. The graphical input devices such as
mouse, cursor or digitizing tablet can also be programmed in this way.
8.3.2 Sample Mode
In this case the input device sends a constant stream of data to the computer and
the program samples these values as and when it is ready. The excess data is overwritten
and lost. A digitising tablet may be used in this way - it will continually send the latest
coordinates of its puck position to the buffer of the serial port and the program may copy
values from this port as often as needed.
8.3.3 Event Mode
This is similar to sample mode, but no data is lost. Each time the device transmits
a value, the program must respond. It may do so by placing the value in an event queue
for later processing, so that the logic of the program is very similar to sample mode, but
there may also be some data values which cause a different response. This type of
interrupt can be used to provide a very powerful facility.
8.4 Classes of Logical Input
8.4.1 Locator
This inputs the (x,y) coordinates of a position. It usually comes from a cursor,
controlled either by keys or by a mouse, and has to be transferred from Device
Coordinates to Normalised Device Coordinates to World Coordinates. If you have several
overlapping viewports, they must be ordered so that the one with the highest priority can
be used to calculate these transformations. Each pixel position on the screen must
correspond to a unique value in world coordinates. It need not remain the same
throughout the running of the program, since the priorities of the viewports may be
changed. At every moment there must be a single unambiguous path from cursor position
to world coordinates.
8.4.2 Pick
This allows the user to identify a particular object or segment from all those
displayed on the screen. It is usually indicated by moving the cursor until it coincides
with the required object, and then performing some other action such as pressing a mouse
button or a key on the keyboard to indicate that the required object is now identified. The
value transferred to this program is usually a segment identifier.
8.4.3 Choice
This works in a very similar manner to the pick input. You now have a limited set
of choices, as might be displayed in a menu, and some means of indicating your choice.
Only one of the limited list of choices is acceptable as input, any attempt to choose some
other segment displayed on the screen will be ignored.
8.4.4 Valuator
This inputs a single real number by some means, the simplest method being
typing it in from the keyboard.
8.4.5 String
This inputs a string of characters, again the simplest method is to type them in
from the keyboard.
8.4.6 Stroke
This inputs a series of pairs of (x,y) coordinates. The combination of Stroke input
and Sample mode from a digitising tablet is a very fast method of input.
Most of the terminals or microcomputers you will meet will have some form of
cursor control for graphic input. You can write your programs using whichever
combination of logical input class and mode is most convenient. Alternatively, you could
ignore all forms of graphic input and merely rely on `Read' statements and data typed
from the keyboard. The choice is yours.
8.5 Software Techniques
8.5.1 Locating
Probably you have all used software in which the cursor is moved around the
screen by means of keys or a mouse. The program may well give the impression that the
cursor and mouse are linked together so that any movement of the mouse is automatically
indicated by movement of the cursor on the screen. In fact, this effect is achieved by
means of a graphics program which has to read in the new coordinates indicated by the
mouse, delete the previous drawing of the cursor and then redraw it at the new position.
This small program runs very quickly and gives the impression of a continuous process.
Usually this software also contains a test for input from the keyboard and when a
suitable key is pressed, the current position of the cursor is recorded. This allows fast
input of a number of points to form a picture or diagram on the screen. Some means of
storing the data and terminating the program is also required.
Such points are recorded to the full accuracy of the screen, which has both
advantages and disadvantages. If you are using a digitising tablet instead of a mouse, then
the accuracy is even greater and the resulting problems even more extreme. You very
seldom want to record information to the nearest 0.1mm, usually to the nearest millimetre
is quite sufficient. Problems arise when you want to select the same point a second time.
Whatever accuracy you have chosen, you must be able to indicate the point to this
accuracy in order to reselect it, as you might need to do if you had several lines meeting
at a point. To achieve this more easily, software involving the use of various types of
constraint may be used to speed up the input process.
8.5.2 Modular Constraints
In this case, you should imagine a grid, which may be visible or invisible, placed
across the screen. Now, whenever you indicate a position with the cursor, the actual
coordinates are replaced by the coordinates of the nearest point on the grid. So to indicate
the same point a second time, you merely have to get sufficiently close to the same grid
point. Provided the grid allows enough flexibility to choose the shapes required in the
diagram, this gives much faster input.
8.5.3 Directional Constraints
These can be useful when you want some lines to be in a particular direction, such
as horizontal or vertical. You can write software to recalculate the coordinates so that a
line close to vertical becomes exactly vertical. You can choose whether this is done
automatically for every line within a few degrees of vertical or only applied when
requested by the user. If the constraint is applied automatically, then you can choose how
close the line must be to the required direction before it is moved and how the
recalculation is computed. You may wish to move both vertices by a small amount, or
one vertex by a larger amount, and if you are only moving one, you must specify some
rule or rules to decide which one.
8.5.4 Gravity Field Effect
The name implies that the line should be visualised as lying at the bottom of a
gravity well and points close to the line slide down on to it. As each line is added to the
diagram, a small area is defined which surrounds it. When a new point is defined which
lies inside the area, its actual coordinates are replaced by the coordinates of the nearest
point on the line.
There are two commonly used shapes for this area. In each case, along most of the
line, two parallel lines are drawn, one each side of the line and a small distance t from it.
In the one shape, each vertex at the end of the line is surrounded by a semi-circle of
radius t. In the other shape, each vertex is surrounded by a circle of radius greater than t,
giving a dumb-bell shape to the entire area. This second case expresses the philosophy
that users are much more likely to want to connect other lines to the vertices than to
points along the line.
8.5.5 Scales and Guidelines
Just as you may use a ruler when measuring distances on a piece of paper, so you
may wish to include software to calculate and display a ruler on the screen. The choice of
scales and the way in which the ruler is positioned on the screen must be decided when
the software is written.
8.5.6 Rubber-banding
This is the name given to the technique where a line connects the previous point to
the present cursor position. This line expands or contracts like a rubber band as the cursor
is moved. To produce this effect, the lines must be deleted and re-drawn whenever the
cursor is moved.
8.5.7 Menus
Many programs display a menu of choices somewhere on the screen and allow the
user to indicate a choice of option by placing the cursor over the desired symbol.
Alternatively, the options could be numbered and the choice could be indicated by typing
a number on the keyboard. In either case, the resulting action will depend on the program.
8.5.8 Dragging
Many software packages provide a selection of commonly used shapes, and allow
the user to select a shape and use the cursor to drag a copy of the shape to any required
position in the drawing. Some packages continually delete and redraw the shape as it is
dragged, others only redraw it when the cursor halts or pauses.
8.5.9 Inking-in
Another type of software imitates the use of pen or paintbrush in leaving a track
as it is drawn across the paper. These routines allow the user to set the width and colour
of the pen and some also allow patterned `inks' in two or more colours. Then as the
cursor is moved across the screen, a large number of coordinates are recorded and the
lines joining these points are drawn as required.
All these techniques may be coded, using a combination of graphical input and
output. The success of such software depends very much on the user-interface. If it is
difficult or inconvenient to use, then as soon as something better comes along, the
previous software will be ignored. When designing your own graphical packages, you
need to have a clear idea of the purpose for which your package is designed and also the
habits and experience of the users for whom it is intended.
a) Explain about pointing
b) Explain about inking
8.10 References
UNIT – III
CONTENTS
9.1 Aims and Objectives
The aim of this lesson is to learn the concept of three dimensional graphics
The objectives of this lesson are to make the student aware of the following concepts
a) Description of 3D objects
b) Issues in 3D drawings
c) Projections
9.2 Introduction
In the following sections, we shall discuss the topics as though we were dealing
with idealised mathematical objects, points with position but no size and lines and planes
of zero thickness. Obviously this does not correspond to the real world where even the
thinnest plane is hundreds of atoms in thickness. However the ideas can be developed
without bothering about the effects of thickness, the need to specify whether we are
discussing the centre of the line or one of its outer edges, and these extra complications
can be considered later.
Right and Left Handed Axes
Most software then has some means of reducing the three-dimensional object to a
two-dimensional drawing in order to view the object. Possible means are to ignore the z-
value, thus giving a parallel or orthogonal projection onto the screen, to calculate a
perspective projection onto the screen or to take a slice through the object in the plane of
the screen. The projections may or may not have any provision for hidden-line or hidden-
surface removal.
Object in 3 Dimensions
Array of edges
From  To
1     2   4   5
2     *1  3   6
3     *2  4   7
4     *1  *3  8
5     *1  6   8
6     *2  *5  7
7     *3  *6  8
8     *4  *5  *7
The first row of this array indicates that vertex 1(A) is joined to vertex 2(B), to
vertex 4(D) and to vertex 5(E). The second row indicates that vertex 2(B) is joined to
vertex 1(A), which is certainly correct, but wasteful, as it implies that this line is drawn
twice. Examination of the array shows that the same is true of all other lines in the
diagram. To save time by only drawing it once, you need to draw only those cases where
the order of the vertices is increasing, that is you omit all the connections marked with a
star when drawing the object.
These two arrays are sufficient to produce a wire-frame drawing such as that
shown in the above figure. However if you wish to discuss solid faces, or use any form of
shading or texturing, you will need to move to a more complex representation such as
that for boundary representation models used in geometric modelling. In this case, the
following data structure is appropriate.
When you come to consider the cube, this is a very simple object. All the edges
are straight lines and all the faces are planes. If you choose to define the loops as the
square outline made up of 4 edges, then each face has one loop as its boundary.
Alternatively, you could have two edges to a loop and then each face would require two
loops to specify its boundary. When you come to study geometric modelling, you will
find that there are often several equally correct solutions to any given problem.
Vertices. There are 8 vertices, whose coordinates have already been described.
Edges. There are 12 edges, all straight lines joining pairs of vertices. They
may be traversed in either direction.
Faces. All solutions will have 6 faces and in one choice of loop, each face
will be bounded by one loop. It is usual to adopt a standard convention
connecting the direction of circulation of the loops and the
directions of the outward-facing normals to the face. In this example,
each face will be a plane and will have a single direction for its
normal.
Object. This is made up of the single shell and the volume contained within it.
The points or lines closer to the viewpoint appear brighter if drawn on the screen
and are drawn with thicker lines when output to the plotter. A shaded drawing on the
screen can adjust the intensity, pixel by pixel, giving a result similar to a grey-scale
photograph.
Hidden lines may be removed or indicated with dotted lines, thus leading to an
easier understanding of the shape.
Rotation of the object, combined with hidden-line removal gives a very realistic
effect. It is probably the best representation, but can only be produced at a special-
purpose graphics workstation since it requires considerable computing power to carry out
the hidden-line calculations in real time.
9.4.4 Perspective Projections
If we have some means of knowing the relative size of the objects, then the fact
that the perspective transformation makes the closer objects appear larger will give a
good effect of depth. If the objects are easily recognized then knowledge about their
relative sizes (e.g. a hill is usually larger than a house or a tree) will be interpreted as
information about their distance from the viewer. It is only when we have a number of
objects, such as cubes or spheres, which are completely separate in space and we have no
information on their relative size, that the perspective transformation cannot be
interpreted by the viewer in this way.
In this case, we have two perspective projections, one for each eye. We need
some method of ensuring that each eye sees only its own view and then we can rely on
the human brain to merge the views together and let us see a three-dimensional object.
One method is to produce separate views at the correct distance and scale for use
with a stereographic viewer. This allows for black-and-white or colour drawings to be
seen in their true colour.
Each eye must see only its own view. So if the view from the left eye is drawn in
blue, and the right eye views the drawings through a blue filter then the blue lines will be
invisible to the right eye since they will blend into the white background when viewed
through a blue filter.
Similarly if the drawing for the right eye is in red, and the left eye has a filter of
the same colour, then the drawing for the right eye will be invisible to the left eye.
In the figure, the eyes are assumed to be distance 2e apart (usually about 3 inches)
and the plane onto which the pictures are projected is distance d from the eyes (frequently
12 to 15 inches). So for the left eye, we need to move the axes a distance e in the x-
direction and then project onto the plane, and finally shift the drawing back again. Thus
the projection for the left eye means that the point (x,y,z) becomes the point ((x+e)*d/z -
e, y*d/z, 0).
For the right eye, the axes must be moved a distance -e and then the point (x,y,z)
is projected onto the plane and becomes ((x-e)*d/z + e, y*d/z , 0)
When this has been done for all the vertices, they are joined up and the object is
drawn in the appropriate colours.
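A small C sketch of this pair of projections, using the half eye separation e and the viewing distance d defined above; the function name is an assumption and z is taken to be positive (the point lies in front of the eyes).

/* Project the 3D point (x,y,z) for the left and right eyes. */
void stereo_project(double x, double y, double z, double e, double d,
                    double *xl, double *yl, double *xr, double *yr)
{
    *xl = (x + e) * d / z - e;    /* left-eye view:  shift by +e, project, shift back */
    *yl = y * d / z;
    *xr = (x - e) * d / z + e;    /* right-eye view: shift by -e, project, shift back */
    *yr = y * d / z;
}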
We are dealing with objects defined in three-dimensional space, but all the
graphics devices to which we have access are two-dimensional. This means that we
require some way of representing (i.e. drawing) the three-dimensional objects in two
dimensions in order to see the results. It is possible to calculate the intersection of any
plane with the object and draw a succession of slices, but it is usually easier to understand
what is going on if we calculate the projection of the three dimensional object on to a
given plane and then draw its projection.
There are two types of projection, parallel (usually orthographic) and perspective.
We shall discuss both of these in the remainder of this chapter and also consider some
other ways of giving the impression of a three-dimensional object in two-dimensions.
One of the most important is the removal of hidden lines or surfaces and this is discussed
in another section
The simplest example of an orthographic projection occurs when you project onto
the plane z=0. You achieve this by ignoring the values of the z-coordinates and drawing
the object in terms of its x and y coordinates only.
Since calculating the intersections of general lines and planes is somewhat
tedious, you may instead apply transformations to the objects so that the plane onto
which you wish to project the drawing becomes the plane z=0. Then your final
transformation into two-dimensions is obtained by discarding the z-coordinates. A simple
example of an orthographic projection is shown in the figure below.
Orthographic Projection
Example of an Isometric Drawing of a Surface
     | 1  0  0  0 |   | 1    0      0    0 |   |  cosA   0   sinA   0 |
T =  | 0  1  0  0 | x | 0  cosB  -sinB   0 | x |   0     1    0     0 |
     | 0  0  0  0 |   | 0  sinB   cosB   0 |   | -sinA   0   cosA   0 |
     | 0  0  0  1 |   | 0    0      0    1 |   |   0     0    0     1 |

     |  cosA        0      sinA         0 |
T =  |  sinB sinA   cosB   -sinB cosA   0 |
     |  0           0       0           0 |
     |  0           0       0           1 |
These equations can be re-arranged and solved for A and B. Eventually they give:
B = 35.26439 degrees, since sinB = 1/SQRT(3), and A = 45 degrees, since cosA = 1/SQRT(2)
Perspective projections are often preferred because they make the more distant
objects appear smaller than those closer to the viewpoint. They involve more calculation
than parallel projections, but are often preferred for their greater realism. Note that a
parallel projection may be considered as a special case of the perspective projection
where the viewpoint is at infinity.
Perspective Projection
The above figure shows an example of the perspective projection from the point E
at (0,0,-d) to the z=0 plane.
The projection is obtained by joining E to each vertex in turn and finding the
intersection of this line with the plane z=0. The vertices are then joined by straight lines
to give the wire-frame drawing of the object in the plane.
This method of drawing the object, makes use of some of the well-known
properties of perspective projections, namely that straight lines are projected into straight
lines and facets ( A facet is a closed sequence of co-planar line segments, a polygon in
other words) are projected into facets. Parallel sets of lines may be projected into a set of
parallel lines or into a set of lines meeting at the `vanishing point'.
We may consider the equation of the projection either as the result of the
transformation matrix or derive it from the following diagram
Consider the diagram first. This shows the y=0 plane with the Ox and Oz axes.
The point of projection is E at the point (0,0,-d) on the z-axis, so the distance OE is of
length d. The point P (with values x and -z) projects into the point P' while the point Q
(with values X and Z) projects into the point Q'.
From the first set of similar triangles, we can see that d/x' = (d-z)/x and so x'=d*x/(d-z)
From the second set of similar triangles, we can see that d/X' = (d+Z)/X and so
X' = d*X/(d+Z)
Thus if we are careful to take the correct sign for z in each case, we can quote the general
rule:
x' = d*x/(d+z)
and we have a similar position for the y-coordinate when looking at the x=0 plane.
X = x, Y = y, Z = 0, H = (z+d)/d
These are the homogeneous coordinates; to get back to ordinary coordinates, we need to make H = 1 and so we have to divide throughout by (z+d)/d. This gives
x' = d*x/(d+z), y' = d*y/(d+z), z' = 0
The closer the point of projection, E, is to the object, the more widely divergent are the
lines from E to the vertices and the greater the change in size of the projected object.
Conversely, the further away we move E, the closer the lines get to a parallel set and the
smaller the change in size of the object. Thus we may think of the parallel projection as
being an extreme case of perspective when the point of projection E is an infinite distance
from both the object and the plane.
The one set of parallel lines forming edges of the cube meet at the vanishing point, while
the other sets meet at infinity (i.e. they remain parallel). The transformation matrix for
this projection may be written in the form given below, where r = 1/d.
      | 1  0  0  0 |
T1 =  | 0  1  0  0 |
      | 0  0  0  0 |
      | 0  0  r  1 |
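As a short C sketch, applying T1 to the point (x, y, z, 1) and then performing the homogeneous divide; the function name is an assumption.

/* One-point perspective: apply T1 (with r = 1/d) and divide by H. */
void perspective_point(double x, double y, double z, double d,
                       double *xp, double *yp)
{
    double r = 1.0 / d;
    double X = x;               /* first row of T1             */
    double Y = y;               /* second row of T1            */
    double H = r * z + 1.0;     /* bottom row: H = (z + d) / d */

    *xp = X / H;                /* x' = d*x / (d + z)          */
    *yp = Y / H;                /* y' = d*y / (d + z)          */
}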
When we come to deal with two or three point perspectives, then we have two or
three sets of parallel lines meeting at their respective vanishing points. The matrices for
these are given below:
      | 1  0  0  0 |          | 1  0  0  0 |
T2 =  | 0  1  0  0 |    T3 =  | 0  1  0  0 |
      | 0  0  0  0 |          | 0  0  0  0 |
      | 0  q  r  1 |          | p  q  r  1 |
We now have enough information to specify the form of a general transformation matrix.
This divides into four areas, each of which relates to a different form of transformation.
      |  T1: shear, scale & rotate  |  T3: shift  |
T  =  |                             |             |
      |  T2: usually zero           |  T4 = 1     |
T2 is zero for all affine transformations and when we are dealing with perspective
projections, the number of non-zero elements in T2 will tell us whether it is a one-, two-
or three-point perspective.
9.6 Let us Sum Up
In this lesson we have learnt about three dimensional concepts, object representation
and projections
9.10 References
1. Chapter 20, 21, 22 of William M. Newman, Robert F. Sproull, “Principles of
Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 8 of Steven Harrington, “Computer Graphics – A programming
approach”, McGraw Hill, 1987
3. Chapter 9, 10, 11, 12 of Donald Hearn, M. Pauline Baker, “Computer
Graphics – C Version”, Pearson Education, 2007
4. Chapter 7 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 6 of J.D. Foley, A.Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – principles and practice”, Addison-Wesley, 1997
6. Computer Graphics by Susan Laflin. August 1999.
LESSON – 10: POLYGONS, CURVED LINES AND SURFACES
CONTENTS
10.1 Aims and Objectives
10.2 Introduction
10.2.1 Intersection Test
10.2.2 Angle Test
10.3 Linear Algorithm for Polygon Shading
10.4 Floodfill Algorithm for Polygon Shading
10.5 Polygon in detail
10.6 Plane Equations
10.7 Polygon meshes
10.8 Curved lines and surfaces
10.9 Let us Sum Up
10.10 Lesson-end Activities
10.11 Points for Discussion
10.12 Model answers to “Check your Progress”
10.13 References
LESSON 10 POLYGONS, CURVED LINES AND SURFACES
10.1 Aims and Objectives
The aim of this lesson is to learn the concept of polygons, curved lines and surfaces.
The objectives of this lesson are to make the student aware of the following concepts
10.2 Introduction
The word polygon is a combination of two Greek words: "poly" means many and
"gon" means angle. Along with its angles, a polygon also has sides and vertices. "Tri"
means "three," so the simplest polygon is called the triangle, because it has three angles.
It also has three sides and three vertices. A triangle is always coplanar, which is
not true of many of the other polygons.
A regular polygon is a polygon with all angles and all sides congruent, or equal.
Here are some regular polygons.
We can use a formula to find the sum of the interior angles of any polygon. In this
formula, the letter n stands for the number of sides, or angles, that the polygon has:
sum of interior angles = (n – 2)180°
Let's use the formula to find the sum of the interior angles of a triangle.
Substitute 3 for n. We find that the sum is 180 degrees. This is an important fact
to remember.
To find the sum of the interior angles of a quadrilateral, we can use the formula
again. This time, substitute 4 for n. We find that the sum of the interior angles of a
quadrilateral is 360 degrees.
Polygons can be separated into triangles by drawing all the diagonals that can be
drawn from one single vertex. Let's try it with the quadrilateral shown here. From vertex
A, we can draw only one diagonal, to vertex D. A quadrilateral can therefore be separated
into two triangles.
If you look back at the formula, you'll see that n – 2 gives the number of triangles
in the polygon, and that number is multiplied by 180, the sum of the measures of all the
interior angles in a triangle. Do you see where the "n – 2" comes from? It gives us the
number of triangles in the polygon. How many triangles do you think a 5-sided polygon
will have?
Here's a pentagon, a 5-sided polygon. From vertex A we can draw two diagonals
which separate the pentagon into three triangles. We multiply 3 times 180 degrees to
find the sum of all the interior angles of a pentagon, which is 540 degrees.
sum of angles = (n – 2)180°
= (5 – 2)180° = (3)180° = 540°
The GKS Fillarea function has the same parameters as polyline, but will always
produce a closed polygon. The filling of this polygon depends on the setting of the
appropriate GKS parameter. The FillAreaStyles are hollow, solid, pattern and hatch. The
hollow style produces a closed polygon with no filling. Solid fills with a solid colour.
Pattern uses whatever patterns are offered by the particular system. Hatch will fill it with
lines in one or two directions. Algorithms for hatching and cross-hatching are described
in this section.
10.2.1 Intersection Test
Consider the situation illustrated in the figure below. To determine which of the
points Pi, (i=1,2 or 3) lie inside the polygon, it is necessary to draw a line from Pi in
some direction and project it beyond the area covered by the polygon. If the number of
intersections of the line from Pi with the sides of the polygon is an even number, then the
point lies outside the polygon. (Note that the line must start at Pi and may not extend
back behind Pi - it is a half-infinite line from Pi). This means that the triangle to the
right of the line Q4 Q5 in the figure counts as outside the polygon. If you wished to make
it doubly inside, you would have to introduce a parameter equal to the minimum number
of intersections of all half-infinite lines through Pi.
Intersection Test
It is then easy to see that the figure gives values of 0, 2 or 4 for lines through P1
with a minimum of 0. For P2, there are values of 1 or 3 with the minimum=1. All lines
from P3 have a value 2.
One possible problem arises when lines pass through a vertex. The line P2 Q5
must count the vertex Q5 as two intersections while P2 Q2 must only count Q2 once. The
easy way to avoid this problem is to omit all lines which pass through vertices. This still
leaves plenty of lines to test the position.
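The following C sketch is one way of coding the intersection test. It sends a horizontal half-infinite line to the right of the point and counts the edges it crosses; by treating each edge as half-open in y, a line through a vertex is never counted twice, which sidesteps the problem just described. The function name and the array layout are assumptions.

/* Intersection (crossing) test: returns 1 if (px,py) lies inside the polygon
   whose n vertices are (xs[i], ys[i]), and 0 if it lies outside. */
int point_in_polygon(double px, double py,
                     const double xs[], const double ys[], int n)
{
    int i, j, inside = 0;
    for (i = 0, j = n - 1; i < n; j = i++) {
        if (((ys[i] > py) != (ys[j] > py)) &&      /* edge straddles the test height   */
            (px < xs[j] + (xs[i] - xs[j]) * (py - ys[j]) / (ys[i] - ys[j])))
            inside = !inside;                      /* the half-infinite line crosses it */
    }
    return inside;
}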
10.2.2 Angle Test
Here the point Pi is joined to all the vertices and the sum of the angles Qk Pi
Q(k+1) is calculated. If counter-clockwise angles are positive and clockwise ones are
negative, then for a point Pi outside the polygon there will be some positive angles and
some negative and the resulting sum will be zero.
For a point Pi inside the polygon, the angles will be either all positive or all
negative and the sum will have a magnitude of 360 degrees. The next figure illustrates
this for the same polygon as the previous figure and a point P inside the triangle (but
outside the polygon).
Angle Test
Here the sum of angles is 2 * 360 degrees, thus implying that it is doubly inside
the polygon. To give consistency with the Intersection Test, this test must be carefully
worded. Having evaluated the sum of angles, it will be n * 360 degrees. If n is an even
number, then the point P lies outside the polygon, while if n is an odd number, then P lies
inside the polygon.
10.3 Linear Algorithm for Polygon Shading
Hatching a Triangle
This involves shading the polygon by drawing a series of parallel lines throughout
the interior. The lines may be close enough to touch, giving a solid fill or they may be a
noticeable distance apart, giving a hatched fill. If you have two sets of hatched lines at
right angles to each other, this gives a "cross-hatched" result. The figure shows a triangle
in the process of being hatch-filled with horizontal lines. For each horizontal scan-line,
the following process must be applied.
1. Assume (or check) that the edge of the screen is outside the polygon.
2. Calculate the intersections Pi of this horizontal line with each edge of the polygon and
store the coordinates of these intersections in an array.
3. Sort them into increasing order of one coordinate.
4. Draw the segments of the hatch-line from P1 to P2, P3 to P4 and so on. Do not draw
the intervening segments.
5. Repeat this process for each scan-line.
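A C sketch of this scan-line hatching, following the five steps above (the special handling of vertices is discussed after the figure below). The draw_line output routine, the array layout and the limit of 64 intersections per scan-line are assumptions made only for the example.

#include <stdlib.h>

extern void draw_line(double x1, double y1, double x2, double y2);  /* assumed output routine */

static int cmp_double(const void *a, const void *b)
{
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

/* Hatch the polygon (xs, ys, n vertices) with horizontal lines "spacing" apart. */
void hatch_polygon(const double xs[], const double ys[], int n,
                   double ybottom, double ytop, double spacing)
{
    double y, xhit[64];
    int i, j, k;

    for (y = ybottom; y <= ytop; y += spacing) {        /* step 5: every scan-line    */
        k = 0;
        for (i = 0, j = n - 1; i < n; j = i++)          /* step 2: edge intersections */
            if ((ys[i] > y) != (ys[j] > y))             /* edge crosses this height   */
                xhit[k++] = xs[j] + (xs[i] - xs[j]) * (y - ys[j]) / (ys[i] - ys[j]);
        qsort(xhit, k, sizeof(double), cmp_double);     /* step 3: sort by x          */
        for (i = 0; i + 1 < k; i += 2)                  /* step 4: P1-P2, P3-P4, ...  */
            draw_line(xhit[i], y, xhit[i + 1], y);
    }
}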
Problems at Vertices
(Note that this does not rely on the lines being horizontal, although scan-lines
parallel to one of the axes makes the calculation of the intersection points very much
easier). The figure shows one problem with this approach. The scan-line s1 will work
correctly since it has four distinct intersections, but the scan-line s2 has two coincident
intersection points at the vertex Q6. This is detectable since the number of intersection
points will be an odd number.
Looking at the vertices, you can see that moving from Q5 to Q6, y decreases as x
decreases and from Q6 to Q1, y increases as x decreases. In this case, the hatch lines have
the equation y=constant and so this reversal in the direction of y indicates a vertex which
must be included twice, and is consequently known as a Type 2 vertex. Q1 on the other hand
is a Type 1 vertex since y continues to increase when going from Q6 through Q1 to Q2. If
the shading uses vertical lines (x = constant) then it is necessary to study the behaviour of
x to determine the types of vertex.
If you have an odd number of intersections and only one of them coincides with a
vertex, then it is usually safe to assume that this value needs to be included twice. This
may save some time in your algorithm, and will shade most polygons successfully. The
full method, testing the type of vertex whenever a vertex is included in the intersection
list, will successfully shade even the few cases when two Type Two vertices appear in the
intersection list thus giving an even number of points and at least one segment incorrectly
drawn.
The other problem case occurs when one of the sides of the polygon is parallel to
the direction of shading. Mathematically this has an infinite number of intersection
points, but computationally only the two end points should be entered in the array so that
the whole line is shaded as part of the interior.
10.4 Floodfill Algorithm for Polygon Shading
This works in terms of pixels and is applied after the lines forming the boundary
have been converted into pixels by a DDA algorithm. The background of the screen has
one pixel-value, called "old-value" and the points forming the boundary have another,
called "edge-value". The aim of the algorithm is to change all interior pixels from "old-
value" to "new-value". (e.g. from black to red) Assume the following software is
available:
a) A function Read-Pixel(x,y) which takes device coordinates (x,y) and returns the value
of the pixel at this position.
b) A routine Write-Pixel(x,y,p) which sets the new value p to the pixel at the position
(x,y) in device coordinates.
Then, starting at the designated seed point, the algorithm moves out from it in all
directions, stopping when an "edge value" is found. Each pixel with value "old value" is
changed to "new-value". The recursive method stops when all directions have come to an
"edge value".
Because this method is applied to pixels on the screen or in the display buffer, it
may run into problems arising from the quantization into pixels of a mathematical line
which is infinitely thin and recorded to the full accuracy of a floating-point number
within the computer.
Intersecting Lines
One such problem concerns the method of identifying an intersection of two lines.
If you calculate it mathematically, then the equation will give a correct result unless the
lines are parallel or nearly parallel. On the other hand, on some hardware it may be
quicker to check whether the two lines have any pixels in common and this can be
dangerously misleading in some cases. The previous figure shows two lines, one at an
angle of 45° and the other at an angle of 135°, which cross near the centre of the diagram
without having any pixels in common. This type of problem is unlikely to affect the
Floodfill routine given above, since the scan-lines move parallel to the x and y axes and
the DDA algorithm described earlier ensures that every line has at least one pixel
illuminated on each scan-line.
However the next figure illustrates another possible problem. Note that in a
complex polygon with sides crossing each other, you will need one seed point in each
section of the interior to floodfill the whole area. This also occurs in a polygon as shown
in below, even though it does not have any of its sides, indicated by the lines in the
figure, crossing each other.
Quantisation of Pixels
Mathematically it is all one contiguous area and any tests on the equations of the
sides for intersections will confirm this. However two of the lines are nearly parallel and
very close together and consequently although both lines are quite separate and distinct in
their mathematical equations, they lead to the same row of pixels after quantisation. The
scale of this figure has been enlarged so that the quantisation into pixels appears very
coarse in order to emphasize this problem. This polygon will require two seed points in
order to shade it completely using the Floodfill algorithm. A little thought will allow you
to produce many other similar examples and these can readily be studied by drawing the
polygons on squared paper and then marking in the pixel patterns which result.
This approach remains of interest in spite of its problems because some terminals
provide a very fast hardware polygon fill from a given seed point. Similarly, some
microcomputers provide a function to fill the interior of a triangle. To use this facility,
you must first split the polygon into triangles and while this is easy for a convex polygon
(one whose internal angles are all less than 180°) it is very much more difficult for the
general case where you may have sides crossing each other and holes inside the polygon.
10.5 Polygon in detail
What is a Polygon?
A closed plane figure made up of several line segments that are joined together. The sides
do not cross each other. Exactly two sides meet at every vertex.
Types of Polygons
Regular - all angles are equal and all sides are the same length. Regular polygons are
both equiangular and equilateral.
Concave - you can draw at least one straight line through a concave
polygon that crosses more than two sides. At least one interior angle is
more than 180°.
Polygon Formulas
Polygon Parts
Exterior Angle - Angle formed by two adjacent
sides outside the polygon.
Special Polygons
Polygon Names
(Source: http://www.math.com/school/subject3/lessons/S3U2L2GL.html)
Generally accepted names
Sides   Name
n       N-gon
3       Triangle
4       Quadrilateral
5       Pentagon
6       Hexagon
7       Heptagon
8       Octagon
9       Nonagon, Enneagon
10      Decagon
11      Undecagon, Hendecagon
12      Dodecagon
13      Tridecagon, Triskaidecagon
14      Tetradecagon, Tetrakaidecagon
15      Pentadecagon, Pentakaidecagon
16      Hexadecagon, Hexakaidecagon
17      Heptadecagon, Heptakaidecagon
18      Octadecagon, Octakaidecagon
19      Enneadecagon, Enneakaidecagon
20      Icosagon
30      Triacontagon
40      Tetracontagon
50      Pentacontagon
60      Hexacontagon
70      Heptacontagon
80      Octacontagon
90      Enneacontagon
100     Hectogon, Hecatontagon
1,000   Chiliagon
10,000  Myriagon
Examples:
46 sided polygon - Tetracontakaihexagon
However, many people use the form n-gon, as in 46-gon, or 28-gon instead of these
names.
10.6 Plane Equations
This is another useful way to describe planes. It is known as the cartesian form
of the equation of a plane because it is in terms of the cartesian coordinates x, y and z.
The working below follows on from the pages in this section on finding vector equations
of planes and equations of planes using normal vectors.
To get this nice result, we need to work with the unit normal vector. This is the vector
of unit length which is normal to the surface of the plane. (There are two choices here,
depending on which direction you choose, but one is just minus the other).
I'll call this unit normal vector n.
Next we see how using n will give us D, the perpendicular distance from the origin to the
plane. In the picture below, P is any point in the plane. It has position vector r from the
origin O.
Now we work out the dot product of r and n. This gives us r.n = |r||n|cos A.
But |n| = 1 so we have r.n = |r|cos A = D. This will be true wherever P lies in the plane.
We see that n1, n2 and n3 (the components of the unit surface normal vector) give us the
A, B and C in the equation Ax + By + Cz = D.
A numerical example
I've put this in here so that you can see everything actually happening and see how it ties
back to the earlier pages in this section.
We'll take m, the position vector of the known point M in the plane, to be
m = 2i + 3j + 5k.
P is any point in the plane, with OP = r = xi + yj + zk.
First, we find N, a normal vector to the plane, by working out the cross product of s and t.
This gives us N = 10i + 5j + 10k, which has length 15, so the unit normal is n = 2/3 i + 1/3 j + 2/3 k. The equation n.r = n.m then gives
(2/3i +1/3j + 2/3k).(xi + yj + zk) = (2/3i +1/3j + 2/3k).(2i + 3j + 5k)
or 2/3x + 1/3y + 2/3z = 4/3 + 3/3 + 10/3 = 17/3.
The perpendicular distance of this plane from the origin is 17/3 units.
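As a C sketch, the same working can be coded directly; the routine name is an assumption, and the example values in the final comment are the ones used above.

#include <math.h>
#include <stdio.h>

/* Given a (not necessarily unit) normal N and a known point M in the plane,
   print the cartesian equation Ax + By + Cz = D, scaling N to unit length so
   that D is the perpendicular distance of the plane from the origin. */
void plane_equation(double nx, double ny, double nz,
                    double mx, double my, double mz)
{
    double len = sqrt(nx * nx + ny * ny + nz * nz);
    double a = nx / len, b = ny / len, c = nz / len;   /* unit normal n */
    double d = a * mx + b * my + c * mz;               /* D = n . m     */
    printf("%gx + %gy + %gz = %g\n", a, b, c, d);
}

/* plane_equation(2, 1, 2, 2, 3, 5) prints the plane above, with D = 17/3. */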
So what would have happened if we had found the equation of the plane using the first
normal vector we found?
It is exactly the same equation as the one we found above except that it is
multiplied through by a factor of 15, and 85 gives us 15 times the perpendicular distance
of the origin from the plane.
Also, are you confident that you will get the same equation for the plane if you start out
with the position vector of a different known point in it?
The point L also lies in this plane. Its position vector l is given by l = 7i - 7j + 5k.
Check that working with l instead of m does give you the same equation for the plane.
Geometrically, you can see that this will be so.
L and M are both just possible positions of P, so that both n.l and n.m give the distance
D.
The point M also lies in Q and its position vector from the origin is given by
m = 2i + 4j + 7k.
Show that the perpendicular distance of the origin to this plane is 2 units and find its
equation.
This is how the working goes with letters taking the place of the numbers we have used
in the numerical example.
Now we use n.r = n.m = D to write down the equation of the plane. This gives us
n1 x + n2 y + n3 z = D, which is the cartesian equation Ax + By + Cz = D.
If you have found a normal vector which is not of unit length, you will first need to scale
it down.
10.7 Polygon meshes
A polygon mesh may be stored in several alternative ways:
Simple list of vertices with a list of indices describing which vertices are linked to
form polygons; additional information can describe a list of holes
List of vertices + list of edges (pairs of indices) + list of polygons that link edges
Winged edge data structure
The choice of the data structure is governed by the application: it's easier to deal with
triangles than general polygons, especially in computational geometry. For optimized
algorithms it is necessary to have a fast access to topological information such as edges
or neighboring faces; this requires more complex structures such as the winged-edge
representation.
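As an illustration, a minimal C version of the first representation in the list above (a shared vertex list plus faces that index into it) might look like this; the type names and the fixed limit of eight vertices per face are assumptions.

/* A simple "indexed face set" mesh. */
typedef struct { double x, y, z; } Vertex;

typedef struct {
    int nverts;       /* number of vertices around this face                     */
    int vert[8];      /* indices into the shared vertex array, in the order in
                         which the face boundary is traversed                    */
} Face;

typedef struct {
    int     nvertices, nfaces;
    Vertex *vertices;
    Face   *faces;
} Mesh;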
10.8 Curved lines and surfaces
Curved surfaces are one of the most popular ways of implementing scalable
geometry. Games applying curved surfaces look fantastic. UNREAL's characters looked
smooth whether they were a hundred yards away, or coming down on top of you. QUAKE
3: ARENA screen shots show organic levels with stunning smooth, curved walls and
tubes. There are a number of benefits to using curved surfaces. Implementations can be
very fast, and the space required to store the curved surfaces is generally much smaller
than the space required to store either a number of LOD models or a very high detail
model.
The industry demands tools that can make creation and manipulation of curves
more intuitive. A Bezier curve is a good starting point, because it can be represented and
understood with a fair degree of ease. To be more specific, we choose cubic Bezier
curves and bicubic Bezier patches for the reason of simplicity.
Bezier Curves
A cubic Bezier curve is simply described by four ordered control points, p0, p1,
p2, and p3. It is easy enough to say that the curve should "bend towards" the points. It has
three general properties:
1. The curve interpolates the endpoints: we want the curve to start at p0 and end at p3.
2. The control points have local control: we'd like the curve near a control point to move
when we move the control point, but have the rest of the curve not move as much.
3. The curve stays within the convex hull of the control points. It can be culled against
quickly for visibility culling or hit testing.
A set of functions, called the Bernstein basis functions, satisfy the three general
properties of cubic Bezier curves.
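A C sketch of a cubic Bezier curve evaluated with the four Bernstein basis functions; the type and function names are illustrative. Note that at u = 0 the result is p[0] and at u = 1 it is p[3], which is the endpoint-interpolation property listed above.

typedef struct { double x, y, z; } Point3;

/* Evaluate the cubic Bezier curve with control points p[0]..p[3] at u in [0,1]. */
Point3 bezier_curve(const Point3 p[4], double u)
{
    double v  = 1.0 - u;
    double b0 = v * v * v;            /* Bernstein basis functions */
    double b1 = 3.0 * u * v * v;
    double b2 = 3.0 * u * u * v;
    double b3 = u * u * u;
    Point3 r;
    r.x = b0 * p[0].x + b1 * p[1].x + b2 * p[2].x + b3 * p[3].x;
    r.y = b0 * p[0].y + b1 * p[1].y + b2 * p[2].y + b3 * p[3].y;
    r.z = b0 * p[0].z + b1 * p[1].z + b2 * p[2].z + b3 * p[3].z;
    return r;
}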
Bezier Patches
Since a Bezier curve was a function of one variable, f(u), it's logical that a surface
would be a function of two variables, f(u,v). Following this logic, since a Bezier curve
had a one-dimensional array of control points, it makes sense that a patch would have a
two-dimensional array of control points. The phrase "bicubic" means that the surface is a
cubic function in two variables - it is cubic along u and also along v. Since a cubic Bezier
curve has a 1x4 array of control points, a bicubic Bezier patch has a 4x4 array of control
points.
To extend the original Bernstein basis functions into two dimensions, we evaluate
the influence of all 16 control points:
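A C sketch of this evaluation, reusing the Point3 type from the curve sketch above; each of the 16 control points p[i][j] is weighted by the product Bi(u) * Bj(v) of two Bernstein basis functions (the function names are illustrative).

static void bernstein(double t, double b[4])   /* the four cubic basis functions */
{
    double s = 1.0 - t;
    b[0] = s * s * s;
    b[1] = 3.0 * t * s * s;
    b[2] = 3.0 * t * t * s;
    b[3] = t * t * t;
}

/* Evaluate a bicubic Bezier patch with control net p[4][4] at (u, v). */
Point3 bezier_patch(const Point3 p[4][4], double u, double v)
{
    double bu[4], bv[4];
    Point3 r = { 0.0, 0.0, 0.0 };
    int i, j;

    bernstein(u, bu);
    bernstein(v, bv);
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++) {
            double w = bu[i] * bv[j];           /* influence of control point p[i][j] */
            r.x += w * p[i][j].x;
            r.y += w * p[i][j].y;
            r.z += w * p[i][j].z;
        }
    return r;
}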
The extension from Bezier curves to patches still satisfies the three properties:
10.12 Model answers to “Check your Progress”
In order to check your progress try to answer the following questions
a) Plane Equations
b) Polygon meshes
10.13 References
LESSON – 11: SURFACE DETECTION METHODS
CONTENTS
LESSON 11 SURFACE DETECTION METHODS
11.1 Aims and Objectives
The aim of this lesson is to learn the concept of surface detection methods.
The objectives of this lesson are to make the student aware of the following concepts
a) classification of surface detection algorithms
b) Back face detection and
c) Depth buffer algorithms
11.2 Introduction
Since the amount of data needed to store the position of every point on the surface
of even quite a small object is impossibly large, we have to make some simplifying
assumptions. The choice of these simplifications will decide the form of data structure
used to store the objects and will also restrict the choice of hidden-surface algorithm
available. A typical set of simplifying assumptions might be those given below.
a) Divide the surface of the object into a number of faces surrounded by "boundary
curves" or "contours". The contours may be any closed curves and the faces may be
curved, so some means of specifying the equations of the surfaces is needed.
b) Restrict the description to allow only flat or planar faces. The contours must now be
closed polygons in the plane. (Since two planes must intersect in a straight line, an object
without any holes must have its edge curves made up of straight lines.)
d) Subdivide the polygons until the object is described in terms of triangular facets.
At each simplification, the amount of data needed to describe one face is reduced.
This should also reduce the time taken for the related calculations. However some objects
require many more faces to give an acceptable approximation to the object. A simple
example of an object which requires very many triangular facets to give an acceptable
approximation is a sphere.
In fact this may be an oversimplification, since we frequently find that the
transition functions are used in combination with each other. For example, to decide
whether one point in 3D space is hidden by another, it will usually be necessary to apply
a combination of both Projective Mapping and Depth Test. Projective Mapping may be
either Perspective or Orthogonal. Consider the situation where we have a view point V on
one side of the objects to be drawn and are projecting them on to a plane on the far side
of the objects. Now assume we have rotated the entire figure so that the plane is z=0, the
viewpoint is on the z-axis (coordinates 0,0,Z), and all other z values lie between 0 and Z.
a) Perspective projection.
Consider any two points P1 and P2. P2 is hidden by P1 if and only if
(i) V, P1 and P2 are co-linear.
(ii) P1 is closer to V. i.e. VP1 < VP2.
Consider the test-point P1 (usually a vertex of the first facet F1) and facet F2. Connect
the viewpoint V and test-point P1 and calculate P2, the intersection of the line VP1
(continued if necessary) and the facet F2. Calculate the lengths of VP1 and VP2.
b) Orthogonal Projection.
Again the viewer is at the height V and looking down on the plane z=0, but now all the
lines are parallel. Indeed for an orthogonal projection, they are all perpendicular to the
plane z=0 and so parallel to the z-axis. So the point P2 is hidden by P1 if and only if
(i) The x and y coordinates of P1 and P2 are equal.
(ii) The z-coordinate P1(z) > P2(z).
This is equivalent to moving the point V a very large distance from the plane z=0 ("V
tends to infinity").
Consider the projection onto the plane z=0 and use the values of z to assign priorities to
the faces. In this case, we wish to compare facets F1 and F2. After projection onto the
plane z=0, F1 is projected on to S1 and F2 is projected onto S2. The intersection of S1
and S2 is called S. If S is empty, then the projections do not overlap and the priority is
irrelevant.
Any point (x,y,0), lying in S, corresponds to the point (x,y,z1) in F1 and the point
(x,y,z2) in F2. If z1 is greater than z2 for all these points, then "F1 has priority over F2".
However if this is true for some points in S and false for others, then the two facets
intersect each other and we cannot assign priorities. It will be necessary to calculate the
line of intersection of the two facets and split one of them along this line. If F1 is split
into F1a and F1b, then we can number them so that F1a has priority over F2 and F2 has
priority over F1b.
Intersection Function
Note that in each of these cases, it was necessary to discuss the intersection of the
projection of a vertex of one facet and the projection of the other facet. This may be dealt
with by use of the Intersection Function, which defines how to calculate the intersection
of two graphic elements. Other examples are the intersection of two lines, the intersection
of two segments (lines of fixed length) or the intersection of a line and a plane. In this
actual case, it is probably more relevant to use a Containment Test.
Containment Test
The Containment Test considers the question "Does the point P lie inside the
polygon F ?" and returns the result "true" or "false". It is usually applied after projection
into two dimensions and so the methods discussed in that section are immediately
applicable. Either the angle test or the intersection test may be used.
Visibility Tests
So far, we have discussed the special case of one or more objects defined as plane
facets and considered whether or not one of the facets obscures another. This is a very
slow process, especially when all of the very large number of facets have to be compared
with all the others. There is one very simple consideration which will about halve the
number of facets to be considered. If we assume that the facets form the outer surfaces of
one or more solid objects, then those facets on the back of the object (relative to the
viewing position) cannot be seen and so a test to identify these will remove them from the
testing early in the process.
If a perspective projection is being used, then the "line of sight" is the line from
the viewpoint V to the point on the surface. However, if a parallel projection is being
used, then the relevant line is one parallel to the viewing direction which passes through
the point on the surface. In either case, let this direction be denoted by the vector d.
The surface normal, denoted by n, is the outward-pointing normal from this point,
normal to the surface of the plane. To decide whether the plane is potentially visible or
always invisible, it is necessary to consider the angle between d and n and we may use
the dot product d.n to decide this. Let A be the angle between these vectors. If A is
greater than 90 degrees, then the surface is potentially visible, otherwise the surface is
invisible. If both vectors are scaled to have unit length, so that we are dealing with
direction cosines, then the dot product gives the value of cosA. Thus the face is
potentially visible if the dot product is negative.
In the above figure, the parallel projection has d = [1,0,0] and face A has outward
normal n1 = [-1,2,0] while face B has outward normal n2 = [1,0,0]. The dot product of
the visible face A has value -1 and the dot product of the invisible face has the value 1.
When the dot product is zero the face is "edge-on" to the viewing direction and may be
omitted.
Strategy Function or Overall Method.
The text by Giloi includes a classification based on the form of the input data and
provides examples of algorithms for some of these. This classification has been
simplified slightly (four classes reduced to three) and the algorithms identified. It is not
complete, other algorithms do not fall into these categories and other methods of
classifying these algorithms are also possible.
a) Class One
These include `solids' made up of plane polygonal facets. The resulting object
may be represented as a set of `contour lines' and `lines of intersection' or it may be
output as a shaded object. e.g. Appel's method, or Watkin's method, or Encarnacao's
Priority Method (requires input data as triangles).
b) Class Two
These are `surfaces' made up of curved faces. The resulting object is represented
as a net of grid lines. e.g. Encarnacao's Scan-Grid Method.
c) Class Three
Alternatively the methods may be grouped according to the type of method. This
gives the following:
a) Scan-line Methods
These include Watkin's method and a number of others. These work in terms of
scan lines with the pixel-colour at each point along the line calculated and output. If there
is enough storage to hold a copy of the entire screen, instead of just one line across it, we
may use a `Z-buffer' algorithm, in which the z value corresponding to each pixel is used
to decide on the colour of that pixel. The polygons may be added in any order, but the z-
value is used to decide whether a pixel should be changed or not as the next polygon is
added. Again coherence may be used to reduce the number of tests needed.
b) List-Priority Methods
This relies on the polygons for output being sorted into order, so that the polygon
furthest from the viewer is output first. It also assumes that output of a second polygon on
top of the first will overwrite it and none of the earlier output will remain. This is true on
most screens, but not on most printers or plotters. It is similar to the method used by
painters in situations where the latest coat of paint conceals the ones below.
This may also be used for the case where the output is an image of the isometric
projection of a surface. In this case, it is easy to output the patches of the surface with
those furthest from the viewpoint being output first and the later ones drawn on top.
c) Ray-tracing Methods.
These use the idea of dropping a line, or ray, from the viewpoint (or eye of the
viewer) onto parts of the objects and on to the viewing plane. Appel's method of hidden
surface removal introduces the concept of `quantitative invisibility' (counting the number
of faces between the surface being tested and the viewer) and uses coherence to reduce
the number of tests to give the correct output.
11.5 Back face detection
A fast and simple object-space method for identifying the back faces of a polyhedron is
based on the “inside-outside” tests. A point (x,y,z) is ‘inside’ a polygon surface with
plane parameters A, B, C and D if
Ax+By+Cz+D<0
When an inside point is along the line of sight to the surface, the polygon must be a back
face.
If V is a vector in the viewing direction from the eye position and N is the normal vector
to a polygon surface, then the polygon is a back face if
V∙N>0
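The plane parameters A, B, C and D can be computed from three non-collinear vertices of the polygon, after which the back-face test is a single dot product. The sketch below is only an illustration under that assumption; the vertex coordinates and the viewing vector V are made-up example values.

def plane_parameters(p1, p2, p3):
    # Compute A, B, C, D for the plane through three vertices;
    # (A, B, C) is the (unnormalised) outward normal N when the
    # vertices are listed counter-clockwise as seen from outside.
    ax, ay, az = (p2[i] - p1[i] for i in range(3))
    bx, by, bz = (p3[i] - p1[i] for i in range(3))
    A = ay * bz - az * by
    B = az * bx - ax * bz
    C = ax * by - ay * bx
    D = -(A * p1[0] + B * p1[1] + C * p1[2])
    return A, B, C, D

def is_back_face(V, N):
    # Back face when V.N > 0 for viewing vector V and outward normal N.
    return V[0] * N[0] + V[1] * N[1] + V[2] * N[2] > 0

A, B, C, D = plane_parameters((0, 0, 0), (1, 0, 0), (0, 1, 0))
print(is_back_face((0, 0, -1), (A, B, C)))   # False: this face points toward the viewer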
A commonly used image space approach to detecting visible surfaces is the depth
buffer method, which compares surface depths at each pixel position on the projection
plane.
A depth buffer is used to store depth values for each (x,y) position as surfaces are
processed, and the refresh buffer stores the intensity values for each position. Initially,
all positions in the depth buffer are set to 0 (minimum depth) and the refresh buffer is
initialized to the background intensity. Each surface listed in the polygon tables is then
processed, one scan line at a time, calculating the depth (z value) at each (x,y) pixel
position. The calculated depth is compared to the value previously stored in the depth
buffer at that position. If the calculated depth is greater than the value stored in the depth
buffer, the new depth value is stored, and the surface intensity at that position is
determined and placed in the same xy location in the refresh buffer.
Algorithm
1. Initialize the depth buffer and refresh buffer so that for all buffer positions (x,y)
Depth(x,y) = 0
Refresh(x,y) = Ibackgnd
2. For each position on each polygon surface, compare depth values to previously
stored values in the depth buffer to determine visibility.
Calculate the depth z for each (x,y) position on the polygon.
If z > Depth(x,y), then set
Depth(x,y) = z,
Refresh(x,y) = Isurf(x,y)
where Ibackgnd is the value for the background intensity, and Isurf(x,y) is the
projected intensity value for the surface at pixel position (x,y). After all surfaces
have been processed, the depth buffer contains depth values for the visible
surfaces and the refresh buffer contains the corresponding intensity values for
those surfaces.
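A minimal sketch of that loop is given below. It keeps the convention used here (depth buffer initialized to 0, larger z meaning closer to the viewer) and assumes each surface object can report whether it covers a pixel and what its depth and intensity are there; the covers, depth_at and intensity_at names are illustrative, not part of any standard library.

WIDTH, HEIGHT = 640, 480
I_BACKGND = 0                                            # background intensity

def depth_buffer(surfaces):
    depth = [[0.0] * WIDTH for _ in range(HEIGHT)]       # initialised to minimum depth
    refresh = [[I_BACKGND] * WIDTH for _ in range(HEIGHT)]
    for surface in surfaces:
        for y in range(HEIGHT):                          # one scan line at a time
            for x in range(WIDTH):
                if not surface.covers(x, y):
                    continue
                z = surface.depth_at(x, y)
                if z > depth[y][x]:                      # closer than what is stored
                    depth[y][x] = z
                    refresh[y][x] = surface.intensity_at(x, y)
    return refresh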
11.11 References
1. Chapter 24 of William M. Newman, Robert F. Sproull, “Principles of
Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 9 of Steven Harrington, “Computer Graphics – A programming
approach”, McGraw Hill, 1987
3. Chapter 13 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C
Version”, Pearson Education, 2007
4. Chapter 9 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 15 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – Principles and Practice”, Addison-Wesley, 1997
6. Susan Laflin, “Computer Graphics”, August 1999
UNIT – IV
CONTENTS
12.1 Aims and Objectives
The objectives of this lesson are to make the student aware of the following concepts
a) Introduction to multimedia
b) History of multimedia
c) Applications of multimedia
12.2 Introduction
Recent multimedia conferences, such as the IEEE International Conference on
Multimedia Computing and Systems, ACM Multimedia, and Multimedia Computing and
Networking, provide a good start for identifying the components of multimedia. The
range of multimedia activity is demonstrated in papers on multimedia authoring (i.e.,
specification of multimedia sequences), user interfaces, navigation (user choices),
effectiveness of multimedia in education, distance learning, video conferencing,
interactive television, video on demand, virtual reality, digital libraries, indexing and
retrieval, and support of collaborative work. The wide range of technologies is evident in
papers on disk scheduling, capacity planning, resource management, optimization,
networking, switched Ethernet LANs, Asynchronous Transfer Mode (ATM) networking,
quality of service in networks, Moving Picture Experts Group (MPEG) encoding,
compression, caching, buffering, storage hierarchies, video servers, video file systems,
machine classification of video scenes, and Internet audio and video.
Multimedia systems need a delivery system to get the multimedia objects to the
user. Magnetic and optical disks were the first media for distribution. The Internet, as
well as the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite or
Net BIOS on isolated or campus LANs, became the next vehicles for distribution. The
rich text and graphics capabilities of the World Wide Web browsers are being augmented
with animations, video, and sound. Internet distribution will be augmented by distribution
via satellite, wireless, and cable systems.
In 1895, Guglielmo Marconi sent his first wireless radio transmission at Pontecchio, Italy.
A few years later (in 1901) he detected radio waves beamed across the Atlantic. Initially
invented for telegraph, radio is now a major medium for audio broadcasting.
Television was the new medium of the 20th century. It brought video into homes and has
since changed the world of mass communications.
1989 - Tim Berners-Lee proposed the World Wide Web to CERN
(European Council for Nuclear Research)
1990 - K. Hooper Woolsey, Apple Multimedia Lab, 100 people, educ.
1991 - Apple Multimedia Lab: Visual Almanac, Classroom MM Kiosk
1992 - the first M-bone audio multicast on the Net
1993 - U. Illinois National Center for Supercomputing Applications:
NCSA Mosaic
1994 - Jim Clark and Marc Andreessen: Netscape
1995 - JAVA for platform-independent application development. Duke is
the first applet.
1996 - Microsoft, Internet Explorer.
12.5 Applications
Multimedia applications are primarily existing applications that can be made less
expensive or more effective through the use of multimedia technology. In addition, new,
speculative applications, like movies on demand, can be created with the technology. We
present here a few of these applications.
Video on demand (VOD), also called movies on demand, is a service that provides
movies on an individual basis to television sets in people's homes. The movies are stored
in a central server and transmitted through a communication network. A set-top box
(STB) connected to the communication network converts the digital information to
analog and inputs it to the TV set. The viewer uses a remote control device to select a
movie and manipulate play through start, stop, rewind, and visual fast forward buttons.
The capabilities are very similar to renting a video at a store and playing it on a VCR.
The service can provide indices to the movies by title, genre, actors, and director. VOD
differs from pay per view by providing any of the service's movies at any time, instead of
requiring that all purchasers of a movie watch its broadcast at the same time. Enhanced
pay per view, also a broadcast system, shows the same movie at a number of staggered
starting times.
Home shopping and information systems - Services to the home that provide
video on demand will also provide other, more interactive, home services. Many kinds of
goods and services can be sold this way. The services will help the user navigate through
the available material to plan vacations, renew driver's licenses, purchase goods, etc.
Networked games - The same infrastructure that supports home shopping could be
used to temporarily download video games with graphic-intensive functionality to the
STB, and the games could then be played for a given period of time. Groups of people
could play a game together, competing as individuals or working together in teams.
Action games would require a very fast, or low-latency, network.
Distance learning - Education delivered at a distance can use a
combination of stored multimedia presentations, live teaching, and participation by the
students. Distance learning involves aspects of both teaching with multimedia and video
conferencing.
Virtual reality - Virtual reality provides a very realistic effect through sight and
sound, while allowing the user to interact with the virtual world. Because of the ability of
the user to interact with the process, realistic visual effects must be created ``on the fly.''
12.8 Points for Discussion
Discuss the following
a) Application of multimedia in medicine
b) Application of multimedia in education
12.10 References
1. Chapter 15, 16 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
2. Z.S. Bojkovic, D.A. Milovanovic, “Multimedia Communication Systems”, PHI,
2002
3. S.J. Gibbs, D.C. Tsichritzis, “Multimedia Programming”, Addison-Wesley, 1995
4. J.F. Koegel, “Multimedia Systems”, Pearson Education, 2001
LESSON – 13: MULTIMEDIA BUILDING BLOCKS
CONTENTS
LESSON 13 MULTIMEDIA BUILDING BLOCKS
The aim of this lesson is to learn the concept of multimedia building blocks.
The objectives of this lesson are to make the student aware of the following concepts
a) building blocks
b) architecture
c) characteristics
d) challenges
13.2 Introduction
Multimedia is obviously a fertile ground for both research and the development of
new products, because of the breadth of possible usage, the dependency on a wide range
of technologies, and the value of reducing cost by improving the technology. Now that
the technology has been developed, however, the marketplace will determine future
direction. The technology will be used when clear value is found. For example,
multimedia is widely used on PCs using CDs to store the content. The CDs are
inexpensive to reproduce, and the players are standard equipment on most PCs purchased
today. The acceptance caused a greater demand for players, which, in turn, caused greater
production and further reduced prices.
The computer industry is providing demand, and an expanding market, for the key
hardware technologies that underlie multimedia. These include solid-state memory, logic,
microprocessors, modems, switches, and disk storage. The price declines of 30-60% per
year that we have seen for several decades will continue into the foreseeable future. As a
result, the application of multimedia, which appears expensive now, will become less
expensive and more attractive. An exception to this fast rate of improvement is the cost of
data communications. Communications depend both on technology with rapidly
decreasing cost and on mundane and basically unchanging tasks such as laying cable with
the help of a backhoe or stringing cables from poles. The cost of communication is not
likely to decline significantly for quite a while.
We feel that multimedia will spread from low-bit-rate to high-bit-rate, and will
begin on established intranets first, move to the Internet, and finally be transmitted on
broadband connections (ADSL or cable modems) to the home.
infrastructure. The availability of switched LAN technology and faster LANs will allow
increases in both the bit rate per user and the number of users. As the cost of
communications decreases, the cost for Internet attachment for servers will decline, and
higher-quality video will be used on the Internet. Multimedia will be a compelling
interface for commerce and advertising on the Internet. Eventually, cable modems and/or
ADSL will provide bandwidth for movies to the home, and the declining computer and
switching costs will allow a cost-effective service. The winner between ADSL and cable
modems will have as much to do with the ability of cable companies and RBOCs to raise
capital as with the inherent cost and value of the two technologies.
a) Text
b) Video
c) Sound
d) Images
e) Animation
f) Hyper Text and Hypermedia
Hypertext is a text which contains links to other texts. The term was invented by
Ted Nelson around 1965. Hypertext is therefore usually non-linear (as indicated below).
HyperMedia is not constrained to be text-based. It can include other media, e.g.,
graphics, images, and especially the continuous media - sound and video. Apparently,
Ted Nelson was also the first to use this term.
The World Wide Web (WWW) is the best example of hypermedia applications.
Multimedia systems may have to render a variety of media at the same instant -- a
distinction from normal applications. There is a temporal relationship between many
forms of media (e.g. video and audio), and maintaining that relationship is a source of problems.
The key issues multimedia systems need to deal with here are:
How to represent and store temporal information.
How to strictly maintain the temporal relationships on play back/retrieval
What processes are involved in the above.
Data has to be represented digitally, so many initial sources of data need to be digitised
-- translated from an analog source to a digital representation. This will involve scanning
(graphics, still images) and sampling (audio/video), although digital cameras now exist
for direct scene-to-digital capture of images and video.
Given the above challenges, the following features are desirable (if not prerequisites)
for a Multimedia System:
Very High Processing Power -- needed to deal with large data processing and real time
delivery of media.
Efficient and High I/O -- input and output to the file subsystem needs to be efficient and
fast. Needs to allow for real-time recording as well as playback of data. e.g. Direct to
Disk recording systems.
Special Operating System -- to allow access to file system and process data efficiently
and quickly. Needs to support direct transfers to disk, real-time scheduling, fast interrupt
processing, I/O streaming etc.
Storage and Memory -- large storage units (of the order of 50-100 GB or more) and
large memory (50-100 MB or more). Large caches are also required, frequently in a
Level 2 and 3 hierarchy, for efficient management.
Software Tools -- user friendly tools needed to handle media, design & develop
applications, and deliver media.
13.8 Components of a Multimedia System
Now let us consider the Components (Hardware and Software) required for a multimedia
system:
Capture devices
-- Video Camera, Video Recorder, Audio Microphone, Keyboards, mice, graphics
tablets, 3D input devices, tactile sensors, VR devices. Digitising/Sampling
Hardware
Storage Devices
-- Hard disks, CD-ROMs, Jaz/Zip drives, DVD, etc
Communication Networks
-- Ethernet, Token Ring, FDDI, ATM, Intranets, Internets.
Computer Systems
-- Multimedia Desktop machines, Workstations, MPEG/VIDEO/DSP Hardware
Display Devices
-- CD-quality speakers, HDTV, SVGA, Hi-Res monitors, Colour printers, etc.
13.9.1 Networks
Telephone networks dedicate a set of resources that forms a complete path from
end to end for the duration of the telephone connection. The dedicated path guarantees
that the voice data can be delivered from one end to the other end in a smooth and timely
way, but the resources remain dedicated even when there is no talking. In contrast, digital
packet networks, for communication between computers, use time-shared resources
(links, switches, and routers) to send packets through the network. The use of shared
resources allows computer networks to be used at high utilization, because even small
periods of inactivity can be filled with data from a different user. The high utilization and
shared resources create a problem with respect to the timely delivery of video and audio
over data networks. Current research centers around reserving resources for time-
sensitive data, which will make digital data networks more like telephone voice networks.
13.9.2 Internet
The Internet and intranets, which use the TCP/IP protocol suite, are the most
important delivery vehicles for multimedia objects. TCP provides communication
sessions between applications on hosts, sending streams of bytes for which delivery is
always guaranteed by means of acknowledgments and retransmission. User Datagram
Protocol (UDP) is a ``best-effort'' delivery protocol (some messages may be lost) that
sends individual messages between hosts. Internet technology is used on single LANs
and on connected LANs within an organization, which are sometimes called intranets,
and on ``backbones'' that link different organizations into one single global network.
Internet technology allows LANs and backbones of totally different technologies to be
joined together into a single, seamless network.
Ethernet LANs use a common wire to transmit data from station to station.
Mediation between transmitting stations is done by having stations listen before sending,
so that they will not interfere with each other. However, two stations could begin to send
at the same time and collide, or one station could start to send significantly later than
another but not know it because of propagation delay. In order to detect these
situations, stations continue to listen while they transmit and determine whether their
message was possibly garbled by a collision. If there is a collision, a retransmission takes
place (by both stations) a short but random time later. Ethernet LANs can transmit data at
10 Mb/s. However, when multiple stations are competing for the LAN, the throughput
may be much lower because of collisions and retransmissions.
Switched Ethernet - Switches may be used at a hub to create many small LANs
where one large one existed before. This reduces contention and permits higher
throughput. In addition, Ethernet is being extended to 100Mb/s throughput. The
combination, switched Ethernet, is much more appropriate to multimedia than regular
Ethernet, because existing Ethernet LANs can support only about six MPEG video
streams, even when nothing else is being sent over the LAN.
processing) of data preceded by five octets of control information. An ATM network
consists of a set of communication links interconnected by switches. Communication is
preceded by a setup stage in which a path through the network is determined to establish
a circuit. Once a circuit is established, 53-octet packets may be streamed from point to
point.
ATM networks can be used to implement parts of the Internet by simulating links
between routers in separate intranets. This means that the ``direct'' intranet connections
are actually implemented by means of shared ATM links and switches.
ATM, both between LANs and between servers and workstations on a LAN, will
support data rates that will allow many users to make use of motion video on a LAN.
b) ISDN - Integrated Service Digital Network (ISDN) extends the telephone company
digital network by sending the digital form of the signal all the way to the customer.
ISDN is organized around 64Kb/s transmission speeds, the speed used for digitized voice.
An ISDN line was originally intended to simultaneously transmit a digitized voice signal
and a 64Kb/s data stream on a single wire. In practice, two channels are used to produce a
128Kb/s line, which is faster than the 28.8Kb/s speed of typical computer modems but
not adequate to handle MPEG video.
ADSL is a critical technology for the Regional Bell Operating Companies (RBOCs),
because it allows them to use the existing twisted-pair infrastructure to deliver high data
rates to the home.
d) Cable systems - Cable television systems provide analog broadcast signals on a coaxial
cable, instead of through the air, with the attendant freedom to use additional frequencies
and thus provide a greater number of channels than over-the-air broadcast. The systems
are arranged like a branching tree, with ``splitters'' at the branch points. They also require
amplifiers for the outbound signals, to make up for signal loss in the cable. Most modern
cable systems use fiber optic cables for the trunk and major branches and use coaxial
cable for only the final loop, which services one or two thousand homes. The root of the
tree, where the signals originate, is called the head end.
e) Cable modems are used to modulate digital data, at high data rates, into an analog 6-
MHz-bandwidth TV-like signal. These modems can transfer 20 to 40 Mb/s in a frequency
bandwidth that would have been occupied by a single analog TV signal, allowing
multiple compressed digital TV channels to be multiplexed over a single analog channel.
The high data rate may also be used to download programs or World Wide Web content
or to play compressed video. Cable modems are critical to cable operators, because they
enable them to compete with the RBOCs using ADSL.
f) Set-top box - The STB is an appliance that connects a TV set to a cable system,
terrestrial broadcast antenna, or satellite broadcast antenna. The STB in most homes has
two functions. First, in response to a viewer's request with the remote-control unit, it
shifts the frequency of the selected channel to either channel 3 or 4, for input to the TV
set. Second, it is used to restrict access and block channels that are not paid for.
Addressable STBs respond to orders that come from the head end to block and unblock
channels.
g) Admission control - Digital multimedia systems that are shared by multiple clients
can deliver multimedia data to a limited number of clients. Admission control is the
function which ensures that once delivery starts, it will be able to continue with the
required quality of service (ability to transfer isochronous data on time) until completion.
The maximum number of clients depends upon the particular content being used and
other characteristics of the system.
i) Authoring systems - Multimedia authoring systems are used to edit and arrange
multimedia objects and to describe their presentation. The authoring package allows the
author to specify which objects may be played next. The viewer dynamically chooses
among the alternatives. Metadata created during the authoring process is normally saved
as a file. At play time, an ``execution package'' reads the metadata and uses it as a script
for the playout.
In this section we show how the multimedia technologies are organized in order
to create multimedia systems, which in general consist of suitable organizations of
clients, application servers, and storage servers that communicate through a network.
Some multimedia systems are confined to a stand-alone computer system with content
stored on hard disks or CD-ROMs. Distributed multimedia systems communicate through
a network and use many shared resources, making quality of service very difficult to
achieve and resource management very complex.
• Multi-user systems
reservation and admission control, the only way to give some assurance of continuous
video is to operate with small LANs and make sure that the server is on the same LAN as
the client. In the future, ATM and fast Ethernet will provide capacity more appropriate to
multimedia.
DBS requires a set-top box with much more function than a normal cable STB.
The STB contains a demodulator to reconstruct the digital data from the analog satellite
broadcast. The MPEG compressed form is decompressed, and a standard TV signal is
produced for input to the TV set. The STB uses a telephone modem to periodically verify
that the premium channels are still authorized and report on use of the pay-per-view
channels so that billing can be done.
Interactive TV and video to the home - Interactive TV and video to the home
allow viewers to select, interact with, and control video play on a TV set in real time. The
user might be viewing a conventional movie, doing home shopping, or engaging in a
network game. The compressed video flowing to the home requires high bandwidth, from
1.5 to 6 Mb/s, while the return path, used for selection and control, requires far lower
bandwidth.
The STB used for interactive TV is similar to that used for DBS. The
demodulation function depends upon the network used to deliver the digital data. A
microprocessor with memory for limited buffering as well as an MPEG decompression
chip is needed. The video is converted to a standard TV signal for input to the TV set.
The STB has a remote-control unit, which allows the viewer to make choices from a
distance. Some means are needed to allow the STB to relay viewer commands back to the
server, depending upon the network being used.
Cable systems appear to be broadcast systems, but they can actually be used to
deliver different content to each home. Cable systems often use fiber optic cables to send
the video to converters that place it on local loops of coaxial cable. If a fiber cable is
dedicated to each final loop, which services 500 to 1500 homes, there will be enough
bandwidth to deliver an individual signal to many of those houses. The cable can also
provide the reverse path to the cable head end. Ethernet-like protocols can be used to
share the same channel with the other STBs in the local loop. This topology is attractive
to cable companies because it uses the existing cable plant. If the appropriate amplifiers
are not present in the cable system for the back channel, a telephone modem can be used
to provide the back channel.
As mentioned above, the asymmetric data rates of ADSL are tailored for
interactive TV. The use of standard twisted-pair wire, which has been brought to virtually
every house, is attractive to the telephone industry. However, the twisted pair is a more
noisy medium than coaxial cable, so more expensive modems are needed, and distances
are limited. ADSL can be used at higher data rates if the distance is further reduced.
Interactive TV architectures are typically three-tier, in which the client and server
tiers interact through an application server. (In three-tier systems, the tier-1 systems are
clients, the tier-2 systems are used for application programs, and the tier-3 systems are
data servers.) The application tier is used to separate the logic of looking up material in
indexes, maintaining the shopping state of a viewer, interacting with credit card servers,
and other similar functions from the simple function of delivering multimedia objects.
The key research questions about interactive TV and video-on-demand are not
computer science questions at all. Rather, they are the human-factors issues concerning
ease of the on-screen interface and, more significantly, the marketing questions regarding
what home viewers will find valuable and compelling.
Internet over cable systems - World Wide Web browsing allows users to see a
rich text, video, sound, and graphics interface and allows them to access other
information by clicking on text or graphics. Web pages are written in HyperText Markup
Language (HTML) and use an application communications protocol called HTTP. The
user responses, which select the next page or provide a small amount of text information,
are normally quite short. On the other hand, the graphics and pictures require many times
the number of bytes to be transmitted to the client. This means that distribution systems
that offer asymmetric data rates are appropriate.
Cable TV systems can be used to provide asymmetric Internet access for home
computers in ways that are very similar to interactive TV over cable. The data being sent
to the client is digitized and broadcast over a prearranged channel over all or part of the
cable system. A cable modem at the client end tunes to the right channel and demodulates
the information being broadcast. It must also filter the information destined for the
particular station from the information being sent to other clients. The low-bandwidth
reverse channel is the same low-frequency band that is used in interactive TV. As with
interactive TV, a telephone modem might be used for the reverse channel. The cable head
end is then attached to the Internet using a router. The head end is also likely to offer
other services that Internet Service Providers sell, such as permanent mailboxes. This
asymmetric connection would not be appropriate for a Web server or some other type of
commerce server on the Internet, because servers transmit too much data for the low-
speed return path. The cable modem provides the physical link for the TCP/IP stack in
the client computer. The client software treats this environment just like a LAN
connected to the Internet.
Video servers on a LAN - LAN-based multimedia systems go beyond the simple,
client-server, remote file system type of video server, to advanced systems that offer a
three-tier architecture with clients, application servers, and multimedia servers. The
application servers provide applications that interact with the client and select the video
to be shown. On a company intranet, LAN-based multimedia could be used for just-in-
time education, on-line documentation of procedures, or video messaging. On the
Internet, it could be used for a video product manual, interactive video product support,
or Internet commerce. The application server chooses the video to be shown and causes it
to be sent to the client.
There are three different ways that the application server can cause playout of the
video: By giving the address of the video server and the name of the content to the client,
which would then fetch it from the video server; by communicating with the video server
and having it send the data to the client; and by communicating with both to set up the
relationship.
The transmission of data to the client may be in push mode or pull mode. In push
mode, the server sends data to the client at the appropriate rate. The network must have
quality-of-service guarantees to ensure that the data gets to the client on time. In pull
mode, the client requests data from the server, and thus paces the transmission.
The current protocols for Internet use are TCP and UDP. TCP sets up sessions,
and the server can push the data to the client. However, the ``moving-window'' algorithm
of TCP, which prevents client buffer overrun, creates acknowledgments that pace the
sending of data, thus making it in effect a pull protocol. Another issue in Internet
architecture is the role of firewalls, which are used at the gateway between an intranet
and the Internet to keep potentially dangerous or malicious Internet traffic from getting
onto the intranet. UDP packets are normally never allowed in. TCP sessions are allowed,
if they are created from the inside to the outside. A disadvantage of TCP for isochronous
data is that error detection and retransmission is automatic and required--whereas it is
preferable to discard garbled video data and just continue.
used with small window sizes for the ``talking heads'' and most of the other visuals.
Scalability of a video conferencing system is important, because if all participants send to
all other participants, the traffic goes up as the square of the number of participants. This
can be made linear by having all transmissions go through a common server. If the
network has a multicast facility, the server can use that to distribute to the participants.
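The scaling difference can be checked with a small calculation; the participant counts below are arbitrary examples.

def streams_full_mesh(n):
    # Every participant sends a separate stream to every other participant.
    return n * (n - 1)

def streams_via_server(n):
    # Each participant sends one stream up to a common server, which
    # forwards (or multicasts) the streams back out: n up plus n down.
    return 2 * n

for n in (4, 8, 16):
    print(n, "participants:", streams_full_mesh(n), "streams in a full mesh,",
          streams_via_server(n), "via a common server")
# 16 participants: 240 streams in a full mesh versus 32 through a server.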
13.15 References
LESSON – 14: TEXT AND SOUND
CONTENTS
14.1 Aims and Objectives
The aim of this lesson is to learn the concept of text and sound in multimedia
The objectives of this lesson are to make the student aware of the following concepts
a) text
b) sound
c) sound formats
14.2 Introduction
Text is the most widely used and flexible means of presenting information on
screen and conveying ideas. The designer should not necessarily try to replace textual
elements with pictures or sound, but should consider how to present text in an acceptable
way and supplement it with other media. For a public system, where the eyesight of its
users will vary considerably, a clear, reasonably large font should be used. Users will also
be put off by the display of large amounts of text and will find it hard to scan. To present
tourist information about a hotel, for example, information should be presented concisely
under clear separate headings such as location, services available, prices, contact details
etc.
Guidelines
Conventional upper and lower case text should be used for the presentation since reading
is faster compared to all upper case text.
All upper case can be used if a text item has to attract attention as in warnings and alarm
messages.
The length of text lines should be no longer than around 60 characters to achieve optimal
reading speed.
Proportional spacing and ragged lines also minimize unpleasant visual effects.
12 point text is the practical minimum to adopt for PC based screens, with the use of 14
point or higher for screens of poorer resolution than a normal desktop PC.
If the users do not have their vision corrected for VDU use (e.g. the public), text of 16
point is recommended so that it is usable by people with visual impairments.
Sentences should be short and concise and not be split over pages.
Technical expressions should be used only where the user is familiar with them from
their daily routine, and should be made as understandable as possible e.g. "You are now
connected with Paul Andrews" rather than "Connection to Multipoint Control Unit".
An explanation of the abbreviations used in the system should be readily available to the
user through on-line help facilities or at least through written documentation.
Characters can be more than letters - they can be digits, punctuation. Even the carriage-
return when you hit the return key is stored as a character. Computers deal with all data
by turning switches off and on in a sequence. We look at this by calling an off switch "0"
and an on switch "1". These 0's and 1's are called bits. Everything in a computer is
ultimately represented by sequences of 0's and 1's - bits. If the sequence were of length 2,
we could have 00, 01, 10, or 11. Four items. Similarly, we find that a sequence of length
3 can represent 8 items (000, 001, 010, ...). A sequence of length 4 can represent 16
things (0000, 0001, 0010, ...). There are about 128 characters that a computer has to
store. This should take a sequence of length 7. In reality, 8 bits are used instead of 7 (the
8th bit is used to check on the data). The point to remember here is that: n bits can
represent 2^n items
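The rule is easy to verify with a couple of lines of arithmetic; the character counts are the standard ASCII figures.

# n bits can represent 2**n distinct items.
for n in (1, 2, 3, 4, 7, 8):
    print(n, "bits ->", 2 ** n, "items")

# 7 bits give 128 combinations -- enough for the basic character set;
# the 8th bit has traditionally been used as a check (parity) bit.
print(2 ** 7)   # 128
print(2 ** 8)   # 256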
The linear dimensions of the sound sources can be determined from calibrated photographs
enabling investigation of, for example, the relationship between the length of a string or a
pipe and the fundamental frequency of the sound it generates. An audio narration
describes the key concepts illustrated by each example and a supporting text file provides
essential data, poses challenging questions and suggests possible investigations.
Additional features of the disc include: a six component sound synthesiser which
students can use to generate their own sound samples; the facility to import sounds
recorded with a microphone plugged into the PC's sound card or taken from an audio CD;
extensive help files outlining the fundamental Physics of sound.
The MIDI (Musical Instrument Digital Interface) is a format for sending music
information between electronic music devices like synthesizers and PC sound cards.
The MIDI format was developed in 1982 by the music industry. The MIDI format is very
flexible and can be used for everything from very simple to real professional music
making.
MIDI files do not contain sampled sound, but a set of digital musical instructions
(musical notes) that can be interpreted by your PC's sound card.
The downside of MIDI is that it cannot record sounds (only notes). Or, to put it another
way: It cannot store songs, only tunes.
The upside of the MIDI format is that since it contains only instructions (notes), MIDI
files can be extremely small. A file of only 23K, for example, can play for
nearly 5 minutes.
The MIDI format is supported by many different software systems over a large range of
platforms. MIDI files are supported by all the most popular Internet browsers.
Sounds stored in the MIDI format have the extension .mid or .midi.
The RealAudio format was developed for the Internet by Real Media. The format also
supports video.
The format allows streaming of audio (on-line music, Internet radio) with low
bandwidths. Because of the low bandwidth priority, quality is often reduced.
Sounds stored in the RealAudio format have the extension .rm or .ram.
14.6 The AU Format
The AU format is supported by many different software systems over a large range of
platforms.
AIFF files are not cross-platform and the format is not supported by all web browsers.
Sounds stored in the AIFF format have the extension .aif or .aiff.
SND files are not cross-platform and the format is not supported by all web browsers.
It is supported by all computers running Windows, and by all the most popular web
browsers.
MP3 files are actually MPEG files. But the MPEG format was originally developed for
video by the Moving Picture Experts Group. We can say that MP3 files are the sound
part of the MPEG video format.
MP3 is one of the most popular sound formats for music recording. The MP3 encoding
system combines good compression (small files) with high quality. Expect all your future
software systems to support it.
Sounds stored in the MP3 format have the extension .mp3, or .mpga (for MPEG Audio).
What Format To Use?
The WAVE format is one of the most popular sound formats on the Internet, and it is
supported by all popular browsers. If you want recorded sound (music or speech) to be
available to all your visitors, you should use the WAVE format.
14.15 References
1. Chapter 15, 16 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
2. Z.S. Bojkovic, D.A. Milovanovic, “Multimedia Communication Systems”, PHI, 2002
3. S.J. Gibbs, D.C. Tsichritzis, “Multimedia Programming”, Addison-Wesley, 1995
4. J.F. Koegel, “Multimedia Systems”, Pearson Education, 2001
UNIT – V:
CONTENTS
15.1 Aims and Objectives
15.2 Introduction
15.3 Different Graphic Formats?
15.4 Pixels and the Web
15.5 Meta/Vector Image Formats
15.6 What's A Bitmap?
15.7 Compression
15.8 The GIF Image Formats
15.9 Animation
15.10 Transparency
15.11 Interlaced vs. Non-Interlaced GIF
15.12 JPEG Image Formats
15.13 Progressive JPEGs
15.14 Which image do I use where?
15.15 How do I save in these formats?
15.16 Do you edit and create images in GIF or JPEG?
15.17 Animation
15.18 Multimedia Animation
15.19 Let us Sum Up
15.20 Lesson-end Activities
15.21 Points for Discussion
15.22 Model answers to “Check your Progress”
15.23 References
LESSON 15 IMAGES AND ANIMATION
The aim of this lesson is to learn the concept of images and animations in multimedia.
The objectives of this lesson are to make the student aware of the following concepts
a) various imaging formats
b) multimedia
15.2 Introduction
If you really want to be strict, computer pictures are files, the same way word
documents or solitaire games are files. They're all a bunch of ones and zeros all in a row.
But we do have to communicate with one another so let's decide.
Image. We'll use "image". That seems to cover a wide enough topic range.
I went to my reference books and there I found that "graphic" is more of an adjective,
as in "graphic format." You see, we denote images on the Internet by their graphic
format. GIF is not the name of the image. GIF is the compression factors used to create
the raster format set up by CompuServe. (More on that in a moment).
So, they're all images unless you're talking about something specific.
There actually are only two basic methods for a computer to render, or store and
display, an image. When you save an image in a specific format you are creating either a
raster or meta/vector graphic format. Here's the lowdown:
Raster
Raster image formats (RIFs) should be the most familiar to Internet users. A Raster
format breaks the image into a series of colored dots called pixels. The number of ones
and zeros (bits) used to create each pixel denotes the depth of color you can put into your
images.
If your pixel is denoted with only one bit-per-pixel then that pixel must be black or
white. Why? Because that pixel can only be a one or a zero, on or off, black or white.
Bump that up to 4 bits-per-pixel and you're able to set that colored dot to one of 16
colors. If you go even higher to 8 bits-per-pixel, you can save that colored dot at up to
256 different colors.
Does that number, 256 sound familiar to anyone? That's the upper color level of a GIF
image. Sure, you can go with less than 256 colors, but you cannot have over 256.
That's why a GIF image doesn't work overly well for photographs and larger images.
There are a whole lot more than 256 colors in the world. Images can carry millions. But if
you want smaller icon images, GIFs are the way to go.
Raster image formats can also save at 16, 24, and 32 bits-per-pixel. At the two highest
levels, the pixels themselves can carry up to 16,777,216 different colors. The image looks
great! Bitmaps saved at 24 bits-per-pixel are great quality images, but of course they also
run about a megabyte per picture. There's always a trade-off, isn't there?
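Those colour counts and file sizes follow directly from the bits-per-pixel figure; the 640 x 480 dimensions below are assumed only for the sake of the arithmetic.

# Number of colours available at each colour depth.
for bpp in (1, 4, 8, 16, 24):
    print(bpp, "bits-per-pixel ->", 2 ** bpp, "colours")
# 8 bpp gives 256 colours (the GIF ceiling); 24 bpp gives 16,777,216.

# Uncompressed size of an assumed 640 x 480 bitmap at 24 bits-per-pixel.
width, height, bpp = 640, 480, 24
size_bytes = width * height * bpp // 8
print(size_bytes / 1_000_000, "MB")   # about 0.9 MB -- roughly a megabyte per picture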
The three main Internet formats, GIF, JPEG, and Bitmap, are all Raster formats.
RAW   Unencoded image format
RLE   Run-Length Encoding (used to lower image bit rates)
TIFF  Aldus Corporation format
WPG   WordPerfect image format
There is a delicate balance between the crispness of a picture and the number of pixels
needed to display it. Let's say you have two images, each is 5 inches across and 3 inches
down. One uses 300 pixels to span that five inches, the other uses 1500. Obviously, the
one with 1500 uses smaller pixels. It is also the one that offers a more crisp, detailed
look. The more pixels, the more detailed the image will be. Of course, the more pixels the
more bytes the image will take up.
So, how much is enough? That depends on whom you are speaking to, and right now
you're speaking to me. I always go with 100 pixels per inch. That creates a ten-thousand
pixel square inch. I've found that allows for a pretty crisp image without going overboard
on the bytes. It also allows some leeway to increase or decrease the size of the image and
not mess it up too much.
The lowest I'd go is 72 pixels per inch, the agreed upon low end of the image scale. In
terms of pixels per square inch, it's a whale of a drop to 5184. Try that. See if you like it,
but I think you'll find that lower definition monitors really play havoc with the image.
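The pixel counts behind those figures are simple products; the short calculation below uses the 5-by-3-inch image from the example above.

def total_pixels(width_in, height_in, ppi):
    # Total pixels in an image of the given size at a given pixels-per-inch.
    return (width_in * ppi) * (height_in * ppi)

print(100 * 100)                 # 10,000 pixels per square inch at 100 ppi
print(72 * 72)                   # 5,184 pixels per square inch at 72 ppi
print(total_pixels(5, 3, 100))   # 150,000 pixels for the 5 x 3 inch example
print(total_pixels(5, 3, 72))    # 77,760 pixels at the low end of the scale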
Where the Meta/Vector formats have it over Raster is that they are more than a simple
grid of colored dots. They're actual vectors of data stored in mathematical formats rather
than bits of colored dots. This allows for a strange shaping of colors and images that can
be perfectly cropped on an arc. A squared-off map of dots cannot produce that arc as
well. In addition, since the information is encoded in vectors, Meta/Vector image formats
can be blown up or down (a property known as "scalability") without looking jagged or
crowded (a property known as "pixelating").
So that I do not receive e-mail from those in the computer image know, there is a
difference in Meta and Vector formats. Vector formats can contain only vector data
whereas Meta files, as is implied by the name, can contain multiple formats. This means
there can be a lovely Bitmap plopped right in the middle of your Windows Meta file.
You'll never know or see the difference but, there it is. I'm just trying to keep everybody
happy.
If you're using an MSIE browser, you can view this first example. The image is St.
Sophia in Istanbul. The picture is taken from the city's hippodrome.
Against what I said above, Bitmaps will display on all browsers, just not in the
familiar <IMG SRC="--"> format we're all used to. I see Bitmaps used mostly as return
images from PERL Common Gateway Interfaces (CGIs). A counter is a perfect example.
Page counters that have that "odometer" effect are Bitmap images created by the server,
rather than as an inline image. Bitmaps are perfect for this process because they're a
simple series of colored dots. There's nothing fancy to building them.
It's actually a fairly simple process. In the script that runs the counter, you "build"
each number for the counter to display. Note the counter is black and white. That's only a
one bit-per-pixel level image. To create the number zero in the counter above, you would
build a grid 7 pixels wide by 10 pixels high. The pixels you want to remain black, you
would denote as zero. Those you wanted white, you'd denote as one.
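A one-bit digit of that kind can be written down as nothing more than a small grid of 0s and 1s; the exact pattern below is made up for illustration.

# A 7 x 10 one-bit image of the digit "0": 0 = black pixel, 1 = white pixel.
ZERO = [
    "1111111",
    "1100011",
    "1011101",
    "1011101",
    "1011101",
    "1011101",
    "1011101",
    "1011101",
    "1100011",
    "1111111",
]

# Preview it in the terminal: black pixels as '#', white pixels as spaces.
for row in ZERO:
    print("".join("#" if bit == "0" else " " for bit in row))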
Bitmaps are good images, but they're not great. If you've played with Bitmaps versus
any other image formats, you might have noticed that the Bitmap format creates images
that are a little heavy on the bytes. The reason is that the Bitmap format is not very
efficient at storing data. What you see is pretty much what you get, one series of bits
stacked on top of another.
15.7 Compression
I said above that a Bitmap was a simple series of pixels all stacked up. But the same
image saved in GIF or JPEG format uses fewer bytes to make up the file. How?
Compression.
"Compression" is a computer term that represents a variety of mathematical formats
used to compress an image's byte size. Let's say you have an image where the upper
right-hand corner has four pixels all the same color. Why not find a way to make those
four pixels into one? That would cut down the number of bytes by three-fourths, at least
in the one corner. That's a compression factor.
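That idea -- replacing a run of identical pixels with a single entry -- is essentially the run-length encoding listed in the format table above. A minimal sketch on a made-up row of pixel values:

def run_length_encode(pixels):
    # Collapse runs of identical values into [value, count] pairs.
    encoded = []
    for p in pixels:
        if encoded and encoded[-1][0] == p:
            encoded[-1][1] += 1
        else:
            encoded.append([p, 1])
    return encoded

row = ["red", "red", "red", "red", "blue", "blue", "white"]
print(run_length_encode(row))
# [['red', 4], ['blue', 2], ['white', 1]] -- four identical pixels stored as one entry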
15.8 The GIF Image Formats
GIF, which stands for "Graphic Interchange Format," was first standardized in 1987
by CompuServe, although the patent for the algorithm (mathematical formula) used to
create GIF compression actually belongs to Unisys. The first format of GIF used on the
Web was called GIF87a, representing its year and version. It saved images at 8 bits-per-
pixel, capping the color level at 256. That 8-bit level allowed the image to work across
multiple server styles, including CompuServe, TCP/IP, and AOL. It was a graphic for all
seasons, so to speak.
CompuServe updated the GIF format in 1989 to include animation, transparency, and
interlacing. They called the new format, you guessed it: GIF89a.
15.9 Animation
I remember when animation really came into the mainstream of Web page
development. I was deluged with e-mail asking how to do it. There's been a tutorial up
for a while now at http://www.htmlgoodies.com/tutors/animate.html. Stop by and see it
for instruction on how to create the animations yourself.
What you are seeing in that example are 12 different images, each set one "hour"
farther ahead than the one before it. Animate them all in a row and you get that stopwatch
effect.
The concept of GIF89a animation is much the same as a picture book with small
animation cells in each corner. Flip the pages and the images appear to move. Here, you
have the ability to set the cell's (technically called an "animation frame") movement
speed in 1/100ths of a second. An internal clock embedded right into the GIF keeps count
and flips the image when the time comes.
The animation process has been bettered along the way by companies who have found
their own method of compressing the GIFs further. As you watch an animation you might
notice that very little changes from frame to frame. So, why put up a whole new GIF
image if only a small section of the frame needs to be changed? That's the key to some of
the newer compression factors in GIF animation. Less changing means fewer bytes.
15.10 Transparency
As you can see, the bytes came out the same after the image was put through the
transparency filter. The process is best described as similar to the weather forecaster on
your local news. Each night they stand in front of a big green (sometimes blue) screen
and deliver the weather while that blue or green behind them is "keyed" out and replaced
by another source. In the case of the weather forecaster, it's usually a large map with lots
of Ls and Hs.
Think of that in terms of a transparent GIF. There are only 256 colors available in the
GIF. The computer is told to hone in on one of them. It's done by choosing a particular
red/green/blue shade already found in the image and blanking it out. The color is
basically dropped from the palette that makes up the image. Thus whatever is behind it
shows through.
The shape is still there though. Try this: Get an image with a transparent background
and alter its height and width in your HTML code. You'll see what should be the
transparent color seeping through.
Any color that's found in the GIF can be made transparent, not just the color in the
background. If the background of the image is speckled then the transparency is going to
be speckled. If you cut out the color blue in the background, and that color also appears
in the middle of the image, it too will be made transparent.
When I put together a transparent image, I make the image first, then copy and paste it
onto a slightly larger square. That square is the most hideous green I can mix up. I'm sure
it doesn't appear in the image. That way only the background around the image will
become clear.
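With an image library, this amounts to marking one palette entry as transparent. The sketch below assumes the Pillow library and a made-up key colour; any tool that can edit a GIF palette would do.

from PIL import Image

img = Image.open("figure.gif")            # an assumed palette-based GIF
palette = img.getpalette()                # flat [r, g, b, r, g, b, ...] list

# Find the palette index of the hideous green used as the background key.
key = (0, 255, 0)
index = next(i for i in range(len(palette) // 3)
             if tuple(palette[3 * i:3 * i + 3]) == key)

# Save the GIF again, treating that one index as transparent,
# so whatever is behind the image shows through.
img.save("figure_transparent.gif", transparency=index)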
15.11 Interlaced vs. Non-Interlaced GIF
The GIF images of me playing the Turkish Sitar were non-interlaced format images.
This is what is meant when someone refers to a "normal" GIF or just "GIF".
When you do NOT interlace an image, you fill it in from the top to the bottom, one
line after another. The following image is of two men coming onto a boat we used to
cross from the European to the Asian side of Turkey. The flowers they are carrying were
sold in the manner of roses we might buy our wife here in the U.S. I bought one. (What a
guy.)
Hopefully, you're on a slower connection computer so you got the full effect of
waiting for the image to come in. It can be torture sometimes. That's where the brilliant
Interlaced GIF89a idea came from.
Interlacing is the concept of filling in every other line of data, then going back to the
top and doing it all again, filling in the lines you skipped. Your television works that way.
The effect on a computer monitor is that the graphic appears blurry at first and then
sharpens up as the other lines fill in. That allows your viewer to at least get an idea of
what's coming up rather than waiting for the entire image, line by line.
Both interlaced and non-interlaced GIFs get you to the same destination. They just do
it differently. It's up to you which you feel is better.
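The row ordering described above can be listed directly; this tiny sketch just prints the order in which the scan lines of an assumed 10-line image would be filled in.

def non_interlaced_row_order(height):
    # A normal GIF fills its rows in from top to bottom, one after another.
    return list(range(height))

def interlaced_row_order(height):
    # The simple interlacing described above: every other line first,
    # then go back to the top and fill in the lines that were skipped.
    return list(range(0, height, 2)) + list(range(1, height, 2))

print(non_interlaced_row_order(10))   # [0, 1, 2, ..., 9]
print(interlaced_row_order(10))       # [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]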
15.12 JPEG Image Formats
For a long while, GIF ruled the Internet roost. I was one of the people who didn't
really like this new JPEG format when it came out. It was less grainy than GIF, but it also
caused computers without a decent amount of memory to crash the browser. (JPEGs have
to be "blown up" to their full size. That takes some memory.) There was a time when
people only had 8 or 4 megs of memory in their boxes. Really. It was way back in the
Dark Ages.
JPEGs are "lossy." That's a term that means you trade-off detail in the displayed
picture for a smaller storage file. I always save my JPEGs at 50% or medium
compression.
Here's a look at the same image saved in normal, or what's called "sequential"
encoding. That's a top-to-bottom, single-line, equal to the GIF89a non-interlaced format.
The image is of an open air market in Basra. The smell was amazing. If you like olives,
go to Turkey. Cucumbers, too, believe it or not.
The difference between the 1% and 50% compression is not too bad, but the drop in
bytes is impressive. The numbers I am showing are storage numbers, the amount of hard
drive space the image takes up.
You've probably already surmised that 50% compression means that 50% of the image
is included in the algorithm. If you don't put a 50% compressed image next to an exact
duplicate image at 1% compression, it looks pretty good. But what about that 99%
compression image? It looks horrible, but it's great for teaching. Look at it again. See
how it appears to be made of blocks? That's what's meant by lossy. Bytes are lost at the
expense of detail. You can see where the compression algorithm found groups of pixels
that all appeared to be close in color and just grouped them all together as one. You might
be hard pressed to figure out what the image was actually showing if I didn't tell you.
15.13 Progressive JPEGs
You can almost guess what this is all about. A progressive JPEG works a lot like the
interlaced GIF89a by filling in every other line, then returning to the top of the image to
fill in the remainder.
Obviously, here's where bumping up the compression does not pay off. Rule of
thumb: If you're going to use progressive JPEG, keep the compression up high, 75% or
better.
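Saving at a chosen compression level, and optionally as a progressive JPEG, is a one-line option in most image tools. The sketch below assumes the Pillow library and an existing source image; the quality numbers mirror the percentages discussed above.

from PIL import Image

img = Image.open("market.bmp")     # an assumed high-quality source image

# Sequential (baseline) JPEG at medium compression -- the 50% setting above.
img.save("market_q50.jpg", quality=50)

# Progressive JPEG: keep the quality high, as the rule of thumb suggests.
img.save("market_q75_progressive.jpg", quality=75, progressive=True)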
15.14 Which image do I use where?
There's just not a good answer to this question. No matter what I say, someone else
can give you just as compelling a reason why you should do the opposite. I'll tell you the
rules I follow:
That said, I also follow the thinking, "Do people really need to see this image?" Can I
get away with text rather than an image link? Can I make links to images allowing the
viewer to choose whether to look or not? The fewer images I have on a page, the faster it
comes in. I also attempt to have the same images across multiple pages, if possible. That
way the viewer only has to wait once. After that, the images are in the cache and they pop
right up.
15.15 How do I save in these formats?
To get these formats, you need to make a point of saving in these formats. When your
image editor is open and you have an image you wish to save, always choose SAVE AS
from the FILE menu. You'll get a dialogue box that asks where you'd like to save the
image. Better yet, somewhere on that dialogue box is the opportunity for you to choose a
different image format. Let's say you choose GIF. Keep looking. Somewhere on the same
dialogue box will be an OPTIONS button (or something close). That's where you'll
choose 87a or 89a, interlaced or non-interlaced, formats.
If you choose JPEG, you'll get the option of choosing the compression rate. You may
not get to play with the sliding scale I get. You may only get a series of compression
choices, high, medium, low, etc. Go high.
15.16 Do you edit and create images in GIF or JPEG?
Neither. I always edit in the PaintShop Pro or Bitmap format. Others have told me that
image creation and editing should only be done in a Vector format. Either way, make a
point of editing with large images. The larger the image, the better chance you have of
making that perfect crop. Edit at the highest color level the image program will allow.
You can always resize and save to a low-byte format after you've finished creating the
file.
15.17 Animation
Most Web animation requires special plug-ins for viewing. The exception is the animated
GIF format, which is by far the most prevalent animation format on the Web, followed
closely by Macromedia's Flash format. The animation option of the GIF format combines
individual GIF images into a single file to create animation. You can set the animation to
loop on the page or to play once, and you can designate the duration for each frame in the
animation.
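Combining individual GIF frames into a single animated file, with a per-frame duration and a loop setting, can be done with the Pillow library, as in the hedged sketch below; the frame file names are assumptions.

from PIL import Image

# Assumed frame images, e.g. the twelve "clock" frames mentioned earlier.
frames = [Image.open(f"frame{i:02d}.gif") for i in range(12)]

# Combine them into one animated GIF: 100 ms per frame, looping forever.
frames[0].save(
    "clock.gif",
    save_all=True,
    append_images=frames[1:],
    duration=100,   # time each frame is shown, in milliseconds
    loop=0,         # 0 means loop indefinitely
)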
Animated GIFs have several drawbacks. One concerns the user interface. GIF animations
do not provide interface controls, so users have no easy way to stop a looping animation
short of closing the browser window. They also lack the means to replay nonlooping
animation. Second, the animated GIF format does not perform interframe compression,
which means that if you create a ten-frame animation and each frame is a 20 KB GIF,
you'll be putting a 200 KB file on your page. And the final drawback is a concern that
pertains to animations in general. Most animation is nothing more than a distraction. If
you place animation alongside primary content you will simply disrupt your readers'
concentration and keep them from the objective of your site. If you require users to sit
through your spiffy Flash intro every time they visit your site, you are effectively turning
them away at the door.
There is a place for animation on the Web, however. Simple animation on a Web site's
main home page can provide just the right amount of visual interest to invite users to
explore your materials. There, the essential content is typically a menu of links, so the
threat of distraction is less than it would be on an internal content page. Also, subtle
animation such as a rollover can help guide the user to interface elements that they might
otherwise overlook. Animation can also be useful in illustrating concepts or procedures,
such as change over time. When you have animation that relates to the content of your
site, one way to minimize the potential distraction is to present the animation in a
secondary window. This technique offers a measure of viewer control: readers can open
the window to view the animation and then close the window when they're through.
From the early days of the web, when the only thing that moved on your screen was the
mouse cursor, we have come to a bewildering array of methods for animating pages.
Here's a selection:
Java.
Shockwave, Flash (formerly FutureSplash). Macromedia's Shockwave plug-ins
and Flash are leaders in plug-in animation.
QuickTime is the multi-platform, industry-standard multimedia architecture used
by software tool vendors and content creators to create and deliver synchronized
graphics, sound, video, text and music. FLIC and AVI files likewise require
pre-existing software to be on your computer before you can view them.
mBED. mbedlets are interactive multimedia interfaces within web pages. They
include graphics, animation, sound. They stream data directly off the web as
needed and attempt to use bandwidth as efficiently as possible. They can
communicate back to the server using standard HTTP methods. And they respond
to user actions such as mouse clicks and key events.
Enliven and Sizzler.
Javascript animations require preloading and users can disable Javascript in their
browser.
Framation (TM) is a technique using a combination of meta-refresh and frames.
GIF animation. Self-contained GIF files are downloaded once and played from
the computer's disk cache. You can download several per page, and even place a
single animated GIF dozens of times on the same page, creating effects that would
not be easy with other solutions. Unlike other movie formats, GIF still supports
transparency, even in animations. They are as simple to use and implement as any
still GIF image. The only things GIF lacks are sound (although sound has been added to
GIFs in the past) and real-time speed variation (such as AVI's ability to skip frames
when played on a slow machine). A small example of reusing a single GIF file in this way
follows.
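For instance (the file name new.gif is purely illustrative), the following markup downloads one GIF file but shows the animation three times, since the browser fetches the file once and plays it from cache:
<img src="new.gif" alt="new"> Updated price list
<img src="new.gif" alt="new"> Updated schedule
<img src="new.gif" alt="new"> Updated gallery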
15.23 References
LESSON – 16: VIDEO
CONTENTS
16.14 Model answers to “Check your Progress”
16.15 References
The objectives of this lesson are to make the student aware of the following concepts
a) Video
b) Various video formats
16.2 Introduction
Video is the most challenging multimedia content to deliver via the Web. One second
of uncompressed NTSC (National Television Standards Committee) video, the television
standard used in North and South America, requires approximately 27 megabytes of disk
storage space (roughly 640 x 480 pixels x 3 bytes of colour x 30 frames per second).
The amount of scaling and compression required to turn this quantity of data into
something that can be used on a network is significant, sometimes so much so as to
render the material useless. If at all possible, tailor your video content for the Web.
Shoot original video; that way you can take steps to create video that will
compress efficiently and still look good at low resolution and frame rates.
Shoot close-ups. Wide shots have too much detail to make sense at low
resolution.
Avoid zooming and panning. These can make low frame-rate movies confusing to
view and interpret and can cause them to compress poorly.
When editing your video, use hard cuts between shots. Don't use the transitional
effects offered by video editing software, such as dissolves or elaborate wipes,
because they will not compress efficiently and will not play smoothly on the Web.
If you are digitizing material that was originally recorded for video or film,
choose your material carefully. Look for clips that contain minimal motion and do
not depend on small but essential details. Motion and fine detail are what suffer
most in low-resolution video.
In the past, video has often been treated as synonymous with multimedia. Video makes use of all of the
elements of multimedia, bringing your products and services alive, but at a high cost.
Scripting, hiring actors, set creation, filming, post-production editing and mastering can
add up very quickly. Five minutes of live action video can cost many times more than a
multimedia production.
There are two basic approaches to delivering video on a computer screen:
analogue and digital video.
Video, like audio, is usually recorded and played as an analog signal. It must
therefore be digitized in order to be incorporated into a multimedia title.
Figure below shows the process for digitizing an analog video signal.
PAL is the standard for most of Europe and the Commonwealth, NTSC for North
and South America. The standards are inter-convertible, but conversion normally has to
be performed by a facilities house and some quality loss may occur.
Analogue video can be delivered into the computing interface from any
compatible video source (video recorder, videodisc player, live television) providing the
computer is equipped with a special overlay board, which synchronizes video and
computer signals and displays computer-generated text and graphics over the video.
One of the advantages of digitized video is that it can be easily edited. Analog
video, such as a videotape, is linear; there is a beginning, middle, and end. If you want to
edit it, you need to continually rewind, pause, and fast forward the tape to display the
desired frames.
Digitized video on the other hand, allows random access to any part of the video,
and editing can be as easy as the cut and paste process in a word processing program. In
addition, adding special effects such as fly-in titles and transitions is relatively simple.
Other advantages:
The video is stored as a standard computer file. Thus it can be copied with no loss
in quality, and also can be transmitted over standard computer networks.
Software motion video does not require specialized hardware for playback.
Unlike analog video, digital video requires neither a video board in the computer
nor an external device (which adds extra costs and complexity) such as a
videodisc player.
Digitized video files can be extremely large. A single second of high-quality color
video that takes up only one-quarter of a computer screen can be as large as 1 MB.
Several elements determine the file size; in addition to the length of the video,
these include:
Frame Rate
Image Size
Color Depth
In most cases, a quarter-screen image size (320 x 240), an 8-bit color depth (256
colors), and a frame rate of 15 fps are acceptable for a multimedia title. Even this
minimum results in a very large file size.
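As a rough check of that figure (assuming one byte per pixel for 8-bit colour and no compression): 320 x 240 pixels x 1 byte x 15 frames per second comes to 1,152,000 bytes, a little over 1 MB for every second of video.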
There are two basic types of compression:
Lossless compression
Lossy compression
16.5.1 Lossless Compression
Lossless compression preserves the exact image throughout the compression and
decompression process. An example of when this is important is the compression of
text: text needs to appear exactly the same before and after file compression. One
technique for text compression is to identify repeating words and assign them a code.
For example, if the word multimedia appears several times in a text file, it would
be assigned a code that takes up less space than the actual word. During decompression,
the code would be changed back to the word multimedia.
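As a small worked example (the figures are illustrative): if the ten-character word multimedia occurs 100 times in a file and each occurrence is replaced by a two-byte code, roughly 800 bytes are saved, and every occurrence is restored exactly when the file is decompressed.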
16.5.2 Lossy Compression
Lossy compression actually eliminates some of the data in the image and
therefore provides greater compression ratios than lossless compression. The greater the
compression ratio, however, the poorer the decompressed image. Thus, the trade-off is
file size versus image quality. Lossy compression is applied to video because some drop
in the quality is not noticeable in moving images.
JPEG
MPEG
Microsoft’s Video for Windows
Apple’s QuickTime
16.6.1 JPEG
JPEG: although strictly a still-image compression standard, stills can become a
movie if delivered at 25 (or 30) frames per second. JPEG compression requires hardware,
but decompression can now be achieved in software only (e.g. under QuickTime and
Video for Windows).
Figure below shows how the JPEG process works. Often areas of an image
(especially backgrounds) contain similar information. JPEG compression identifies these
areas and stores them as blocks of pixels instead of pixel by pixel, thus reducing the
amount of information needed to store the image.
The blocks are then reassembled when the file is decompressed. Rather than
separately storing data for each of the 256 blue pixels in this block of background color,
JPEG eliminates the redundant information and records just the color, size, and location of
the block in the graphic.
Fewer blocks make a smaller file but result in more lossy compression; data is
irrevocably lost.
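To put a rough figure on it (illustrative only): storing 256 pixels of 24-bit colour individually takes 256 x 3 = 768 bytes, whereas recording a single colour value together with the size and location of the block needs only a handful of bytes.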
16.6.2 MPEG
MPEG also add another process to the still image compression when working
with video. MPEG looks for the changes in the image from frame to frame. Keyframes
are identified every few frames, and the changes that occur from keyframe to keyframe
are recorded
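As a rough illustration (the figures are invented for the example): if a keyframe is stored every 15 frames and each intervening frame changes only about a tenth of the picture, the 14 difference frames together need roughly the data of one and a half full frames rather than fourteen.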
MPEG can provide greater compression ratios than JPEG, but it requires
hardware (a card inserted in the computer) that is not needed for JPEG compression. This
limits the use of MPEG compression for multimedia titles, because MPEG cards are not
standard on the typical multimedia playback system.
Microsoft’s Video for Windows software is based on the .AVI (Audio Video
Interleave) file format, in which the audio and video are interleaved. This permits the
sound to appear to be synchronized with the motion of a video file.
QuickTime for Windows integrates video, animation, high-quality still images,
and high quality sound with Windows applications – boosting the impact of all types of
communications.
Video camcorders or video tape recorders can be used for gathering original
video material. Depending on the frequency of use of these pieces of equipment, it may
be necessary to purchase them as part of the multimedia setup, or it may be better to
borrow equipment from an audio/visual services unit near you.
Once this material has been collected it is necessary to translate it into a digital
format, using a video digitizing card, so it can be used on the computer. PCs do not
generally come with video capture cards, but the Macintosh AV series of computers have
built-in cards.
Video capture boards are designed to either grab still frames or capture motion
video as input into a computer. In some cases, the video plays through into a window on
the monitor.
Guidelines
1. Care should be taken not to present a video just for the sake of it. For example,
voice output alone can be as effective as, and requires less storage space than, a
video of someone speaking (i.e. a "talking head").
2. Using video as part of a multimedia application usually requires a quality as high
as that of television to fulfil users' expectations.
3. Use of techniques such as cut, fade, dissolve, wipe, overlap and multiple exposure
should be limited to avoid distracting the user from the content.
4. To make proper use of video sequences in multimedia applications, short
sequences are needed as a part of a greater whole. This is different from watching
a film which usually involves watching it from beginning to end in a single
sequence. Video sequences should be limited to about 45 seconds; longer video
sequences can reduce the user's concentration.
5. Video should be accompanied by a soundtrack in order to give extra information
or to add specific detail to the information.
6. Videos need time and careful direction if they are to present information
attractively.
7. If the lighting conditions under which the video is to be viewed may be poor,
controls may be provided for the user to alter display characteristics such as
brightness, contrast, and colour strength.
8. Provide low quality video within a small window, since full screen video raises
the expectation of the user. Often some kind of stage or other 'decoration', e.g. a
cinema metaphor (i.e. background) may be used to show low resolution video in a
part of a screen.
9. The actual position within the video or animation sequence, and the total length of
the sequence, should be shown on a time scale.
10. The user should be able to interrupt the video (or animation) sequence at any time
and to repeat parts of it. The most important controls to provide are: play, pause,
replay from start. However a minimum requirement is that users should be able to
cancel the video or animation sequence at any time, and move on to the next part of
the interface.
11. Video controls should be based on the controls of a video recorder (VCR) or hi-fi,
which are familiar to many people.
12. It is also desirable to provide controls to set video characteristics such as
brightness, contrast, colour and hue.
16.9 Multimedia Video Formats
16.9.1 The AVI Format
The AVI format is supported by all computers running Windows, and by all the
most popular web browsers. It is a very common format on the Internet, but it is not
always possible to play AVI files on non-Windows computers.
16.9.2 The Windows Media Format
Windows Media is a common format on the Internet, but Windows Media movies
cannot be played on non-Windows computers without an extra (free) component installed.
Some later Windows Media movies cannot play at all on non-Windows computers
because no player is available.
Videos stored in the Windows Media format have the extension .wmv.
16.9.3 The MPEG Format
The MPEG (Moving Picture Experts Group) format is the most popular format on
the Internet. It is cross-platform, and supported by all the most popular web browsers.
Videos stored in the MPEG format have the extension .mpg or .mpeg.
16.9.4 The QuickTime Format
The QuickTime format was developed by Apple. QuickTime is a common format on the
Internet, but QuickTime movies cannot be played on a Windows computer without an
extra (free) component installed. Videos stored in the QuickTime format have the
extension .mov.
16.9.5 The RealVideo Format
The RealVideo format was developed for the Internet by Real Media.
The format allows streaming of video (on-line video, Internet TV) with low
bandwidths. Because of the low bandwidth priority, quality is often reduced.
Videos stored in the RealVideo format have the extension .rm or .ram.
Videos can be played "inline" or by a "helper", depending on the HTML element you use.
Inline video can be added to a web page by using the <img> element.
If you plan to use inline videos in your web applications, be aware that many
people find inline videos annoying. Also note that some users might have turned off the
inline video option in their browser.
Our best advice is to include inline videos only in web pages where the user
expects to see a video. An example of this is a page which opens after the user has
clicked on a link to see the video.
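As a sketch of such an inline fragment (using the same video.avi file name as the later examples):
<img dynsrc="video.avi" width="320" height="240">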
The code fraction above displays an inline AVI file. Note: the dynsrc attribute is not a
standard HTML or XHTML attribute; it is supported by Internet Explorer only.
Helper applications can be launched using the <embed> element, the <applet>
element, or the <object> element.
One great advantage of using a helper application is that you can let some (or all)
of the player settings be controlled by the user.
Most helper applications allow manual (or programmed) control over the volume
settings and play functions like rewind, pause, stop and play.
<embed src="video.avi" />
The code fraction above displays an AVI file embedded in a web page.
A list of attributes for the <embed> element can be found in a later chapter of this
tutorial.
Note: The <embed> element is supported by both Internet Explorer and Netscape,
but it is not a standard HTML or XHTML element. The World Wide Web Consortium
(W3C) recommends using the <object> element instead.
Internet Explorer and Netscape both support an HTML element called <object>.
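A representative fragment might be (a sketch; video/x-msvideo is the MIME type commonly used for AVI files):
<object data="video.avi" type="video/x-msvideo" width="320" height="240"></object>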
The code fraction above displays an AVI file embedded in a web page.
A list of attributes for the <object> element can be found in a later chapter of this tutorial.
If a web page includes a hyperlink to a media file, most browsers will use a
"helper application" to play the file:
<a href="video.avi">
Click here to play a video file
</a>
The code fraction above displays a link to an AVI file. If the user clicks on the link,
the browser will launch a helper application like Windows Media Player to play the AVI
file.
16.11 Let us Sum Up
In this lesson we have learnt about
a) Video formats
b) Compression techniques
16.15 References