Operations Research
Chapter 1
Introduction
Operations research, or operational research in British usage, is a discipline that deals with the
application of advanced analytical methods to help make better decisions. Further, the term
'operational analysis' is used in the British (and some British Commonwealth) military, as an
intrinsic part of capability development, management and assurance. In particular, operational
analysis forms part of the Combined Operational Effectiveness and Investment Appraisals
(COEIA), which support British defence capability acquisition decision-making.
Overview
Operational research (OR) encompasses a wide range of problem-solving techniques and
methods applied in the pursuit of improved decision-making and efficiency, such as simulation,
mathematical optimization, queueing theory and other stochastic-process models, Markov
decision processes, econometric methods, data envelopment analysis, neural networks, expert
systems, decision analysis, and the analytic hierarchy process. Nearly all of these techniques
involve the construction of mathematical models that attempt to describe the system. Because of
the computational and statistical nature of most of these fields, OR also has strong ties to
computer science and analytics. Operational researchers faced with a new problem must
determine which of these techniques are most appropriate given the nature of the system, the
goals for improvement, and constraints on time and computing power.
The major subdisciplines in modern operational research, as identified by the journal Operations
Research, are:
Computing and information technologies
Financial engineering
Manufacturing, service sciences, and supply chain management
Marketing Engineering
Policy modeling and public sector work
Revenue management
Simulation
Stochastic models
Transportation
Definition
"Operations research is the application of scientific methods to arrive at optimal solutions to problems."
Operations research (OR) means applying scientific and mathematical methods to decision making and problem solving.
OR does not itself make decisions; rather, it provides quantitative data to managers, who use this data to make decisions.
OR tries to find better solutions to different problems. Hence, it is used to solve complex management problems.
OR was first used during the Second World War by England to solve complex war problems. England formed OR teams that included expert mathematicians, statisticians, scientists, engineers, etc. These teams were very successful in solving England's war problems, so the United States of America (USA) also started using OR to solve its war problems. Soon after the war, industries and businesses also started using OR to solve their complex management problems.
Scope of Operations Research
In recent years of organized development, OR has entered successfully into many different areas of research. It is useful in the following important fields.
In agriculture
With the sudden increase in population and the resulting shortage of food, every country is facing the problem of
Hence there is a need to determine the best policies under the given restrictions, and a good deal of work can be done in this direction.
In finance
In these recent times of economic crisis, it has become essential for every government to plan carefully for the economic progress of the country. OR techniques can be productively applied
In industry
If an industry manager makes his policies simply on the basis of his past experience, then the day he retires the industry faces a serious loss. This loss can be compensated for right away by appointing a young specialist in OR techniques for business management. Thus OR helps the industry director decide the optimum distribution of limited resources such as men, machines, and materials, so as to reach the optimum decision.
In marketing
Where should products be allocated for sale so that the total cost of transportation is minimized?
What is the minimum per-unit sale price?
What stock size is needed to meet future demand?
How should the best advertising media be chosen with respect to cost, time, etc.?
How, when, and what should be bought at the minimum likely cost?
In personnel management
In L.I.C
Operations Research may be considered a tool employed to raise the efficiency of management decisions. OR is the objective complement to the subjective feeling of the administrator (decision maker). The scientific method of OR is used to understand and explain the phenomena of operating systems.
The benefits of the OR approach in business and management decision making may be categorized as follows.
Better control
The management of large concerns finds it very expensive to provide continuous executive supervision over routine decisions. An OR approach lets executives devote their attention to more pressing matters; for instance, the OR approach handles production scheduling and inventory control.
Better coordination
OR has often been very helpful in bringing order out of chaos. For instance, an OR-based planning model becomes a vehicle for coordinating marketing decisions with the restrictions imposed by manufacturing capabilities.
Better system
An OR study may also be initiated to examine a particular decision problem, such as setting up a new warehouse. The OR approach can later be developed into a system to be employed repeatedly; as a result, the cost of undertaking the first application may yield better profits.
Better decisions
OR models regularly suggest actions that improve on intuitive decision making. Sometimes a situation may be so complex that the human mind can never hope to assimilate all the significant factors without the aid of OR and computer analysis.
Characteristics
Thus, Operations Research makes use of the experience and expertise of people from different disciplines to develop new methods and procedures. These methods may be more effective because they evolve from the specific tools and techniques of various disciplines, and they can often be applied, with or without modifications and refinements, to business problems.
For example,
(b) O.R. is a continuing process. It cannot stop at the application of the model to one problem, for this may create new problems in other sectors and in the implementation of the decisions taken. O.R. must also specify the organizational changes required to implement decisions and to control the results thereof; without this, the work of the O.R. practitioner is incomplete.
(c) Objective. O.R. attempts to find the best or optimal solution to the problem under consideration. To do this it is necessary to define a measure of effectiveness that takes into account the goals (objectives) of the organization. In other words, "Operations Research is the scientific study of large systems with a view to identifying problem areas and providing managers with a quantitative basis for decisions which will enhance their effectiveness in achieving the specified objectives."
(h) O.R. gives bad answers to problems to which otherwise worse answers would be given; that is, it cannot give perfect answers. Thus O.R. only improves the quality of the solution.
(i) Methodological Approach. O.R. utilizes the scientific method. Specifically, the process begins with careful observation and formulation of the problem. The next step is to construct a scientific (typically mathematical or simulation) model that attempts to abstract the essence of the real problem. From this model, conclusions or solutions are obtained that are also valid for the real problem; in an iterative fashion, the model is then verified through appropriate experimentation.
Operations research is a robust tool and offers directions in making the best decisions possible
given the data available.
Although these steps are logically placed in a sequence, there is always an interplay among them. Each step may be subject to change in the light of revelation through experience. The steps are listed
Many Operations Research and statistical techniques have been very popular among managers, not only for defining problems but also for determining optimal solutions to them.
A word of caution here would not be out of context. Many a time, a problem is defined by managers in terms of the quantitative technique selected for its solution, which results in ignoring important decision variables.
It is necessary, thus, to focus on the problem and seek its solution with the help of the technique, and to avoid the tendency of finding 'right solutions' to 'wrong problems'.
It has been observed that a manager has to take decisions with inadequate
information regarding the decision variables; the time frame available for
analysis is too small to permit detailed modeling without entailing opportunity
losses due to delayed decisions.
The decision making process under this approach involves the following steps:
i. Environment intelligence for searching problems and opportunities;
identifying the available informational inputs regarding the decision variables.
iii. Making a choice among the alternatives designed in the preceding stage.
As managers reflect their goals in terms of different programmes, they
evaluate alternatives on the basis of the goals set forth in the concerned
programme.
The information may be gathered using ad hoc queries from the information systems. Alternatively, the information systems themselves may be proactive and report opportunities and threats automatically. Modern business information systems generally offer exception-reporting facilities with varying degrees of analysis of information.
Many software companies are now bundling ‘intelligent agents’ into their
software products so that the analysis of information is done automatically by
the software and exceptional circumstances are reported to the user.
Information for designing:
Designing the model for decision making can be greatly helped by modern information systems. Information for programmable decisions, which can be taken using a predetermined algorithm, can be generated easily and automatically. For example, inventory problems, sequencing of jobs, and production planning and scheduling can easily be automated by information systems.
The ‘what-if’ feature offers answers to questions like ‘By how much will our
pay bill increase if the proposals of the trade union are accepted?’ or ‘What
will be the impact of the proposed increase in the price as well as distributors’
margin on net sales?’
The ‘goal seeking’ helps to answer questions like ‘ By how much percentage
should the fixed overheads be decreased in order to lower the breakeven
point by 20%.’ Answers to a combination of such questions regarding a given
non-programmable problem can help in identifying alternative courses of
action.
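The 'what-if' and 'goal seeking' features described above can be sketched in a few lines of Python. The break-even arithmetic is standard (fixed overheads divided by contribution per unit), but every figure below is assumed purely for illustration:

```python
# Hypothetical figures, assumed for illustration only.
fixed_overheads = 200_000.0     # total fixed costs
price_per_unit = 50.0
variable_cost_per_unit = 30.0

contribution = price_per_unit - variable_cost_per_unit

# 'What-if': break-even volume for the current figures.
break_even = fixed_overheads / contribution
print(f"Current break-even point: {break_even:.0f} units")

# 'Goal seeking': by what percentage must fixed overheads fall
# to lower the break-even point by 20%?
target = 0.8 * break_even
required_fixed = target * contribution
cut_pct = (fixed_overheads - required_fixed) / fixed_overheads * 100
print(f"Required cut in fixed overheads: {cut_pct:.0f}%")
```

Because break-even volume is proportional to fixed overheads here, the goal-seek answer is a 20% cut; for nonlinear relationships a spreadsheet's goal-seek would search numerically instead.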
Information for choice:
Business information systems can help a manager in evaluating the alternatives and making the choice
out of the available alternatives. The choice in case of programmable decisions can be easily made by
The computing power available with IT infrastructure would make it possible and convenient to apply
In the case of non-programmable decisions, the information systems can help in identifying the satisfying
solution based on bounded rationality. The advantage of IT infrastructure would be both in terms of speed
For example, in an investment decision, a manager can use various methods of evaluating investment
proposals such as Pay-Back method, Net Present Value method, Internal Rate of Return method, etc. For
using these methods, the manager shall need information regarding pay-back period, discounted cash
flows, internal rate of return, etc. Such values can easily be calculated with the help of IT infrastructure.
In fact, most of the electronic spreadsheets offer facilities for calculating these and other such related
values quickly using simple procedures. Business information systems can also help in monitoring the
performance and obtaining quick feedback during implementation of the decision. When the decision is being implemented, information regarding the performance and feedback regarding the success of the decision is very helpful in identifying mistakes in playing the decision role; better alternatives can then be thought of and evaluated for revision of the earlier decision, if necessary.
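The investment-appraisal calculations mentioned above (Pay-Back, Net Present Value, Internal Rate of Return) can be sketched as follows; the cash-flow figures are assumed purely for illustration, and the IRR is found by simple bisection rather than any particular spreadsheet's method:

```python
# Hypothetical investment: an outlay of 1000 followed by yearly inflows.
outlay = 1000.0
inflows = [400.0, 400.0, 400.0, 400.0]

# Pay-back period: years until cumulative inflows recover the outlay
# (fractional final year interpolated linearly).
cum, payback = 0.0, None
for year, cf in enumerate(inflows, start=1):
    if cum + cf >= outlay:
        payback = year - 1 + (outlay - cum) / cf
        break
    cum += cf

def npv(rate):
    """Net present value of the project at a given discount rate."""
    return -outlay + sum(cf / (1 + rate) ** t
                         for t, cf in enumerate(inflows, start=1))

# Internal rate of return: the rate at which NPV is zero (bisection,
# assuming the IRR lies between 0% and 100%).
lo, hi = 0.0, 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if npv(mid) > 0:
        lo = mid
    else:
        hi = mid
irr = (lo + hi) / 2

print(f"Pay-back period: {payback:.1f} years")
print(f"NPV at 10%: {npv(0.10):.2f}")
print(f"IRR: {irr:.1%}")
```

Spreadsheets expose the same quantities through built-in functions; the point here is only that the underlying arithmetic is simple enough for an information system to compute automatically.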
This approach is more appropriate for higher order decisions, i.e. tactical and strategic. It is
iv. Outcome of the decision is subject to a number of factors beyond the control of the manager.
A variation of this approach suggests successive limited comparisons as values become clear only at the
margin when specific policies are considered. Other approaches to decision making have behavioural
frames of references.
They recognise the psychological biases in decision making, role of cognitive style of decision making
and the influence of differing personal traits of the individual managers in the decision making process.
One of the important dimensions of managerial decision making is that many decisions are taken not by a
manager alone, but by a group of people in the business enterprise. As there are many decisions that
have implications on different organisational units having sometimes conflicting objectives, the decision
Generally, such decisions are taken on the basis of consensus among the concerned managers. Key to
the successful group decision making lies in better communication and frequency of meetings to
deliberate on a problem. Modern information systems are equipped with powerful communication systems
E-mail is now recognised as an important channel of communication in group decision making. The
business information systems are helpful not only in making a choice among the alternatives but also
offer the line of reasoning followed to arrive at the selection. This helps in convincing others in the group
regarding the merits of the decision. Many software companies are now offering ‘groupware’ software for
A well-known model of managerial roles was proposed by Henry Mintzberg. According to his model, a manager plays the following three basic roles:
Interpersonal role:
A manager plays the role of a leader of his subordinates, maintains liaison with the external environment
Information role:
His information role includes the responsibility of managing information in the organisation. He is
responsible for making information available within the organisation and should be able to communicate
Decision role:
A manager is supposed to take decision for bringing about changes in the light of changes in the
environment. He should make decisions in case any problem arises, i.e. he should take up the role of a
disturbance handler.
He is also supposed to take up the role of resource allocator because he is accountable for the proper
use of resources. Associated with this responsibility, is also the role of a negotiator who resolves
For performing these roles, a manager needs a lot of information. Modern information systems can be of
great help in improving interpersonal communication, for which managers have so far been relying mainly
on verbal communication. The information roles can be best played with the help of proper IT
infrastructure.
A manager is far better equipped now to perform this role, with the improved information handling tools
The executive information systems can help a manager in monitoring information on the performance of
different organisational units. It can also help in dissemination of information among his peers and
He is also able to communicate more effectively with the external entities regarding state of affairs in the
enterprise. Today, a manager is in a better position to explain any decline in the performance to the
shareholders and investors than ever before, thanks to the availability of executive information systems.
Although the analyst would hope to study the broad implications of the problem using a systems approach, a model
cannot include every aspect of a situation. A model is always an abstraction that is of necessity simpler than the real
situation. Elements that are irrelevant or unimportant to the problem are to be ignored, hopefully leaving sufficient
detail so that the solution obtained with the model has value with regard to the original problem.
Models must be both tractable, capable of being solved, and valid, representative of the original situation. These
dual goals are often contradictory and are not always attainable. It is generally true that the most powerful solution
methods can be applied to the simplest, or most abstract, model.
Linear Programming
A typical mathematical program consists of a single objective function, representing either a profit to be
maximized or a cost to be minimized, and a set of constraints that circumscribe the decision variables. In
the case of a linear program (LP) the objective function and constraints are all linear functions of the
decision variables. At first glance these restrictions would seem to limit the scope of the LP model, but
this is hardly the case. Because of its simplicity, software has been developed that is capable of solving
problems containing millions of variables and tens of thousands of constraints. Countless real-world
applications have been successfully modeled and solved using linear programming techniques.
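As a toy illustration (not the simplex or interior-point methods that industrial LP solvers actually use), a two-variable LP can be solved by enumerating the corner points of its feasible region, since an optimum of a linear program always occurs at a vertex. The objective and constraints below are assumed for illustration:

```python
from itertools import combinations

# A tiny LP: maximize 3x + 2y subject to
#   x + y  <= 4
#   x + 3y <= 6
#   x, y   >= 0
# Each constraint (including non-negativity) is stored as a*x + b*y <= c.
constraints = [(1, 1, 4), (1, 3, 6), (-1, 0, 0), (0, -1, 0)]

def objective(x, y):
    return 3 * x + 2 * y

def feasible(x, y, eps=1e-9):
    return all(a * x + b * y <= c + eps for a, b, c in constraints)

# Intersect every pair of constraint boundaries to find candidate
# vertices, keep the feasible ones, and take the best objective value.
best = None
for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue  # parallel boundaries: no intersection point
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y) and (best is None or objective(x, y) > best[0]):
        best = (objective(x, y), x, y)

print(best)  # optimal value and the vertex attaining it
```

Vertex enumeration grows combinatorially and is hopeless for the million-variable problems mentioned above, which is precisely why specialized LP algorithms matter.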
The term network flow program describes a type of model that is a special case of the more general
linear program. The class of network flow programs includes such problems as the transportation
problem, the assignment problem, the shortest path problem, the maximum flow problem, the pure
minimum cost flow problem, and the generalized minimum cost flow problem. It is an important class
because many aspects of actual situations are readily recognized as networks and the representation of
the model is much more compact than the general linear program. When a situation can be entirely
modeled as a network, very efficient algorithms exist for the solution of the optimization problem, many
times more efficient than linear programming in the utilization of computer time and space resources.
Integer Programming
Integer programming is concerned with optimization problems in which some of the variables
are required to take on discrete values. Rather than allow a variable to assume all real values in a
given range, only predetermined discrete values within the range are permitted. In most cases,
these values are the integers, giving rise to the name of this class of models.
Models with integer variables are very useful. Situations that cannot be modeled by linear
programming are easily handled by integer programming. Primary among these involve binary
decisions such as yes-no, build-no build or invest-not invest. Although one can model a binary
decision in linear programming with a variable that ranges between 0 and 1, there is nothing that
keeps the solution from obtaining a fractional value such as 0.5, hardly acceptable to a decision
maker. Integer programming requires such a variable to be either 0 or 1, but not in-between.
Unfortunately integer programming models of practical size are often very difficult or impossible
to solve. Linear programming methods can solve problems orders of magnitude larger than
integer programming methods. Still, many interesting problems are solvable, and the growing
power of computers makes this an active area of interest in Operations Research.
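The binary invest/not-invest decisions described above can be illustrated by brute-force enumeration of all 0/1 vectors, a sketch that works only for tiny instances (real integer-programming codes use branch-and-bound and cutting planes). The project data below are assumed for illustration:

```python
from itertools import product

# Hypothetical yes/no investment problem: each project has a cost and
# an expected return; the total budget is 10.
costs   = [5, 4, 3, 7]
returns = [8, 7, 4, 9]
budget  = 10

best_value, best_choice = -1, None
# Enumerate every 0/1 vector; each variable is forced to be exactly
# 0 or 1, never a fraction such as 0.5.
for choice in product((0, 1), repeat=len(costs)):
    cost = sum(c * x for c, x in zip(costs, choice))
    value = sum(r * x for r, x in zip(returns, choice))
    if cost <= budget and value > best_value:
        best_value, best_choice = value, choice

print(best_choice, best_value)
```

Enumeration doubles in size with every added variable, which is the practical face of the difficulty of integer programming noted above.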
Nonlinear Programming
When expressions defining the objective function or constraints of an optimization model are not
linear, one has a nonlinear programming model. Again, the class of situations appropriate for
nonlinear programming is much larger than the class for linear programming. Indeed it can be
argued that all linear expressions are really approximations for nonlinear ones.
Since nonlinear functions can assume such a wide variety of functional forms, there are many different classes of nonlinear programming models. The specific form has much to do with how easily the problem is solved, but in general a nonlinear programming model is much more difficult to solve than a similarly sized linear programming model.
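A minimal sketch of one common approach to unconstrained nonlinear optimization, gradient descent, on an objective chosen here purely for illustration:

```python
# Minimize the nonlinear objective f(x, y) = (x - 3)**2 + (y + 1)**2
# by simple fixed-step gradient descent.
def grad(x, y):
    # Partial derivatives of f with respect to x and y.
    return 2 * (x - 3), 2 * (y + 1)

x, y, step = 0.0, 0.0, 0.1
for _ in range(200):
    gx, gy = grad(x, y)
    x, y = x - step * gx, y - step * gy

print(round(x, 4), round(y, 4))  # approaches the minimizer (3, -1)
```

This objective is convex, so descent converges to the global minimum; for general nonlinear models, different starting points can land in different local optima, which is one reason these problems are harder than linear ones.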
Dynamic Programming
Dynamic programming (DP) models are represented in a different way than other mathematical
programming models. Rather than an objective function and constraints, a DP model describes a
process in terms of states, decisions, transitions and returns. The process begins in some initial
state where a decision is made. The decision causes a transition to a new state. Based on the
starting state, ending state and decision a return is realized. The process continues through a
sequence of states until finally a final state is reached. The problem is to find the sequence that
maximizes the total return.
The models considered here are for discrete decision problems. Although traditional integer
programming problems can be solved with DP, the models and methods are most appropriate for
situations that are not easily modeled using the constructs of mathematical programming.
Objectives with very general functional forms may be handled and a global optimal solution is
always obtained. The price of this generality is computational effort. Solutions to practical problems are often stymied by the "curse of dimensionality", where the number of states grows exponentially with the number of dimensions of the problem.
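The state/decision/transition/return structure described above can be sketched with a small deterministic example: finding the cheapest route through a network of states to a terminal state T. The states and arc costs are assumed for illustration:

```python
from functools import lru_cache

# arcs[state] = list of (next_state, cost) transitions; T is terminal.
arcs = {
    "A": [("B", 2), ("C", 4)],
    "B": [("D", 7), ("E", 3)],
    "C": [("D", 1), ("E", 5)],
    "D": [("T", 2)],
    "E": [("T", 4)],
    "T": [],
}

@lru_cache(maxsize=None)
def best_cost(state):
    """Minimum total cost from `state` to the terminal state T,
    computed by the DP recursion over states and decisions."""
    if state == "T":
        return 0
    return min(cost + best_cost(nxt) for nxt, cost in arcs[state])

print(best_cost("A"))
```

The memoization (`lru_cache`) is what makes this dynamic programming rather than brute-force search: each state's value is computed once and reused.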
Stochastic Programming
Stochastic programming explicitly recognizes uncertainty by using random variables for some
aspects of the problem. With probability distributions assigned to the random variables, an
expression can be written for the expected value of the objective to be optimized. Then a variety
of computational methods can be used to maximize or minimize the expected value. This section provides a brief introduction to the modeling process.
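A classic small instance of optimizing an expected value is the newsvendor problem, sketched here with an assumed discrete demand distribution and assumed prices; real stochastic programs are far larger, but the expected-value objective has the same shape:

```python
# Choose a stock level to maximize expected profit when demand is
# random. Selling price 10, unit cost 6, unsold items are worthless.
price, cost = 10, 6
demand_dist = {10: 0.2, 20: 0.5, 30: 0.3}   # demand -> probability

def expected_profit(stock):
    # Expectation over the discrete demand distribution.
    return sum(p * (price * min(stock, d) - cost * stock)
               for d, p in demand_dist.items())

best = max(range(0, 41), key=expected_profit)
print(best, expected_profit(best))
```

Enumerating candidate stock levels is feasible here only because the decision is one-dimensional; general stochastic programs need the specialized computational methods mentioned above.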
Combinatorial Optimization
The most general type of optimization problem and one that is applicable to most spreadsheet models is
the combinatorial optimization problem. Many spreadsheet models contain variables and compute
measures of effectiveness. The spreadsheet user often changes the variables in an unstructured way to
look for the solution that obtains the greatest or least of the measure. In the words of OR, the analyst is
searching for the solution that optimizes an objective function, the measure of effectiveness.
Combinatorial optimization provides tools for automating the search for good solutions and can be of
Stochastic Processes
In many practical situations the attributes of a system randomly change over time. Examples
include the number of customers in a checkout line, congestion on a highway, the number of
items in a warehouse, and the price of a financial security, to name a few. When aspects of the
process are governed by probability theory, we have a stochastic process.
The model is described in part by enumerating the states in which the system can be found. The
state is like a snapshot of the system at a point in time that describes the attributes of the system.
The example for this section is an Automated Teller Machine (ATM) system and the state is the
number of customers at or waiting for the machine. Time is the linear measure through which the
system moves. Events occur that change the state of the system. For the ATM example the
events are arrivals and departures.
Say a system is observed at regular intervals such as every day or every week. Then the stochastic
process can be described by a matrix which gives the probabilities of moving to each state from every
other state in one time interval. Assuming this matrix is unchanging with time, the process is called a
Discrete Time Markov Chain (DTMC). Computational techniques are available to compute a variety of
system measures that can be used to analyze and evaluate a DTMC model. This section illustrates how
to construct a model of this type and the measures that are available.
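A DTMC of the kind described above can be sketched for the ATM example; the transition probabilities below are assumed purely for illustration, with the state being the number of customers present (0, 1, or 2):

```python
# P[i][j] = assumed probability of moving from state i to state j
# in one time interval (each row sums to 1).
P = [
    [0.6, 0.4, 0.0],
    [0.3, 0.5, 0.2],
    [0.0, 0.5, 0.5],
]

# Long-run (steady-state) probabilities by repeatedly applying the
# transition matrix: pi := pi * P.
pi = [1.0, 0.0, 0.0]
for _ in range(1000):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

print([round(x, 3) for x in pi])
```

The resulting vector gives the fraction of time the ATM spends with 0, 1, or 2 customers, one of the system measures a DTMC analysis typically reports.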
Now suppose instead that the durations of activities are exponentially distributed and time is a continuous parameter. Such a process satisfies the Markovian property and is called a Continuous Time Markov Chain (CTMC). The process is entirely described by a matrix showing the rate of transition from each state to every other state; the rates are the parameters of the associated exponential distributions. The analytical results are very similar to those of a DTMC. The ATM example is continued with illustrations of the elements of the model and the
Simulation
Chapter 2
Definition of Matrices
A matrix is a rectangular array of numbers or other mathematical objects for which operations
such as addition and multiplication are defined. Most commonly, a matrix over a field F is a
rectangular array of scalars each of which is a member of F. Most of this article focuses on real
and complex matrices, that is, matrices whose elements are real numbers or complex numbers,
respectively.
The numbers, symbols or expressions in the matrix are called its entries or its elements. The
horizontal and vertical lines of entries in a matrix are called rows and columns, respectively.
Size
The size of a matrix is defined by the number of rows and columns that it contains. A matrix
with m rows and n columns is called an m × n matrix or m-by-n matrix, while m and n are called
its dimensions. For example, a matrix with three rows and two columns is a 3 × 2 matrix.
Matrices which have a single row are called row vectors, and those which have a single column
are called column vectors. A matrix which has the same number of rows and columns is called a
square matrix. A matrix with an infinite number of rows or columns (or both) is called an infinite
matrix. In some contexts, such as computer algebra programs, it is useful to consider a matrix
with no rows or no columns, called an empty matrix.
Row vector: a 1 × n matrix with one row, sometimes used to represent a vector.
Column vector: an n × 1 matrix with one column, sometimes used to represent a vector.
Square matrix: an n × n matrix with the same number of rows and columns, sometimes used to represent a linear transformation from a vector space to itself, such as reflection, rotation, or shearing.
Types of Matrices
Diagonal and triangular matrix
If all entries of A below the main diagonal are zero, A is called an upper triangular matrix.
Similarly if all entries of A above the main diagonal are zero, A is called a lower triangular
matrix. If all off-diagonal elements are zero, A is called a diagonal matrix.
Identity matrix
The identity matrix In of size n is the n-by-n matrix in which all the elements on the main
diagonal are equal to 1 and all other elements are equal to 0, e.g.
It is a square matrix of order n, and also a special kind of diagonal matrix. It is called an identity matrix because multiplication with it leaves a matrix unchanged: A In = A and Im A = A for any m-by-n matrix A.
A square matrix A that is equal to its transpose, that is, A^T = A, is a symmetric matrix. If instead A is equal to the negative of its transpose, that is, A^T = −A, then A is a skew-symmetric matrix. In complex matrices, symmetry is often replaced by the concept of Hermitian matrices, which satisfy A* = A, where the star or asterisk denotes the conjugate transpose of the matrix, that is, the transpose of the complex conjugate of A.
By the spectral theorem, real symmetric matrices and complex Hermitian matrices have an
eigenbasis; that is, every vector is expressible as a linear combination of eigenvectors. In both
cases, all eigenvalues are real. This theorem can be generalized to infinite-dimensional situations
related to matrices with infinitely many rows and columns, see below.
A square matrix A is called invertible or non-singular if there exists a matrix B such that
AB = BA = In.
Definite matrix
A symmetric real matrix A is called positive-definite (respectively negative-definite; indefinite) if the associated quadratic form
Q(x) = x^T A x
takes only positive values (respectively only negative values; both some negative and some positive values). If the quadratic form takes only non-negative (respectively only non-positive) values, the symmetric matrix is called positive-semidefinite (respectively negative-semidefinite); hence the matrix is indefinite precisely when it is neither positive-semidefinite nor negative-semidefinite.
A symmetric matrix is positive-definite if and only if all its eigenvalues are positive, that is, the
matrix is positive-semidefinite and it is invertible. The table at the right shows two possibilities
for 2-by-2 matrices.
Allowing as input two different vectors instead yields the bilinear form associated to A:
B_A(x, y) = x^T A y.
Orthogonal matrix
An orthogonal matrix is a square matrix with real entries whose columns and rows are
orthogonal unit vectors (that is, orthonormal vectors).
An orthogonal matrix A is necessarily invertible (with inverse A^−1 = A^T), unitary (A^−1 = A*), and normal (A*A = AA*). The determinant of any orthogonal matrix is either +1 or −1. A special
orthogonal matrix is an orthogonal matrix with determinant +1. As a linear transformation, every
orthogonal matrix with determinant +1 is a pure rotation, while every orthogonal matrix with
determinant -1 is either a pure reflection, or a composition of reflection and rotation.
Main operations
Trace
The trace, tr(A) of a square matrix A is the sum of its diagonal entries. While matrix
multiplication is not commutative as mentioned above, the trace of the product of two matrices is
independent of the order of the factors:
tr(AB) = tr(BA).
Also, the trace of a matrix is equal to that of its transpose, that is,
tr(A) = tr(A^T).
Determinant
The determinant det(A) or |A| of a square matrix A is a number encoding certain properties of the matrix. A matrix is invertible if and only if its determinant is nonzero. Its absolute value equals the area (in R^2) or volume (in R^3) of the image of the unit square (or cube), while its sign corresponds to the orientation of the corresponding linear map: the determinant is positive if and only if the orientation is preserved.
The determinant of a 2-by-2 matrix with rows (a, b) and (c, d) is ad − bc, and the determinant of a 3-by-3 matrix involves 6 terms (rule of Sarrus). The lengthier Leibniz formula generalises these two formulae to all dimensions.
The determinant of a product of square matrices equals the product of their determinants: det(AB) = det(A) det(B).
Adding a multiple of any row to another row, or a multiple of any column to another column,
does not change the determinant. Interchanging two rows or two columns affects the determinant
by multiplying it by −1. Using these operations, any matrix can be transformed to a lower (or
upper) triangular matrix, and for such matrices the determinant equals the product of the entries
on the main diagonal; this provides a method to calculate the determinant of any matrix. Finally,
the Laplace expansion expresses the determinant in terms of minors, that is, determinants of
smaller matrices. This expansion can be used for a recursive definition of determinants (taking as
starting case the determinant of a 1-by-1 matrix, which is its unique entry, or even the
determinant of a 0-by-0 matrix, which is 1), that can be seen to be equivalent to the Leibniz
formula. Determinants can be used to solve linear systems using Cramer's rule, where the
division of the determinants of two related square matrices equates to the value of each of the
system's variables.
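The recursive definition of the determinant via Laplace expansion, described above, translates directly into code; this sketch expands along the first row and uses the 0-by-0 base case:

```python
# Recursive determinant via Laplace expansion along the first row.
# Base case: the determinant of an empty 0-by-0 matrix is 1.
def det(A):
    n = len(A)
    if n == 0:
        return 1
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                   # 1*4 - 2*3 = -2
print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))  # diagonal: 2*3*4 = 24
```

Laplace expansion costs O(n!) operations, so in practice determinants are computed by the row-reduction-to-triangular-form method mentioned above; the recursion is valuable as a definition, not as an algorithm.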
A number λ and a nonzero vector v satisfying
Av = λv
are called an eigenvalue and an eigenvector of A, respectively.
Algebra of Matrix
Matrices can be generalized in different ways. Abstract algebra uses matrices with entries in
more general fields or even rings, while linear algebra codifies properties of matrices in the
notion of linear maps. It is possible to consider matrices with infinitely many columns and rows.
Another extension are tensors, which can be seen as higher-dimensional arrays of numbers, as
opposed to vectors, which can often be realised as sequences of numbers, while matrices are
rectangular or two-dimensional arrays of numbers. Matrices, subject to certain requirements, tend
to form groups known as matrix groups. Similarly, under certain conditions matrices form rings
known as matrix rings. Though the product of matrices is not in general commutative, certain
matrices form fields known as matrix fields.
This article focuses on matrices whose entries are real or complex numbers. However, matrices
can be considered with much more general types of entries than real or complex numbers. As a
first step of generalization, any field, that is, a set where addition, subtraction, multiplication and
division operations are defined and well-behaved, may be used instead of R or C, for example
rational numbers or finite fields. For example, coding theory makes use of matrices over finite
fields. Eigenvalues, being roots of a polynomial, may exist only in a field larger than that of the
entries of the matrix; for instance, they may be complex in the case of a matrix with real
entries. The possibility to reinterpret the entries of a matrix as elements
of a larger field (e.g., to view a real matrix as a complex matrix whose entries happen to be all
real) then allows considering each square matrix to possess a full set of eigenvalues.
Alternatively one can consider only matrices with entries in an algebraically closed field, such as
C, from the outset.
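For instance, a real rotation matrix has no real eigenvalues; a quick NumPy check shows they are found only after passing to C:

```python
import numpy as np

# A 90-degree rotation has real entries, but its characteristic
# polynomial x^2 + 1 has roots only in the larger field C.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
eigvals = np.linalg.eigvals(R)
print(eigvals)  # complex values ±1j, even though R is real
```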
More generally, abstract algebra makes great use of matrices with entries in a ring R. Rings are a
more general notion than fields in that a division operation need not exist. The very same
addition and multiplication operations of matrices extend to this setting, too. The set M(n, R) of
all square n-by-n matrices over R is a ring called the matrix ring, isomorphic to the endomorphism
ring of the left R-module Rⁿ. If the ring R is commutative, that is, its multiplication is
commutative, then M(n, R) is a unitary associative algebra over R, which is noncommutative
unless n = 1. The determinant of square matrices over a commutative ring R can still be defined using the
Leibniz formula; such a matrix is invertible if and only if its determinant is invertible in R,
generalising the situation over a field F, where every nonzero element is invertible. Matrices
over superrings are called supermatrices.
Matrices do not always have all their entries in the same ring – or even in any ring at all. One
special but common case is block matrices, which may be considered as matrices whose entries
themselves are matrices. The entries need not be square matrices, and thus need not be
members of any ordinary ring; but their sizes must fulfil certain compatibility conditions.
Linear maps Rⁿ → Rᵐ are equivalent to m-by-n matrices, as described above. More generally, any
linear map f: V → W between finite-dimensional vector spaces can be described by a matrix A =
(aij), after choosing bases v1, ..., vn of V, and w1, ..., wm of W (so n is the dimension of V and m is
the dimension of W), which is such that f(vj) = a1j w1 + ⋯ + amj wm for j = 1, ..., n.
In other words, column j of A expresses the image of vj in terms of the basis vectors wi of W;
thus this relation uniquely determines the entries of the matrix A. Note that the matrix depends
on the choice of the bases: different choices of bases give rise to different, but equivalent
matrices. Many of the above concrete notions can be reinterpreted in this light, for example, the
transpose matrix Aᵀ describes the transpose of the linear map given by A, with respect to the
dual bases.
These properties can be restated in a more natural way: the category of all matrices with entries
in a field with multiplication as composition is equivalent to the category of finite dimensional
vector spaces and linear maps over this field.
More generally, the set of m×n matrices can be used to represent the R-linear maps between the
free modules Rⁿ and Rᵐ for an arbitrary ring R with unity. When n = m, composition of these maps
is possible, and this gives rise to the matrix ring of n×n matrices representing the endomorphism
ring of Rⁿ.
Matrix groups
Any property of matrices that is preserved under matrix products and inverses can be used to
define further matrix groups. For example, matrices with a given size and with a determinant of 1
form a subgroup of (that is, a smaller group contained in) their general linear group, called a
special linear group. Orthogonal matrices, determined by the condition
MᵀM = I,
form the orthogonal group. Every orthogonal matrix has determinant 1 or −1. Orthogonal
matrices with determinant 1 form a subgroup called special orthogonal group.
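A short NumPy check illustrates these group facts with a rotation (determinant 1) and a reflection (determinant −1); the angle 0.7 is arbitrary:

```python
import numpy as np

theta = 0.7  # arbitrary angle
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])   # in SO(2), det +1
ref = np.array([[1.0,  0.0],
                [0.0, -1.0]])                        # reflection, det -1

# Orthogonality (Q^T Q = I) is preserved under products: group closure.
for Q in (rot, ref, rot @ ref):
    assert np.allclose(Q.T @ Q, np.eye(2))

print(np.linalg.det(rot), np.linalg.det(ref))  # approximately 1 and -1
```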
Every finite group is isomorphic to a matrix group, as one can see by considering the regular
representation of the symmetric group. General groups can be studied using matrix groups,
which are comparatively well-understood, by means of representation theory.
Infinite matrices
It is also possible to consider matrices with infinitely many rows and/or columns even if, being
infinite objects, one cannot write down such matrices explicitly. All that matters is that for every
element in the set indexing rows, and every element in the set indexing columns, there is a well-
defined entry (these index sets need not even be subsets of the natural numbers). The basic
operations of addition, subtraction, scalar multiplication and transposition can still be defined
without problem; however matrix multiplication may involve infinite summations to define the
resulting entries, and these are not defined in general.
If R is any ring with unity, then the ring of endomorphisms of the free right R-module M (a direct
sum of copies of R indexed by a set I) is isomorphic to the ring of column-finite matrices whose
entries are indexed by I × I and whose columns each contain only finitely many nonzero entries.
The endomorphisms of M considered as a left R-module result in an analogous object, the
row-finite matrices whose rows each have only finitely many nonzero entries.
If infinite matrices are used to describe linear maps, then only those matrices can be used all of
whose columns have but a finite number of nonzero entries, for the following reason. For a
matrix A to describe a linear map f: V→W, bases for both spaces must have been chosen; recall
that by definition this means that every vector in the space can be written uniquely as a (finite)
linear combination of basis vectors, so that written as a (column) vector v of coefficients, only
finitely many entries vi are nonzero. Now the columns of A describe the images by f of individual
basis vectors of V in the basis of W, which is only meaningful if these columns have only finitely
many nonzero entries. There is no restriction on the rows of A however: in the product A·v there
are only finitely many nonzero coefficients of v involved, so every one of its entries, even if it is
given as an infinite sum of products, involves only finitely many nonzero terms and is therefore
well defined. Moreover, this amounts to forming a linear combination of the columns of A that
effectively involves only finitely many of them, whence the result has only finitely many
nonzero entries, because each of those columns does. One also sees that the product of two
matrices of the given type is well defined (provided as usual that the column-index and row-index
sets match), is again of the same type, and corresponds to the composition of linear maps.
If R is a normed ring, then the condition of row or column finiteness can be relaxed. With the
norm in place, absolutely convergent series can be used instead of finite sums. For example, the
matrices whose columns are absolutely convergent series form a ring. Analogously, the
matrices whose rows are absolutely convergent series also form a ring.
In that vein, infinite matrices can also be used to describe operators on Hilbert spaces, where
convergence and continuity questions arise, which again results in certain constraints that have to
be imposed. However, the explicit point of view of matrices tends to obfuscate the matter, and
the abstract and more powerful tools of functional analysis can be used instead.
Empty matrices
An empty matrix is a matrix in which the number of rows or columns (or both) is zero. Empty matrices help
dealing with maps involving the zero vector space. For example, if A is a 3-by-0 matrix and B is a 0-by-3
matrix, then AB is the 3-by-3 zero matrix corresponding to the null map from a 3-dimensional space V to
itself, while BA is a 0-by-0 matrix. There is no common notation for empty matrices, but most computer
algebra systems allow creating and computing with them. The determinant of the 0-by-0 matrix is 1 as
follows from regarding the empty product occurring in the Leibniz formula for the determinant as 1. This
value is also consistent with the fact that the identity map from any finite dimensional space to itself has
determinant 1, a fact that is often used as a part of the characterization of determinants.
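Most numerical systems follow these conventions; in NumPy, for example (a sketch, assuming a recent NumPy version):

```python
import numpy as np

A = np.zeros((3, 0))        # a 3-by-0 matrix
B = np.zeros((0, 3))        # a 0-by-3 matrix

AB = A @ B                  # the 3-by-3 zero matrix (the null map)
BA = B @ A                  # a 0-by-0 matrix
print(AB.shape, BA.shape)   # (3, 3) (0, 0)

# The determinant of the 0-by-0 matrix is the empty product, 1.
print(np.linalg.det(BA))    # 1.0
```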
Transpose
In linear algebra, the transpose of a matrix A is another matrix Aᵀ (also written A′, Aᵗʳ, ᵗA or Aᵗ) created
by any one of the following equivalent actions: reflect A over its main diagonal (which runs from
top-left to bottom-right); write the rows of A as the columns of Aᵀ; or write the columns of A as
the rows of Aᵀ.
Formally, the i th row, j th column element of Aᵀ is the j th row, i th column element of A:
(Aᵀ)ij = (A)ji.
The transpose of a matrix was introduced in 1858 by the British mathematician Arthur Cayley.
Properties
For matrices A, B and scalar c we have the following properties of transpose:
(Aᵀ)ᵀ = A
(A + B)ᵀ = Aᵀ + Bᵀ
(AB)ᵀ = BᵀAᵀ
(cA)ᵀ = c(Aᵀ)
det(Aᵀ) = det(A)
A square matrix whose transpose is equal to its negative is called a skew-symmetric matrix; that
is, A is skew-symmetric if Aᵀ = −A.
A square complex matrix whose transpose is equal to the matrix with every entry replaced by
its complex conjugate (denoted here with an overline) is called a Hermitian matrix (equivalent to the
matrix being equal to its conjugate transpose); that is, A is Hermitian if Aᵀ = A̅.
A square complex matrix whose transpose is equal to the negation of its complex conjugate is called
a skew-Hermitian matrix; that is, A is skew-Hermitian if Aᵀ = −A̅.
A square matrix whose transpose is equal to its inverse is called an orthogonal matrix; that is, A is
orthogonal if Aᵀ = A⁻¹.
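Each of these defining conditions is a one-line check in NumPy; the matrices below are small hand-picked examples:

```python
import numpy as np

S = np.array([[0, 2],
              [-2, 0]])      # skew-symmetric: S^T = -S
assert np.array_equal(S.T, -S)

H = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])  # Hermitian: equal to its conjugate transpose
assert np.array_equal(H, H.conj().T)

Q = np.array([[0.0, 1.0],
              [-1.0, 0.0]])  # orthogonal: Q^T = Q^{-1}
assert np.allclose(Q.T, np.linalg.inv(Q))
```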
If f : V → W is a linear map between vector spaces with dual spaces V∗ and W∗, its transpose
fᵀ : W∗ → V∗ is defined by ⟨fᵀ(φ), v⟩ = ⟨φ, f(v)⟩ for every φ in W∗ and v in V,
where ⟨·,·⟩ is the duality pairing. This definition also applies unchanged to left modules and to vector
spaces.
The definition of the transpose may be seen to be independent of any bilinear form on the vector
spaces, unlike the adjoint (below).
If the matrix A describes a linear map with respect to bases of V and W, then the matrix Aᵀ describes
the transpose of that linear map with respect to the dual bases.
Transpose of a bilinear form
Every linear map to the dual space f : V → V∗ defines a bilinear form B : V × V → F, with the
relation B(v, w) = f(v)(w). Defining the transpose of this bilinear form as the bilinear form Bᵀ given
by the transpose fᵀ : V∗∗ → V∗, i.e. Bᵀ(w, v) = fᵀ(w)(v), we find that B(v, w) = Bᵀ(w, v).
Adjoint
Not to be confused with Hermitian adjoint.
The adjoint allows us to consider whether g : W → V is equal to f⁻¹ : W → V. In particular, this allows
the orthogonal groupover a vector space V with a quadratic form to be defined without reference to
matrices (nor the components thereof) as the set of all linear maps V → V for which the adjoint
equals the inverse.
Over a complex vector space, one often works with sesquilinear forms (conjugate-linear in one
argument) instead of bilinear forms. The Hermitian adjoint of a map between such spaces is defined
similarly, and the matrix of the Hermitian adjoint is given by the conjugate transpose matrix if the
bases are orthonormal.
Adjoint
The matrix formed by taking the transpose of the cofactor matrix of a given original matrix. The
adjoint of matrix A is often written adj A.
Note: This is in fact only one type of adjoint. More generally, an adjoint of a matrix is any
mapping of a matrix which possesses certain properties.
Elementary and Inverse of a matrix
In mathematics, an elementary matrix is a matrix which differs from the identity matrix by one single
elementary row operation. The elementary matrices generate the general linear group of invertible
matrices. Left multiplication (pre-multiplication) by an elementary matrix represents elementary row
operations, while right multiplication (post-multiplication) represents elementary column
operations. The acronym "ERO" is commonly used for "elementary row operations".
Elementary row operations are used in Gaussian elimination to reduce a matrix to row echelon form.
They are also used in Gauss-Jordan elimination to further reduce the matrix to reduced row echelon
form.
Operations
There are three types of elementary matrices, which correspond to three types of row operations
(respectively, column operations):
Row switching
A row within the matrix can be switched with another row.
Row multiplication
Each element in a row can be multiplied by a non-zero constant.
Row addition
A row can be replaced by the sum of that row and a multiple of another row.
If E is an elementary matrix, as described below, to apply the elementary row operation to a
matrix A, one multiplies the elementary matrix on the left, E⋅A. The elementary matrix for any row
operation is obtained by executing the operation on the identity matrix.
Row-switching transformations
The first type of row operation on a matrix A switches all matrix elements on row i with their
counterparts on row j. The corresponding elementary matrix is obtained by swapping row i and
row j of the identity matrix.
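For example, the elementary matrix that swaps the first and third rows is built by performing that swap on the identity matrix; left-multiplying then applies the operation to any A (a small NumPy sketch):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])

E = np.eye(3)
E[[0, 2]] = E[[2, 0]]   # swap rows 0 and 2 of the identity matrix

print(E @ A)            # rows 0 and 2 of A are exchanged
```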
Much like ordinary numbers, we use the notation A⁻¹ to denote the inverse of matrix A. Some
important things to remember about inverse matrices are that they are not commutative, and that a full
generalization is possible only if the matrices you are using are square (meaning they have the
same number of rows and columns, an n x n matrix).
Theorem 5 reveals something else useful about the inverse of matrices. Theorem 5 states that
if matrix A is invertible then the equation Ax = b has a unique solution, x. We can find this
solution by x = A⁻¹b. The following example demonstrates the usefulness of this equation.
The following proof will help prove theorem 5 by proving 1) that the solution exists and 2)
this solution is unique.
More useful properties of inverse matrices are revealed in Theorem 6. This theorem states:
Are you skeptical at all about theorem 6? Well, in case you are, here are the proofs for each
part.
To prove 1: we need to find a matrix C so that A⁻¹C = I and CA⁻¹ = I. We already know that these
equations hold if we put A in place of C (see above). Thus A⁻¹ is invertible, and
A is its inverse.
Also, it is useful to remember that the product of n x n invertible matrices is invertible, and
the inverse is the product of their inverses in the reverse order.
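The reverse-order rule (AB)⁻¹ = B⁻¹A⁻¹ can be confirmed numerically with two arbitrary invertible matrices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[1.0, 3.0],
              [0.0, 1.0]])

# Inverses multiply in the reverse order: (AB)^{-1} = B^{-1} A^{-1}
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
assert np.allclose(lhs, rhs)
print(lhs)
```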
Elementary Matrices
The usefulness of matrices continues to expand with the introduction of elementary matrices.
An elementary matrix is a matrix that is obtained by performing a single elementary row
operation to an identity matrix. An elementary row operation is the process of either (1)
replacing one row of a matrix with the sum of itself and a multiple of another row (2)
Interchanging two rows (3) Multiplying all entries in a row by a nonzero constant. If an
elementary row operation is performed on an n x n matrix A, the resulting matrix can be
written as EA, where the n x n matrix E is created by performing the same row operation on
In. The following example demonstrates this concept:
It should also be noted that each elementary matrix E is invertible. The inverse of E is the
elementary matrix of the same type that transforms E back into I.
Finally, theorem 7 gives us a way to visualize an inverse matrix and helps us develop a
method of finding inverse matrices. Theorem 7 says that an n x n matrix A is
invertible iff (if and only if) A is row equivalent to In, and any sequence of elementary row
operations that reduces A to In also transforms In into A⁻¹.
Say we placed a matrix A and its identity matrix I next to each other and formed an
augmented matrix. Row operations done to this matrix would produce the same results
on both A and I. The following is an algorithm for finding A⁻¹, the inverse of matrix A.
First, row reduce the augmented matrix [ A I ]. If A is row equivalent to I, then the matrix
[ A I ] is row equivalent to [ I A⁻¹ ]. If not, then A does not have an inverse.
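The algorithm above translates directly into code. The sketch below row reduces [ A I ] with partial pivoting (a standard refinement for numerical stability, not part of the text's description):

```python
import numpy as np

def invert(A):
    """Row reduce the augmented matrix [A I]; if A is row equivalent
    to I, the right half ends up holding the inverse of A."""
    n = len(A)
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("A is not row equivalent to I: no inverse")
        M[[col, pivot]] = M[[pivot, col]]       # row switching
        M[col] /= M[col, col]                   # row scaling
        for r in range(n):                      # row replacement
            if r != col:
                M[r] -= M[r, col] * M[col]
    return M[:, n:]

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
print(invert(A))   # [[ 1. -1.] [-1.  2.]]
```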
Another View of Matrix Inversion
Finally, this section gives us another way to view inverse matrices. This new way to view
matrices also introduces a new trick. We see that the "super augmented" matrix [ A I ],
which is matrix A and its identity matrix, row reduces to the matrix [ I A⁻¹ ]. Now how and why
does this work!? Well, in general the matrix [ A B ] row reduces to [ I A⁻¹B ].
The matrix method of solving systems of linear equations is just the elimination method in
disguise. By using matrices, the notation becomes a little easier.
3x + 4y = 5
2x − y = 7
The first step is to convert this into a matrix. Make sure all equations are in standard form
(Ax+By=C)(Ax+By=C), and use the coefficients of each equation to form each row of the
matrix. It may help you to separate the right column with a dotted line.
[ 3  4 | 5 ]
[ 2 −1 | 7 ]
Next, we use the matrix row operations to change the 2×2 matrix on the left side to the
identity matrix. First, we want to get a zero in Row 1, Column 2. So, add 4 times Row 2 to
Row 1.
[ 11  0 | 33 ]   → added (4 × Row 2) to Row 1
[  2 −1 |  7 ]
[ 1  0 | 3 ]   → divided Row 1 by 11
[ 2 −1 | 7 ]
[ 1  0 | 3 ]   → added (−2 × Row 1) to Row 2
[ 0 −1 | 1 ]
[ 1  0 |  3 ]   → multiplied Row 2 by −1
[ 0  1 | −1 ]
Now that we have the 2×2 identity matrix on the left, we can read off the solutions from the
right column:
x = 3, y = −1
The same method can be used for n linear equations in n unknowns; in this case you would
create an n×(n+1) augmented matrix, and use the matrix row operations to get the identity
n×n matrix on the left side.
Important Note: If the equations represented by your original matrix represent parallel lines,
you will not be able to get the identity matrix using the row operations. In this case, the solution
either does not exist or there are infinitely many solutions to the system.
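The same elimination can be delegated to a library; NumPy reproduces the hand computation above, and a near-zero determinant signals the parallel-line (singular) case:

```python
import numpy as np

# The worked system: 3x + 4y = 5, 2x - y = 7
A = np.array([[3.0, 4.0],
              [2.0, -1.0]])
b = np.array([5.0, 7.0])

x, y = np.linalg.solve(A, b)
print(x, y)   # x = 3, y = -1 (up to rounding)

# Parallel lines give a singular coefficient matrix: determinant ~ 0
parallel = np.array([[1.0, 2.0],
                     [2.0, 4.0]])
print(np.linalg.det(parallel))
```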
Input-Output analysis
Despite the clear ability of the input-output model to depict and analyze the dependence of one industry
or sector on another, Leontief and others never managed to introduce the full spectrum of dependency
relations in a market economy. In 2003, Mohammad Gani, a pupil of Leontief, introduced consistency
analysis in his book 'Foundations of Economic Science' (ISBN 984320655X), which formally looks
exactly like the input–output table but explores the dependency relations in terms of payments and
intermediation relations. Consistency analysis explores the consistency of plans of buyers and sellers by
decomposing the input–output table into four matrices, each for a different kind of means of payment. It
integrates micro and macroeconomics in one model and deals with money in an ideology-free manner. It
deals with the flow of funds via the movement of goods.
The practice is most associated with highly collaborative and complex projects, such as building
aircraft, but is also widely used in many product/project management situations. Even when a
company does not label its structure a matrix system or represent it as such on an organization
chart, there may be an implicit matrix structure any time employees are grouped into work teams
(this does not normally include committees, task forces, and the like) that are headed by someone
other than their primary supervisor.
Theoretically, managers of project groups and managers of functional groups have roughly equal
authority within the company. As indicated by the matrix, many employees report to at least two
managers. For instance, a member of the accounting department might be assigned to work with
the consumer products division, and would report to managers of both departments. Generally,
however, managers of functional areas and divisions report to a single authority, such as a
president or vice president.
Although all matrix structures entail some form of dual authority and multidisciplinary grouping,
there are several variations. For example, Kenneth Knight identified three basic matrix
management models: coordination, overlay, and secondment. Each of the models can be
implemented in various forms that differ in attributes related to decision-making roles,
relationships with outside suppliers and buyers, and other factors. Organizations choose different
models based on such factors as competitive environments, industries, education and maturity
level of the workforce, and existing corporate culture.
In the coordination model, staff members remain part of their original departments (or the
departments they would most likely belong to under a functional or product structure).
Procedures are instituted to ensure cross-departmental cooperation and interaction towards the
achievement of extra-departmental goals. In the overlay model, staff members officially become
members of two groups, each of which has a separate manager. This model represents the
undiluted matrix form described above. In the third version, the secondment model, individuals
move from functional departments into project groups and back again, but may effectively
belong to one or the other at different times.
Chapter 3
Game Theory
Theory of Game
Game theory is "the study of mathematical models of conflict and cooperation between
intelligent rational decision-makers." Game theory is mainly used in economics, political
science, and psychology, as well as logic, computer science, biology and poker. Originally, it
addressed zero-sum games, in which one person's gains result in losses for the other participants.
Today, game theory applies to a wide range of behavioral relations, and is now an umbrella term
for the science of logical decision making in humans, animals, and computers.
Modern game theory began with the idea regarding the existence of mixed-strategy equilibria in
two-person zero-sum games and its proof by John von Neumann. Von Neumann's original proof
used the Brouwer fixed-point theorem on continuous mappings into compact convex sets, which
became a standard method in game theory and mathematical economics. His paper was followed
by the 1944 book Theory of Games and Economic Behavior, co-written with Oskar Morgenstern,
which considered cooperative games of several players. The second edition of this book provided
an axiomatic theory of expected utility, which allowed mathematical statisticians and economists
to treat decision-making under uncertainty.
This theory was developed extensively in the 1950s by many scholars. Game theory was later
explicitly applied to biology in the 1970s, although similar developments go back at least as far
as the 1930s. Game theory has been widely recognized as an important tool in many fields. With
the Nobel Memorial Prize in Economic Sciences going to game theorist Jean Tirole in 2014,
eleven game-theorists have now won the economics Nobel Prize. John Maynard Smith was
awarded the Crafoord Prize for his application of game theory to biology.
There is no single "correct" way to build a model and as often noted, model-building is more an
art than a science. The key point to be kept in mind is that most often there is a natural trade-off
between the accuracy of a model and its tractability. At the one extreme, it may be possible to
build a very comprehensive, detailed and exact model of the system at hand; this has the
obviously desirable feature of being a highly realistic representation of the original system.
While the very process of constructing such a detailed model can often aid immeasurably in
better understanding the system, the model may well be useless from an analytical perspective
since its construction may be extremely time-consuming and its complexity precludes any
meaningful analysis. At the other extreme, one could build a less comprehensive model with a
lot of simplifying assumptions so that it can be analyzed easily. However, the danger here is that
the model may be so lacking in accuracy that extrapolating results from the analysis back to the
original system could cause serious errors. Clearly, one must draw a line somewhere in the
middle where the model is a sufficiently accurate representation of the original system, yet
remains tractable. Knowing where to draw such a line is precisely what determines a good
modeler, and this is something that can only come with experience. In the formal definition of a
model that was given above, the key word is "selective." Having a clear problem definition
allows one to better determine the crucial aspects of a system that must be selected for
representation by the model, and the ultimate intent is to arrive at a model that captures all the
key elements of the system while remaining simple enough to analyze.
Physical Models: These are actual, scaled down versions of the original. Examples include a
globe, a scale-model car or a model of a flow line made with elements from a toy construction
set. In general, such models are not very common in operations research, mainly because getting
accurate representations of complex systems through physical models is often impossible.
Analogic Models: These are models that are a step down from the first category in that they are
physical models as well, but use a physical analog to describe the system, as opposed to an exact
scaled-down version. Perhaps the most famous example of an analogic model was the ANTIAC
model (the acronym stood for anti-automatic-computation) which demonstrated that one could
conduct a valid operations research analysis without even resorting to the use of a computer. In
this problem the objective was to find the best way to distribute supplies at a military depot to
various demand points. Such a problem can be solved efficiently by using techniques from
network flow analysis. However the actual procedure that was used took a different approach.
An anthill on a raised platform was chosen as an analog for the depot and little mounds of sugar
on their own platforms were chosen to represent each demand point. The network of roads
connecting the various nodes was constructed using bits of string with the length of each being
proportional to the actual distance and the width to the capacity along that link. An army of ants
was then released at the anthill and the paths that they chose to get to the mounds of sugar were
then observed. After the model attained a steady state, it was found that the ants by virtue of their
own tendencies had found the most efficient paths to their destinations! One could even conduct
some postoptimality analysis. For instance, various transportation capacities along each link
could be analyzed by proportionately varying the width of the link, and a scenario where certain
roads were unusable could be analyzed by simply removing the corresponding links to see what
the ants would then do. This illustrates an analogic model. More importantly, it also illustrates
that while O.R. is typically identified with mathematical analysis, the use of an innovative model
and problem-solving procedure such as the one just described is an entirely legitimate way to
conduct an O.R. study.
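The computation the ants performed is, in modern terms, a shortest-path calculation. Below is a minimal Dijkstra sketch over a made-up depot network (the node names and distances are invented for illustration):

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source; graph maps each node
    to a list of (neighbour, distance) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Hypothetical network: the depot (anthill) and demand points A, B, C
roads = {
    "depot": [("A", 4), ("B", 1)],
    "A": [("C", 1)],
    "B": [("A", 2), ("C", 5)],
    "C": [],
}
print(dijkstra(roads, "depot"))   # A is reached via B (distance 3), C via A (4)
```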
Computer Simulation Models: With the growth in computational power these models have
become extremely popular over the last ten to fifteen years. A simulation model is one where the
system is abstracted into a computer program. While the specific computer language used is not
a defining characteristic, a number of languages and software systems have been developed
solely for the purpose of building computer simulation models; a survey of the most popular
systems may be found in OR/MS Today (October 1997, pp. 38-46). Typically, such software has
syntax as well as built-in constructs that allow for easy model development. Very often they also
have provisions for graphics and animation that can help one visualize the system being
simulated. Simulation models are analyzed by running the software over some length of time that
represents a suitable period when the original system is operating under steady state. The inputs
to such models are the decision variables that are under the control of the decision-maker. These
are treated as parameters and the simulation is run for various combinations of values for these
parameters. At the end of a run statistics are gathered on various measures of performance and
these are then analyzed using standard techniques. The decision-maker then selects the
combination of values for the decision variables that yields the most desirable performance.
Simulation models are extremely powerful and have one highly desirable feature: they can be
used to model very complex systems without the need to make too many simplifying
assumptions and without the need to sacrifice detail. On the other hand, one has to be very
careful with simulation models because it is also easy to misuse simulation. First, before using
the model it must be properly validated. While validation is necessary with any model, it is
especially important with simulation. Second, the analyst must be familiar with how to use a
simulation model correctly, including things such as replication, run length, warmup etc; a
detailed explanation of these concepts is beyond the scope of this chapter but the interested
reader should refer to a good text on simulation. Third, the analyst must be familiar with various
statistical techniques in order to analyze simulation output in a meaningful fashion. Fourth,
constructing a complex simulation model on a computer can often be a challenging and relatively
time consuming task, although simulation software has developed to the point where this is
becoming easier by the day. The reason these issues are emphasized here is that a modern
simulation model can be very flashy and attractive, but its real value lies in its ability to yield
insights into very complex problems. However, in order to obtain such insights a considerable
level of technical skill is required.
A final point to keep in mind with simulation is that it does not provide one with an indication of
the optimal strategy. In some sense it is a trial and error process since one experiments with
various strategies that seem to make sense and looks at the objective results that the simulation
model provides in order to evaluate the merits of each strategy. If the number of decision
variables is very large, then one must necessarily limit oneself to some subset of these to analyze,
and it is possible that the final strategy selected may not be the optimal one. However, from a
practitioner’s perspective, the objective often is to find a good strategy and not necessarily the
best one, and simulation models are very useful in providing a decision-maker with good
solutions.
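That trial-and-error loop can be sketched in a few lines. The toy model below (a newsvendor-style daily ordering decision; the price, cost and demand range are all invented) runs the same simulation for several candidate strategies and keeps the best:

```python
import random

def simulate(order_qty, days=10_000, seed=0):
    """Average daily profit for a fixed order quantity: demand is random,
    unmet demand is lost, leftovers are wasted. Numbers are illustrative."""
    rng = random.Random(seed)   # fixed seed: every strategy sees the same demands
    profit = 0.0
    for _ in range(days):
        demand = rng.randint(0, 20)
        sold = min(order_qty, demand)
        profit += 5 * sold - 2 * order_qty   # revenue minus purchase cost
    return profit / days

# Experiment with a subset of strategies and pick the best-performing one.
candidates = range(0, 21, 5)
best = max(candidates, key=simulate)
print(best, simulate(best))
```

Note how the search is restricted to a handful of candidate order quantities; as the text warns, the winner among these need not be the true optimum.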
Mathematical Models: This is the final category of models, and the one that traditionally has
been most commonly identified with O.R. In this type of model one captures the characteristics
of a system or process through a set of mathematical relationships. Mathematical models can be
deterministic or probabilistic. In the former type, all parameters used to describe the model are
assumed to be known (or estimated with a high degree of certainty). With probabilistic models,
the exact values for some of the parameters may be unknown but it is assumed that they are
capable of being characterized in some systematic fashion (e.g., through the use of a probability
distribution). As an illustration, the Critical Path Method (CPM) and the Program Evaluation and
Review Technique (PERT) are two very similar O.R. techniques used in the area of project
planning. However, CPM is based on a deterministic mathematical model that assumes that the
duration of each project activity is a known constant, while PERT is based on a probabilistic
model that assumes that each activity duration is random but follows some specific probability
distribution (typically, the Beta distribution). Very broadly speaking, deterministic models tend
to be somewhat easier to analyze than probabilistic ones; however, this is not universally true.
Most mathematical models tend to be characterized by three main elements: decision variables,
constraints and objective function(s). Decision variables are used to model specific actions that
are under the control of the decision-maker. An analysis of the model will seek specific values
for these variables that are desirable from one or more perspectives. Very often – especially in
large models – it is also common to define additional "convenience" variables for the purpose of
simplifying the model or for making it clearer. Strictly speaking, such variables are not under the
control of the decision-maker, but they are also referred to as decision variables. Constraints are
used to set limits on the range of values that each decision variable can take on, and each
constraint is typically a translation of some specific restriction (e.g., the availability of some
resource) or requirement (e.g., the need to meet contracted demand). Clearly, constraints dictate
the values that can be feasibly assigned to the decision variables, i.e., the specific decisions on
the system or process that can be taken. The third and final component of a mathematical model
is the objective function. This is a mathematical statement of some measure of performance (such
as cost, profit, time, revenue, utilization, etc.) and is expressed as a function of the decision
variables for the model. It is usually desired either to maximize or to minimize the value of the
objective function, depending on what it represents. Very often, one may simultaneously have
more than one objective function to optimize (e.g., maximize profits and minimize changes in
workforce levels, say). In such cases there are two options. First, one could focus on a single
objective and relegate the others to a secondary status by moving them to the set of constraints
and specifying some minimum or maximum desirable value for them. This tends to be the
simpler option and the one most commonly adopted. The other option is to use a technique
designed specifically for multiple objectives (such as goal programming).
In using a mathematical model the idea is to first capture all the crucial aspects of the system
using the three elements just described, and to then optimize the objective function by choosing
(from among all values for the decision variables that do not violate any of the constraints
specified) the specific values that also yield the most desirable (maximum or minimum) value for
the objective function. This process is often called mathematical programming. Although many
mathematical models tend to follow this form, it is certainly not a requirement; for example, a
model may be constructed to simply define relationships between several variables and the
decision-maker may use these to study how one or more variables are affected by changes in the
values of others. Decision trees, Markov chains and many queuing models could fall into this
category.
Before concluding this section on model formulation, we return to our hypothetical example and
translate the statements made in the problem definition stage into a mathematical model by using
the information collected in the data collection phase. To do this we define two decision
variables G and W to represent respectively the number of gizmos and widgets to be made and
sold next month. Then the objective is to maximize total profits given by 10G+9W. There is a
constraint corresponding to each of the three limited resources, which should ensure that the
production of G gizmos and W widgets does not use up more of the corresponding resource than
is available for use. Thus for resource 1, this would be translated into the following mathematical
statement 0.7G+1.0W ≤ 630, where the left-hand-side of the inequality represents the resource
usage and the right-hand-side the resource availability. Additionally, we must also ensure that
each G and W value considered is a nonnegative integer, since any other value is meaningless in
terms of our definition of G and W. The complete mathematical model is:
Maximize 10G+9W
subject to:
o 0.7G+1.0W ≤ 630
o 1.0G+(2/3)W ≤ 708
o 0.1G+0.25W ≤ 135
o G, W ≥ 0 and integer.
This mathematical program tries to maximize the profit as a function of the production quantities
(G and W), while ensuring that these quantities are such that the corresponding production is
feasible with the resources available.
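To make the formulation concrete, the small integer program above can be solved by brute-force enumeration, since the resource constraints bound G and W to a modest range. This is only an illustrative sketch (a real model of any size would be handed to a linear or integer programming solver); the coefficients are taken directly from the model above.

```python
# Exhaustive search over the feasible integer production plans (G, W) of the
# gizmo/widget model. The resource constraints bound G by 708 and W by 540,
# so complete enumeration is cheap.

def solve_production_model():
    best = (0, 0, 0)  # (G, W, profit)
    for g in range(709):          # 1.0*G <= 708 bounds G
        for w in range(541):      # 0.25*W <= 135 bounds W
            if (0.7 * g + 1.0 * w <= 630 and          # resource 1
                    1.0 * g + (2 / 3) * w <= 708 and  # resource 2
                    0.1 * g + 0.25 * w <= 135):       # resource 3
                profit = 10 * g + 9 * w               # objective 10G + 9W
                if profit > best[2]:
                    best = (g, w, profit)
    return best

print(solve_production_model())   # (540, 252, 7668)
```

Here the linear-programming optimum happens to be integral, so enumeration and an LP solver would agree: produce 540 gizmos and 252 widgets for a profit of 7668.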
Two-Person Zero-Sum Games
Game theory provides a mathematical framework for analyzing the decision-making processes
and strategies of adversaries (or players) in different types of competitive situations. The
simplest type of competitive situations are two-person, zero-sum games. These games involve
only two players; they are called zero-sum games because one player wins whatever the other
player loses.
Consider the simple game called odds and evens. Suppose that player 1 takes evens and player 2
takes odds. Then, each player simultaneously shows either one finger or two fingers. If the
number of fingers matches, then the result is even, and player 1 wins the bet ($2). If the number
of fingers does not match, then the result is odd, and player 2 wins the bet ($2). Each player has
two possible strategies: show one finger or show two fingers. The payoff matrix shown below
represents the payoff to player 1.

                        Player 2
                    One finger   Two fingers
Player 1 One finger      2           -2
         Two fingers    -2            2
This game of odds and evens illustrates important concepts of simple games.
A two-person game is characterized by the strategies of each player and the payoff
matrix.
The payoff matrix shows the gain (positive or negative) for player 1 that would result
from each combination of strategies for the two players. Note that the matrix for player 2
is the negative of the matrix for player 1 in a zero-sum game.
The entries in the payoff matrix can be in any units as long as they represent the utility
(or value) to the player.
There are two key assumptions about the behavior of the players. The first is that both
players are rational. The second is that both players are greedy, meaning that they choose
their strategies in their own interest (to promote their own wealth).
2 x N and M x 2 Games
A two-person game has two players. A game in which one player wins what the other player
loses is called a zero-sum game. The theory of two-person zero-sum games is the foundation of
more complicated games, such as games with more than two players (n-person games), and
games in which the players can benefit through cooperation, with or without collusion or binding
agreements. This introduction is primarily concerned with two-person zero-sum games.
A two-person zero-sum game with a pay-off matrix with dimensions either 2 x n, or m x 2 can be
solved graphically.
Using the graphical illustration, it is possible to identify the active strategies of the opponent:
these are the strategies whose payoff lines intersect at the point N.
This game differs from game 1 in that it has no dominant strategies. The rules are as
follows: If player 1 plays a nickel, player 2 gives him 5 cents. If player 2 plays a
nickel and player 1 plays a quarter, player 1 gets 25 cents. If both players play
quarters, player 2 gets 25 cents. We get a payoff matrix for this game:
                     Player 2
                  Nickel   Quarter
Player 1 Nickel      5        5
         Quarter    25      -25
Notice that there are no longer any dominant strategies. To solve this game, we need a
more sophisticated approach. First, we can define lower and upper values of a game.
These specify the least and most (on average) that a player can expect to win in the
game if both players play rationally. To find the lower value of the game, first look at
the minimum of the entries in each row. In our example, the first row has minimum
value 5 and the second has minimum -25. The lower value of the game is the
maximum of these numbers, or 5. In other words, player 1 expects to win at least an
average of 5 cents per game. To find the upper value of the game, do the opposite.
Look at the maximum of every column. In this case, these values are 25 and 5. The
upper value of the game is the minimum of these numbers, or 5. So, on average,
player 1 should win at most 5 cents per game.
Notice that, in our example, the upper and lower values of the game are the same.
This is not always true; however, when it is, we just call this number the pure value of
the game. The row with value 5 and the column with value 5 intersect in the top right
entry of the payoff matrix. This entry is called the saddle point or minimax of the
game and is both the smallest in its row and the largest in its column. The row and
column that the saddle point belongs to are the best strategies for the players. So, in
this example, player 1 should always play a nickel while player 2 should always play
a quarter.
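The lower- and upper-value computation just described is mechanical and easy to check in code. A minimal sketch for the nickel/quarter matrix above:

```python
# Lower value (maximin), upper value (minimax), and saddle-point check for
# the nickel/quarter game. Payoffs are to player 1 (the row player).

payoff = [[5, 5],      # player 1 plays Nickel
          [25, -25]]   # player 1 plays Quarter

row_minima = [min(row) for row in payoff]                        # [5, -25]
lower_value = max(row_minima)                                    # 5

col_maxima = [max(row[j] for row in payoff) for j in range(2)]   # [25, 5]
upper_value = min(col_maxima)                                    # 5

# When the lower and upper values coincide, the game has a saddle point
# (a pure-strategy solution).
print(lower_value, upper_value, lower_value == upper_value)   # 5 5 True
```

The intersection of the maximin row and minimax column picks out the saddle-point entry, exactly as described in the text.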
It will have a pay-off of 10 against A's 6. The only solution to such a problem is
to employ the maximin-minimax strategies. When A employs the maximin
strategy, it gains 6, while B gains 7 by employing the minimax strategy. Each
fears that the other might discover its choice of strategy, and so each wants to
play it safe to be sure of a certain minimum of profit. The difference between 7
and 6, i.e. 1, measures the extent of indeterminacy. Because the maximin and
the minimax are unequal (6 ≠ 7), the solution is not stable.
One fundamental conclusion follows: where the pay-off matrix has no
saddle point, the minimax always exceeds the maximin, as is apparent from Table
2. The reason is that player (firm) A in the game always selects the
maximum of the row minima, whereas B always chooses the minimum of
the column maxima.
The minimax is thus bound to exceed the maximin. This can also be proved
algebraically. Suppose a_ij is the maximin and a_hk the minimax. Since a_ij is a
"Row Min.", it is less than or equal to every element in its row, including
a_ik. In turn, a_ik cannot exceed a_hk, the "Col. Max.", which is the maximum
in its column. Hence the maximin a_ij can never exceed the minimax a_hk.
MIXED STRATEGIES
But the duopoly problem without a saddle point can be solved by allowing
each firm to adopt mixed strategies. A mixed strategy refers to the introduction
of an element of chance in choice making on a probabilistic basis. It “is a
probability distribution that assigns a definite probability to the choice of each
pure strategy in such a manner that the sum of the probabilities is unity for each
participant." It is as if each player were given a set of dice to throw in order to
determine the strategy to be chosen. Each player has a pair of mixed strategies
that leads to an equilibrium position.
Each tries to obtain the most desirable expected value of the game (or pay-off)
against his rival, and is therefore in search of a set of probabilities for his
mixed strategy that yields the highest expected pay-off. This is known as
the optimal mixed strategy. If the game has value V, A will try to secure the
highest expected pay-off V by playing his mixed strategy; playing his own
mixed strategy, B will try to hold A's expected pay-off down to the minimum V.
To illustrate, the pay-off matrix in Table 3 is used where each duopolist has
two strategies 1 and 2. This Table has no saddle point. Both resort to the
game of dice to arrive at a solution. The rule is that if A throws the dice and
the result is 1 or 2, he will choose strategy 1 and if the result is 3, 4, 5 or 6, he
chooses strategy 2. Following this rule, the probability of A choosing strategy
1 is 1/3, and of choosing strategy 2 is 2/3. B will employ the same strategies
but with opposite probabilities in order to keep A's expected pay-off to the
minimum.
Each duopolist will try to maximise the “mathematical expectation of his profit”
rather than the profit itself. The expected pay-off or the mathematical
expectation of profit for each of the duopolists equals the value of the game
(V = 4) when both adopt their optimal probabilities.
If A uses his optimal mixed strategy, his expected pay-off cannot be less than
V, whatever B's choice of strategies may be. Similarly, if B uses his optimal
strategy, his expected loss cannot be greater than V, whatever A’s choice of
strategies may be. Thus the problem is always determinate when mixed
strategies are employed.
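The pay-off matrix of Table 3 is not shown here, so the sketch below assumes a 2x2 matrix (pay-offs to A) chosen to be consistent with the text: it has no saddle point, A's optimal mixed strategy is (1/3, 2/3), B's is (2/3, 1/3), and the value of the game is V = 4. The closed-form solution used is the standard one for 2x2 zero-sum games without a saddle point.

```python
from fractions import Fraction

# Assumed pay-off matrix (to A); Table 3 is not reproduced in the text.
#          B strategy 1   B strategy 2
# A str 1       6              0
# A str 2       3              6

def solve_2x2_mixed(a11, a12, a21, a22):
    """Optimal mixed strategies for a 2x2 zero-sum game with no saddle point,
    via the standard closed-form expressions."""
    d = a11 - a12 - a21 + a22
    p = Fraction(a22 - a21, d)               # probability A plays strategy 1
    q = Fraction(a22 - a12, d)               # probability B plays strategy 1
    v = Fraction(a11 * a22 - a12 * a21, d)   # value of the game
    return p, q, v

p, q, v = solve_2x2_mixed(6, 0, 3, 6)
print(p, q, v)   # 1/3 2/3 4
```

With these entries A mixes (1/3, 2/3) and B mixes with the opposite probabilities (2/3, 1/3), giving both an expected pay-off equal to the value V = 4, as in the text.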
Non-Constant-Sum Games:
In a constant-sum game, no player is able to affect the combined pay-off. But in a
non-constant-sum game, if player A employs an optimal mixed strategy, player
B can increase his expected pay-off by not following the same mixed strategy.
The solution lies in either collusion or non-collusion between the two players.
The former is known as cooperative non-constant-sum game and the latter as
non-cooperative non-constant-sum game.
Nash Equilibrium:
In the cooperative non-constant-sum game, the most rational thing for the two
players is to collude and thus to increase their combined pay-off without
reducing any one’s pay-off. But the problem is not so simple as it appears. It is
too much to expect the players to act rationally, especially when the problem
is one of distributing their joint profit equitably. The Nash Equilibrium tries to
arrive at a “fair division” by evaluating the pay-off for both players.
In Nash equilibrium, each player adopts a strategy that is his best choice,
given what the other player does. To explain Nash equilibrium, take two
players who are involved in a simple game of writing words. The game
assumes that each player independently writes one of two words on a paper: player
A writes 'top' or 'bottom' and player B writes 'right' or 'left'. An
examination of their papers then reveals the pay-off received by each, as shown in
Table 4.
Suppose player A prefers the top and player B prefers the left, giving the Top-
Left box of the matrix. The pay-off to player A is 2, the first
entry in this box, and the pay-off to player B is the second entry, 4.
Next, if player A prefers the bottom and player B prefers the right, then in the
Bottom-Right box the pay-off to player A is 2 and to player B is 0.
From the above, we can infer that player A has two strategies; he can choose
either the top or the bottom. From the point of view of player A, it is always
better for him to prefer the bottom because the pay-offs there, 4 and 2, are greater
than the figures at the top. Likewise, it is always better for player B to prefer the
left because the pay-offs 4 and 2 are greater than the figures at the right, i.e. 2
and 0. Here the equilibrium strategy is for player A to prefer the bottom and for
player B to prefer the left.
The above matrix reveals that there is one optimal choice of strategy for a
player regardless of the choice of the other player. Whenever player A
prefers the bottom, he will get a higher pay-off irrespective of what player B
prefers. Similarly, player B will get a higher pay-off if he prefers left
irrespective of what player A prefers. The preferences bottom and left
dominate the other two alternatives, and hence we get equilibrium in dominant
strategies. But a dominant strategy equilibrium does not occur often. The
matrix in Table 5 shows an example of this particular phenomenon.
In the above matrix, when player B prefers the left, the pay-offs to player A are
4 and 0, so he prefers the top. Likewise, when player B prefers the right,
the pay-offs to player A are 0 and 2, so he prefers the bottom. When
player B prefers the left, player A would prefer the top, and again when player
B prefers the right, player A would prefer the bottom. Here the optimal choice
of player A is based on what he imagines player B will do.
Thirdly, the various strategies followed by a rival against the other lead to an
endless chain of thought which is highly impracticable. For instance, in Table
1, there is no end to the chain of thought when A chooses one strategy and B
adopts a counter-strategy and vice versa.
Fifthly, even in its application to duopoly, game theory with its assumption of a
constant-sum game is unrealistic. For it implies that the “stakes of interest” are
objectively measurable and transferable. Further, the minimax principle which
provides a solution to the constant-sum game assumes that each player
makes the best of the worst possible situation. How can the best situation be
known if the worst does not arise? Moreover, most entrepreneurs act on the
presumption of the existence of favourable market conditions and the question
of making the best of the worst does not arise at all.
Sixthly, the use of mixed strategies for making non-zero sum games
determinate is unlikely to be found in real market situations. No doubt random
choice of strategies introduces secrecy and uncertainty but most
entrepreneurs, who like secrecy in business, avoid uncertainty. It is, however,
possible that an oligopolist may wish his rivals to know his business secrets
and strategies for the purpose of entering into collusion with them in order to
earn maximum joint profits.
Conclusion:
Thus like the other duopoly models, game theory fails to provide a satisfactory
solution to the duopoly problem. "Although game theory has developed far
since 1944," writes Prof. Watson, "its contribution to the theory of oligopoly has
been disappointing." To date, there have been no serious attempts to apply
game theory to actual market problems, or to economic problems in general.
Chapter 4
Decision Theory
Decision theory (or theory of choice) is the study of the reasoning underlying an agent's choices.
Decision theory can be broken into two branches: normative decision theory, which gives advice
on how to make the best decisions, given a set of uncertain beliefs and a set of values; and
descriptive decision theory, which analyzes how existing, possibly irrational, agents actually
make decisions.
Decision theory is closely related to the field of game theory; decision theory is concerned with
the choices of individual agents whereas game theory is concerned with interactions of agents
whose decisions affect each other. Decision theory is an interdisciplinary topic, studied by
economists, statisticians, psychologists, political and social scientists, and philosophers.
The new prescriptions or predictions about behaviour that positive decision theory produces
allow for further tests of the kind of decision-making that occurs in practice. There is a thriving
dialogue with experimental economics, which uses laboratory and field experiments to evaluate
and inform theory. In recent decades, there has also been increasing interest in what is sometimes
called 'behavioral decision theory' and this has contributed to a re-evaluation of what rational
decision-making requires.
Choice under uncertainty
This area represents the heart of decision theory. The procedure now referred to as expected
value was known from the 17th century. Blaise Pascal invoked it in his famous wager, which is
contained in his Pensées, published in 1670. The idea of expected value is that, when faced with
a number of actions, each of which could give rise to more than one possible outcome with
different probabilities, the rational procedure is to identify all possible outcomes, determine their
values (positive or negative) and the probabilities that will result from each course of action, and
multiply the two to give an expected value. The action to be chosen should be the one that gives
rise to the highest total expected value. In 1738, Daniel Bernoulli published an influential paper
entitled Exposition of a New Theory on the Measurement of Risk, in which he uses the St.
Petersburg paradox to show that expected value theory must be normatively wrong. He gives an
example in which a Dutch merchant is trying to decide whether to insure a cargo being sent from
Amsterdam to St Petersburg in winter. In his solution, he defines a utility function and computes
expected utility rather than expected financial value (see for a review).
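Bernoulli's objection can be illustrated numerically with the St. Petersburg game itself: a fair coin is flipped until the first head, and a head on flip k pays 2^k. Truncating the game at n flips shows the expected monetary value growing without bound while the expected utility under a logarithmic utility function stays bounded. (The log utility is used here in the spirit of Bernoulli's paper; the specific truncation is an illustrative assumption.)

```python
import math

# Truncated St. Petersburg game: the first head on flip k pays 2**k and
# occurs with probability (1/2)**k. The expected monetary value of the
# n-flip truncation is exactly n (it diverges as n grows), while the
# expected log-utility converges to 2*ln(2).

def truncated_st_petersburg(n):
    ev = sum((0.5 ** k) * 2 ** k for k in range(1, n + 1))          # = n
    eu = sum((0.5 ** k) * math.log(2 ** k) for k in range(1, n + 1))
    return ev, eu

for n in (10, 20, 40):
    ev, eu = truncated_st_petersburg(n)
    print(n, ev, round(eu, 4))
```

The expected value column keeps climbing with n, while the expected utility column settles near 2 ln 2 ≈ 1.386, which is the sense in which a log-utility decision-maker assigns the gamble a finite worth.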
In the 20th century, interest was reignited by Abraham Wald's 1939 paper pointing out that the
two central procedures of sampling-distribution-based statistical theory, namely hypothesis
testing and parameter estimation, are special cases of the general decision problem. Wald's paper
renewed and synthesized many concepts of statistical theory, including loss functions, risk
functions, admissible decision rules, antecedent distributions, Bayesian procedures, and minimax
procedures. The phrase "decision theory" itself was used in 1950 by E. L. Lehmann.
The revival of subjective probability theory, from the work of Frank Ramsey, Bruno de Finetti,
Leonard Savage and others, extended the scope of expected utility theory to situations where
subjective probabilities can be used. At this time, von Neumann and Morgenstern theory of
expected utility proved that expected utility maximization followed from basic postulates about
rational behavior.
The work of Maurice Allais and Daniel Ellsberg showed that human behavior has systematic and
sometimes important departures from expected-utility maximization. The prospect theory of
Daniel Kahneman and Amos Tversky renewed the empirical study of economic behavior with
less emphasis on rationality presuppositions. Kahneman and Tversky found three regularities
in actual human decision-making: "losses loom larger than gains"; persons focus more on
changes in their utility states than on absolute utilities; and the estimation of
subjective probabilities is severely biased by anchoring.
Intertemporal choice
Intertemporal choice is concerned with the kind of choice where different actions lead to
outcomes that are realised at different points in time. If someone received a windfall of several
thousand dollars, they could spend it on an expensive holiday, giving them immediate pleasure,
or they could invest it in a pension scheme, giving them an income at some time in the future.
What is the optimal thing to do? The answer depends partly on factors such as the expected rates
of interest and inflation, the person's life expectancy, and their confidence in the pensions
industry. However even with all those factors taken into account, human behavior again deviates
greatly from the predictions of prescriptive decision theory, leading to alternative models in
which, for example, objective interest rates are replaced by subjective discount rates.
Some decisions are difficult because of the need to take into account how other people in the
situation will respond to the decision that is taken. The analysis of such social decisions is more
often treated under the label of game theory, rather than decision theory, though it involves the
same mathematical methods. From the standpoint of game theory most of the problems treated in
decision theory are one-player games (or the one player is viewed as playing against an
impersonal background situation). In the emerging socio-cognitive engineering, the research is
especially focused on the different types of distributed decision-making in human organizations,
in normal and abnormal/emergency/crisis situations.
Complex decisions
Other areas of decision theory are concerned with decisions that are difficult simply because of
their complexity, or the complexity of the organization that has to make them. Individuals
making decisions may be limited in resources or are boundedly rational. In such cases the issue
is not the deviation between real and optimal behaviour, but the difficulty of determining the
optimal behaviour in the first place. The Club of Rome, for example, developed a model of
economic growth and resource usage that helps politicians make real-life decisions in complex
situations. Decisions are also affected by whether options are framed together or separately. This
is known as the distinction bias.
Heuristics
One method of decision-making is heuristic. The heuristic approach makes decisions based on
routine thinking. While this is quicker than step-by-step processing, heuristic decision-making
opens the risk of inaccuracy. Mistakes that otherwise would have been avoided in step-by-step
processing can be made. One common and incorrect thought process that results from heuristic
thinking is the gambler's fallacy. The gambler's fallacy makes the mistake of believing that a
random event is affected by previous random events. For example, there is a fifty percent chance
of a coin landing on heads. Gambler's fallacy suggests that if the coin lands on tails, the next time
it flips, it will land on heads, as if it's “the coin's turn” to land on heads. This is simply not true.
Such a fallacy is easily disproved in a step-by-step process of thinking.
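The gambler's fallacy is also easy to refute by simulation: among many independent fair flips, the frequency of heads immediately following a tail is still about one half. A small sketch (the seed is an arbitrary choice for reproducibility):

```python
import random

# Simulate many independent fair coin flips and measure how often a head
# follows a tail. Independence means the conditional frequency should be
# close to 0.5, contrary to the gambler's fallacy.

random.seed(1)
flips = [random.choice("HT") for _ in range(100_000)]

after_tail = [flips[i + 1] for i in range(len(flips) - 1) if flips[i] == "T"]
freq = after_tail.count("H") / len(after_tail)
print(round(freq, 3))   # close to 0.5
```

No matter how long the preceding run of tails, the conditional frequency of heads stays near one half: the coin has no memory.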
In another example, when choosing between options involving extremes, decision-makers may
have a heuristic that moderate alternatives are preferable to extreme ones. The Compromise
Effect operates under a mindset driven by the belief that the most moderate option, amid
extremes, carries the most benefits from each extreme.
Alternatives
A highly controversial issue is whether one can replace the use of probability in decision theory
by other alternatives.
Probability theory
Advocates for the use of probability theory point to:
the work of Richard Threlkeld Cox for justification of the probability axioms,
the Dutch book paradoxes of Bruno de Finetti as illustrative of the theoretical difficulties that can
arise from departures from the probability axioms, and
the complete class theorems, which show that all admissible decision rules are equivalent to the
Bayesian decision rule for some utility function and some prior distribution (or for the limit of a
sequence of prior distributions). Thus, for every decision rule, either the rule may be
reformulated as a Bayesian procedure (or a limit of a sequence of such), or there is a rule that is
sometimes better and never worse.
The proponents of fuzzy logic, possibility theory, quantum cognition, Dempster–Shafer theory,
and info-gap decision theory maintain that probability is only one of many alternatives and point
to many examples where non-standard alternatives have been implemented with apparent
success; notably, probabilistic decision theory is sensitive to assumptions about the probabilities
of various events, while non-probabilistic rules such as minimax are robust, in that they do not
make such assumptions.
Focusing on two methods for designing decision agents, planning and reinforcement learning, the
textbook Decision Making Under Uncertainty covers probabilistic models, introducing Bayesian
networks as a graphical model that captures probabilistic relationships between variables; utility
theory as a framework for understanding optimal
decision making under uncertainty; Markov decision processes as a method for modeling sequential
problems; model uncertainty; state uncertainty; and cooperative decision making involving multiple
interacting agents. A series of applications shows how the theoretical concepts can be applied to
systems for attribute-based person search, speech applications, collision avoidance, and unmanned
aircraft persistent surveillance.
Decision Making Under Uncertainty unifies research from different communities using consistent
notation, and is accessible to students and researchers across engineering disciplines who have some
prior exposure to probability theory and calculus. It can be used as a text for advanced undergraduate
and graduate students in fields including computer science, aerospace and electrical engineering, and
management science. It will also be a valuable professional reference for researchers in a variety of
disciplines.
Maxmin and Minmax
The maximin person looks at the worst that could happen under each action and then chooses the
action with the largest of these worst-case payoffs. They assume that the worst that can happen
will happen, and then take the action with the best worst-case scenario: the maximum of the
minimums, or the "best of the worst". This is the person who puts their money into a savings
account because they could lose money in the stock market.
Minmax
Minimax (sometimes MinMax or MM) is a decision rule used in decision theory, game theory, statistics
and philosophy for minimizing the possible loss for a worst case (maximum loss) scenario. Originally
formulated for two-player zero-sum game theory, covering both the cases where players take alternate
moves and those where they make simultaneous moves, it has also been extended to more complex
games and to general decision-making in the presence of uncertainty.
Game theory
In general games
The maximin value of a player is the largest value that the player can be sure to get without
knowing the actions of the other players. Its formal definition is:

v_i = max over a_i ( min over a_-i of v_i(a_i, a_-i) )

Where:
a_i is an action of player i, a_-i denotes the actions of all the other players, and v_i is
player i's payoff function.
Calculating the maximin value of a player is done in a worst-case approach: for each possible
action of the player, we check all possible actions of the other players and determine the worst
possible combination of actions, the one that gives player i the smallest value. Then, we
determine which action player i can take in order to make sure that this smallest value is the
largest possible.
For example, consider the following game for two players, where the first player ("row player")
may choose any of three moves, labelled T, M, or B, and the second player ("column" player)
may choose either of two moves, L or R. The result of the combination of both moves is
expressed in a payoff table:
L R
T 3,1 2,-20
M 5,0 -10,1
B -100,2 4,4
(where the first number in each cell is the pay-out of the row player and the second number is the
pay-out of the column player).
For the sake of example, we consider only pure strategies. Check each player in turn:
The row player can play T, which guarantees him a payoff of at least 2 (playing B is risky since it
can lead to payoff -100, and playing M can result in a payoff of -10). Hence, the row player's
maximin value is 2.
The column player can play L and secure a payoff of at least 0 (playing R puts him at risk of
getting -20). Hence, the column player's maximin value is 0.
If both players play their maximin strategies (T,L), the payoff vector is (3,1). In contrast, the
only Nash equilibrium in this game is (B,R), which leads to a payoff vector of (4,4).
The minimax value of a player is the smallest value that the other players can force the player to
receive, without knowing his actions. Equivalently, it is the largest value the player can be sure
to get when he knows the actions of the other players. Its formal definition is:

v_i = min over a_-i ( max over a_i of v_i(a_i, a_-i) )
The definition is very similar to that of the maximin value - only the order of the maximum and
minimum operators is inverse.
Intuitively, in maximin the maximization comes before the minimization, so player i tries to
maximize his value before knowing what the others will do; in minimax the maximization comes
after the minimization, so player i is in a much better position - he maximizes his value knowing
what the others did.
Usually, the maximin is strictly smaller than the minimax. Consider the game in the above
example:
The row player can get a value of 4 (if the other player plays R) or 5 (if the other player plays L),
so the row player's minimax value is 4.
The column player can get 1 (if the other player plays T), 1 (if M) or 4 (if B), so the column
player's minimax value is 1.
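These four guarantee values can be recomputed directly from the payoff table above; a short sketch:

```python
# Pure-strategy maximin and minimax values for the bimatrix game above.
# payoffs[row_move, col_move] = (row player's payoff, column player's payoff)

payoffs = {
    ("T", "L"): (3, 1),    ("T", "R"): (2, -20),
    ("M", "L"): (5, 0),    ("M", "R"): (-10, 1),
    ("B", "L"): (-100, 2), ("B", "R"): (4, 4),
}
rows, cols = ("T", "M", "B"), ("L", "R")

# Maximin: the guarantee each player has WITHOUT knowing the other's move.
row_maximin = max(min(payoffs[r, c][0] for c in cols) for r in rows)   # 2 (play T)
col_maximin = max(min(payoffs[r, c][1] for r in rows) for c in cols)   # 0 (play L)

# Minimax: the best each player can do once the opponent's move is KNOWN.
row_minimax = min(max(payoffs[r, c][0] for r in rows) for c in cols)   # 4
col_minimax = min(max(payoffs[r, c][1] for c in cols) for r in rows)   # 1

print(row_maximin, col_maximin, row_minimax, col_minimax)   # 2 0 4 1
```

As the text notes, the maximin is no larger than the minimax for each player (2 ≤ 4 for the row player, 0 ≤ 1 for the column player).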
In zero-sum games
In zero-sum games, the minimax solution is the same as the Nash equilibrium. The minimax
theorem states that for every two-person zero-sum game with finitely many strategies, there
exists a value V and a mixed strategy for each player, such that
(a) Given player 2's strategy, the best payoff possible for player 1 is V, and
(b) Given player 1's strategy, the best payoff possible for player 2 is −V.
Equivalently, Player 1's strategy guarantees him a payoff of V regardless of Player 2's strategy,
and similarly Player 2 can guarantee himself a payoff of −V. The name minimax arises because
each player minimizes the maximum payoff possible for the other; since the game is zero-sum,
each player thereby also minimizes his own maximum loss (i.e. maximizes his own minimum
payoff). See also example of a game without a value.
Example
The following example of a zero-sum game, where A and B make simultaneous moves,
illustrates minimax solutions. Suppose each player has three choices and consider the payoff
matrix for A displayed on the right. Assume the payoff matrix for B is the same matrix with the
signs reversed (i.e. if the choices are A1 and B1 then B pays 3 to A). Then, the minimax choice
for A is A2 since the worst possible result is then having to pay 1, while the simple minimax
choice for B is B2 since the worst possible result is then no payment. However, this solution is
not stable, since if B believes A will choose A2 then B will choose B1 to gain 1; then if A
believes B will choose B1 then A will choose A1 to gain 3; and then B will choose B2; and
eventually both players will realize the difficulty of making a choice. So a more stable strategy is
needed.
Some choices are dominated by others and can be eliminated: A will not choose A3 since either
A1 or A2 will produce a better result, no matter what B chooses; B will not choose B3 since
some mixtures of B1 and B2 will produce a better result, no matter what A chooses.
A can avoid having to make an expected payment of more than 1∕3 by choosing A1 with
probability 1∕6 and A2 with probability 5∕6: The expected payoff for A would be 3 × (1∕6) − 1 ×
(5∕6) = −1∕3 in case B chose B1 and −2 × (1∕6) + 0 × (5∕6) = −1/3 in case B chose B2. Similarly,
B can ensure an expected gain of at least 1/3, no matter what A chooses, by using a randomized
strategy of choosing B1 with probability 1∕3 and B2 with probability 2∕3. These mixed minimax
strategies are now stable and cannot be improved.
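These mixed strategies can be checked with a short computation. The sketch below uses the 2×2 matrix left after eliminating the dominated choices A3 and B3, with the payoffs for A as described above (3 and −1 against B1, −2 and 0 against B2):

```python
from fractions import Fraction as F

# Payoff matrix for A (B's payoffs are the negatives) after eliminating
# the dominated choices A3 and B3:
#            B1     B2
payoff = [[F(3), F(-2)],   # A1
          [F(-1), F(0)]]   # A2

p = [F(1, 6), F(5, 6)]   # A plays A1 with prob 1/6, A2 with prob 5/6
q = [F(1, 3), F(2, 3)]   # B plays B1 with prob 1/3, B2 with prob 2/3

# A's expected payoff against each pure reply by B
vs_B1 = sum(p[i] * payoff[i][0] for i in range(2))
vs_B2 = sum(p[i] * payoff[i][1] for i in range(2))
print(vs_B1, vs_B2)   # both -1/3: A's expected payment never exceeds 1/3

# B's expected gain (the negative of A's payoff) against each pure reply by A
vs_A1 = -sum(q[j] * payoff[0][j] for j in range(2))
vs_A2 = -sum(q[j] * payoff[1][j] for j in range(2))
print(vs_A1, vs_A2)   # both 1/3: B's expected gain is at least 1/3
```

Since both strategies guarantee the same value of 1/3 against every pure reply, neither player can improve by deviating.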
Maximin
Frequently, in game theory, maximin is distinct from minimax. Minimax is used in zero-sum
games to denote minimizing the opponent's maximum payoff. In a zero-sum game, this is
identical to minimizing one's own maximum loss, and to maximizing one's own minimum gain.
"Maximin" is a term commonly used for non-zero-sum games to describe the strategy which
maximizes one's own minimum payoff. In non-zero-sum games, this is not generally the same as
minimizing the opponent's maximum gain, nor the same as the Nash equilibrium strategy.
In repeated games
The minimax values are very important in the theory of repeated games. One of the central
theorems in this theory, the folk theorem, relies on the minimax values.
A simple version of the minimax algorithm, stated below, deals with games such as tic-tac-toe,
where each player can win, lose, or draw. If player A can win in one move, his best move is that
winning move. If player B knows that one move will lead to the situation where player A can
win in one move, while another move will lead to the situation where player A can, at best, draw,
then player B's best move is the one leading to a draw. Late in the game, it's easy to see what the
"best" move is. The Minimax algorithm helps find the best move, by working backwards from
the end of the game. At each step it assumes that player A is trying to maximize the chances of
A winning, while on the next turn player B is trying to minimize the chances of A winning (i.e.,
to maximize B's own chances of winning).
A minimax algorithm is a recursive algorithm for choosing the next move in an n-player game,
usually a two-player game. A value is associated with each position or state of the game. This
value is computed by means of a position evaluation function and it indicates how good it would
be for a player to reach that position. The player then makes the move that maximizes the
minimum value of the position resulting from the opponent's possible following moves. If it is
A's turn to move, A gives a value to each of his legal moves.
A possible allocation method consists in assigning a certain win for A as +1 and for B as −1.
This leads to combinatorial game theory as developed by John Horton Conway. An alternative is
using a rule that if the result of a move is an immediate win for A it is assigned positive infinity
and, if it is an immediate win for B, negative infinity. The value to A of any other move is the
minimum of the values resulting from each of B's possible replies. For this reason, A is called the
maximizing player and B is called the minimizing player, hence the name minimax algorithm.
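The recursion can be sketched directly for a game whose tree is known explicitly. The tiny tree and its leaf values below are hypothetical, purely for illustration; a real program would generate moves from the rules of the game:

```python
# A tiny game tree given explicitly: internal nodes map to lists of
# children; leaves map to their position evaluation for A.
tree = {
    "root": ["l", "r"],
    "l": ["ll", "lr"],
    "r": ["rl", "rr"],
    "ll": 3, "lr": -2, "rl": -1, "rr": 4,
}

def minimax(node, maximizing):
    """Value of `node` for A when both players play optimally."""
    sub = tree[node]
    if not isinstance(sub, list):      # leaf: return its evaluation
        return sub
    if maximizing:                     # A picks the child of maximum value
        return max(minimax(c, False) for c in sub)
    return min(minimax(c, True) for c in sub)  # B minimizes A's value

print(minimax("root", True))  # -1
```

Here A prefers the right branch: its worst outcome is −1, against −2 in the left branch, which is exactly the maximin reasoning described above.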
The above algorithm will assign a value of positive or negative infinity to any position, since the
value of every position will be the value of some final winning or losing position. This is
generally only possible at the very end of complicated games such as chess or go, since it is not
computationally feasible to look ahead as far as the completion of the game except toward the
end; instead, positions are given finite values as estimates of the degree of belief that they will
lead to a win for one player or another.
This can be extended if we can supply a heuristic evaluation function which gives values to non-
final game states without considering all possible following complete sequences. We can then
limit the minimax algorithm to look only at a certain number of moves ahead. This number is
called the "look-ahead", measured in "plies". For example, the chess computer Deep Blue (the
first one to beat a reigning world champion, Garry Kasparov at that time) looked ahead at least
12 plies, then applied a heuristic evaluation function.
The algorithm can be thought of as exploring the nodes of a game tree. The effective branching
factor of the tree is the average number of children of each node (i.e., the average number of
legal moves in a position). The number of nodes to be explored usually increases exponentially
with the number of plies (it is less than exponential if evaluating forced moves or repeated
positions). The number of nodes to be explored for the analysis of a game is therefore
approximately the branching factor raised to the power of the number of plies. It is therefore
impractical to completely analyze games such as chess using the minimax algorithm.
The performance of the naïve minimax algorithm may be improved dramatically, without
affecting the result, by the use of alpha-beta pruning. Other heuristic pruning methods can also
be used, but not all of them are guaranteed to give the same result as the un-pruned search.
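A sketch of alpha-beta pruning on the same kind of explicit game tree (the tree is again a hypothetical example). The key point is that a branch is cut off as soon as it provably cannot influence the final choice:

```python
def alphabeta(node, maximizing, alpha, beta, tree):
    """Minimax value of `node`, skipping branches that cannot
    affect the final decision."""
    sub = tree[node]
    if not isinstance(sub, list):          # leaf
        return sub
    if maximizing:
        value = float("-inf")
        for c in sub:
            value = max(value, alphabeta(c, False, alpha, beta, tree))
            alpha = max(alpha, value)
            if alpha >= beta:              # beta cutoff
                break
        return value
    value = float("inf")
    for c in sub:
        value = min(value, alphabeta(c, True, alpha, beta, tree))
        beta = min(beta, value)
        if alpha >= beta:                  # alpha cutoff
            break
    return value

tree = {"root": ["l", "r"], "l": ["ll", "lr"], "r": ["rl", "rr"],
        "ll": 3, "lr": -2, "rl": -1, "rr": 4}
print(alphabeta("root", True, float("-inf"), float("inf"), tree))  # -1
```

The result is identical to plain minimax; only the amount of work changes.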
A naïve minimax algorithm may be trivially modified to additionally return an entire Principal
Variation along with a minimax score.
When managers make choices or decisions under risk or uncertainty, they must somehow
incorporate this risk into their decision-making process. This chapter presented some basic rules
for managers to help them make decisions under conditions of risk and uncertainty. Conditions
of risk occur when a manager must make a decision for which the outcome is not known with
certainty. Under conditions of risk, the manager can make a list of all possible outcomes and
assign probabilities to the various outcomes. Uncertainty exists when a decision maker cannot
list all possible outcomes and/or cannot assign probabilities to the various outcomes. To measure
the risk associated with a decision, the manager can examine several characteristics of the
probability distribution of outcomes for the decision. The various rules for making decisions
under risk require information about several different characteristics of the probability
distribution of outcomes: (1) the expected value (or mean) of the distribution, (2) the variance
and standard deviation, and (3) the coefficient of variation.
While there is no single decision rule that managers can follow to guarantee that profits are
actually maximized, we discussed a number of decision rules that managers can use to help them
make decisions under risk: (1) the expected value rule, (2) the mean–variance rules, and (3) the
coefficient of variation rule. These rules can only guide managers in their analysis of risky
decision making. The actual decisions made by a manager will depend in large measure on the
manager's willingness to take on risk. Managers' propensity to take on risk can be classified in
one of three categories: risk averse, risk loving, or risk neutral.
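As an illustration of these rules, the sketch below computes the three characteristics for two hypothetical risky decisions; the payoff distributions are invented for the example:

```python
import math

def stats(outcomes):
    """Mean, standard deviation and coefficient of variation of a
    discrete probability distribution given as (profit, prob) pairs."""
    mean = sum(p * x for x, p in outcomes)
    var = sum(p * (x - mean) ** 2 for x, p in outcomes)
    sd = math.sqrt(var)
    return mean, sd, sd / mean      # coefficient of variation = sd / mean

A = [(100, 0.5), (200, 0.5)]        # hypothetical decision A
B = [(0, 0.5), (300, 0.5)]          # hypothetical decision B

mean_a, sd_a, cv_a = stats(A)
mean_b, sd_b, cv_b = stats(B)
print(mean_a, sd_a, round(cv_a, 3))   # 150.0 50.0 0.333
print(mean_b, sd_b, round(cv_b, 3))   # 150.0 150.0 1.0
# The expected value rule cannot separate A and B (equal means); the
# mean-variance and coefficient of variation rules both favour A.
```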
Expected utility theory explains how managers can make decisions in risky situations. The
theory postulates that managers make risky decisions with the objective of maximizing the
expected utility of profit. The manager's attitude toward risk is captured by the shape of the utility
function for profit. If a manager experiences diminishing (increasing) marginal utility for profit,
the manager is risk averse (risk loving). If marginal utility for profit is constant, the manager is
risk neutral.
If a manager maximizes expected utility for profit, the decisions can differ from decisions
reached using the three decision rules discussed for making risky decisions. However, in the case
of a risk-neutral manager, the decisions are the same under maximization of expected profit and
maximization of expected utility of profit. Consequently, a risk-neutral decision maker can
follow the simple rule of maximizing the expected value of profit and simultaneously also be
maximizing utility of profit.
In the case of uncertainty, decision science can provide very little guidance to managers beyond
offering them some simple decision rules to aid them in their analysis of uncertain situations. We
discussed four basic rules for decision making under uncertainty in this chapter: (1) the maximax
rule, (2) the maximin rule, (3) the minimax regret rule, and (4) the equal probability rule.
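The four rules can be sketched on a small hypothetical payoff table (rows are decisions, columns are states of nature; all numbers are invented for the example):

```python
# Rows: decisions; columns: states of nature (hypothetical payoff table)
payoffs = [
    [50, 20, -10],   # decision 0
    [40, 30, 0],     # decision 1
    [30, 25, 10],    # decision 2
]

def argmax(vals):
    return max(range(len(vals)), key=lambda i: vals[i])

# (1) Maximax: pick the decision with the best best-case outcome
maximax = argmax([max(row) for row in payoffs])
# (2) Maximin: pick the decision with the best worst-case outcome
maximin = argmax([min(row) for row in payoffs])
# (3) Minimax regret: minimize the largest regret, where regret in a
#     state is the shortfall from that state's best payoff
best = [max(col) for col in zip(*payoffs)]
regret = [[b - x for b, x in zip(best, row)] for row in payoffs]
minimax_regret = min(range(len(payoffs)), key=lambda i: max(regret[i]))
# (4) Equal probability: maximize the simple average payoff
equal_prob = argmax([sum(row) / len(row) for row in payoffs])

print(maximax, maximin, minimax_regret, equal_prob)  # 0 2 1 1
```

Note that the four rules can pick different decisions from the same table, which is exactly why they are guides rather than guarantees.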
Expected Value
In probability theory, the expected value of a random variable, intuitively, is the long-run
average value of repetitions of the experiment it represents. For example, the expected value in
rolling a six-sided die is 3.5 because, roughly speaking, the average of all the numbers that come
up in an extremely large number of rolls is very nearly always quite close to three and a half.
Less roughly, the law of large numbers states that the arithmetic mean of the values almost
surely converges to the expected value as the number of repetitions approaches infinity. The
expected value is also known as the expectation, mathematical expectation, EV, average,
mean value, mean, or first moment.
More practically, the expected value of a discrete random variable is the probability-weighted
average of all possible values. In other words, each possible value the random variable can
assume is multiplied by its probability of occurring, and the resulting products are summed to
produce the expected value. The same principle applies to a continuous random variable, except
that an integral of the variable with respect to its probability density replaces the sum. The
formal definition subsumes both of these and also works for distributions which are neither
discrete nor continuous: the expected value of a random variable is the integral of the random
variable with respect to its probability measure.
The expected value does not exist for random variables having some distributions with large
"tails", such as the Cauchy distribution. For such random variables, the long tails of the
distribution prevent the sum or integral from converging.
The expected value is a key aspect of how one characterizes a probability distribution; it is one
type of location parameter. By contrast, the variance is a measure of dispersion of the possible
values of the random variable around the expected value. The variance itself is defined in terms
of two expectations: it is the expected value of the squared deviation of the variable's value from
the variable's expected value.
The expected value plays important roles in a variety of contexts. In regression analysis, one
desires a formula in terms of observed data that will give a "good" estimate of the parameter
giving the effect of some explanatory variable upon a dependent variable. The formula will give
different estimates using different samples of data, so the estimate it gives is itself a random
variable. A formula is typically considered good in this context if it is an unbiased estimator—
that is, if the expected value of the estimate (the average value it would give over an arbitrarily
large number of separate samples) can be shown to equal the true value of the desired parameter.
In decision theory, and in particular in choice under uncertainty, an agent is described as making
an optimal choice in the context of incomplete information. For risk neutral agents, the choice
involves using the expected values of uncertain quantities, while for risk averse agents it
involves maximizing the expected value of some objective function such as a von Neumann–
Morgenstern utility function. One example of using expected value in reaching optimal decisions
is the Gordon–Loeb model of information security investment. According to the model, one can
conclude that the amount a firm spends to protect information should generally be only a small
fraction of the expected loss (i.e., the expected value of the loss resulting from a
cyber/information security breach).
Decision tree learning uses a decision tree as a predictive model which maps observations about
an item to conclusions about the item's target value. It is one of the predictive modelling
approaches used in statistics, data mining and machine learning. Tree models where the target
variable can take a finite set of values are called classification trees. In these tree structures,
leaves represent class labels and branches represent conjunctions of features that lead to those
class labels. Decision trees where the target variable can take continuous values (typically real
numbers) are called regression trees.
In decision analysis, a decision tree can be used to visually and explicitly represent decisions and
decision making. In data mining, a decision tree describes data but not decisions; rather the
resulting classification tree can be an input for decision making. This page deals with decision
trees in data mining.
Chapter 5
Many practical problems in operations research can be broadly formulated as linear programming problems, for
which the simplex method is a general solution technique. It is not, however, the most efficient approach for
certain special problem types, namely:
(i) transportation models,
(ii) transshipment models, and
(iii) assignment models.
The above models are also basically allocation models. We can adopt the simplex technique to solve them, but
easier algorithms have been developed for solution of such problems. The following sections deal with the
transportation problems and their streamlined procedures for solution.
Transportation Problem
A typical transportation problem is shown in Fig. 1. It deals with sources where a supply of some
commodity is available and destinations where the commodity is demanded. The classic
statement of the transportation problem uses a matrix with the rows representing sources and
columns representing destinations. The algorithms for solving the problem are based on this
matrix representation. The costs of shipping from sources to destinations are indicated by the
entries in the matrix. If shipment is impossible between a given source and destination, a large
cost of M is entered. This discourages the solution from using such cells. Supplies and demands
are shown along the margins of the matrix. As in the example, the classic transportation problem
has total supply equal to total demand.
The network model of the transportation problem is shown in Fig. 2. Sources are identified as the
nodes on the left and destinations on the right. Allowable shipping links are shown as arcs, while
disallowed links are not included.
Figure 2. Network flow model of the transportation problem.
Only arc costs are shown in the network model, as these are the only relevant parameters. All
other parameters are set to the default values. The network has a special form important in graph
theory; it is called a bipartite network since the nodes can be divided into two parts with all arcs
going from one part to the other.
On each supply node the positive external flow indicates supply flow entering the network. On
each destination node a demand is a negative fixed external flow indicating that this amount
must leave the network. The optimum solution for the example is shown in Fig. 3.
Variations of the classical transportation problem are easily handled by modifications of the
network model. If links have finite capacity, the arc upper bounds can be made finite. If supplies
represent raw materials that are transformed into products at the sources and the demands are in
units of product, the gain factors can be used to represent transformation efficiency at each
source. If some minimal flow is required in certain links, arc lower bounds can be set to nonzero
values.
The assignment problem is one of the special cases of the transportation problem. The goal of the
assignment problem is to minimize the cost or time of completing a number of jobs by a number
of persons. An important characteristic of the assignment problem is that the number of sources
is equal to the number of destinations. It is explained in the following way.
Managers are often faced with problems whose structures are identical to assignment problems.
Ex: A manager has five persons for five separate jobs and the cost of assigning each job to each
person is given. His goal is to assign one and only one job to each person in such a way that the
total cost of assignment is minimized.
Balanced assignment problem: This is an assignment where the number of persons is equal to
the number of jobs.
Unbalanced assignment problem: This is the case of an assignment problem where the number of
persons is not equal to the number of jobs. A dummy person or job (as required) is introduced
with zero cost or time to make it a balanced one.
Dummy job/person: A dummy job or person is an imaginary job or person with zero cost or time,
introduced in an unbalanced assignment problem to make it balanced.
To solve an assignment problem using this method, one should know the time of completion or
cost of making all the possible assignments. Each assignment problem has a table: the persons
one wishes to assign are represented in the rows, and the jobs or tasks to be assigned are
represented in the columns. The cost of each particular assignment is given by the corresponding
number in the table, which is referred to as the cost matrix.
The Hungarian method is based on the principle that if a constant is added to, or subtracted from,
every element of a row or column of the cost matrix, the optimum assignment for the resulting
matrix is also an optimum assignment for the original problem. The original cost matrix is
therefore reduced to another cost matrix by subtracting constants from the rows and columns
until an assignment with total completion time or total cost of zero can be found in the reduced
matrix; this assignment is the optimum solution, since the optimum remains unchanged under the
reduction.
Formulation of the assignment problem: Suppose there are m laborers to whom n tasks are
assigned. No laborer can be left idle or allowed to take up more than one job.
This method was developed by D. Konig, a Hungarian mathematician and is therefore known as
the Hungarian method of assignment problem. In order to use this method, one needs to know
only the cost of making all the possible assignments. Each assignment problem has a matrix
(table) associated with it. Normally, the objects (or people) one wishes to assign are expressed in
rows, whereas the columns represent the tasks (or things) assigned to them. The number in the
table would then be the costs associated with each particular assignment. It may be noted that the
assignment problem is a variation of the transportation problem with two characteristics: (i) the
cost matrix is a square matrix, and (ii) the optimum solution for the problem is such that there is
only one assignment in each row and each column of the cost matrix.
Mathematically, the assignment problem can be formulated as a linear programming problem.
Because of its special structure, however, its solution can be found using the Hungarian method.
A transportation tableau is given below. Each cell represents a shipping route (which is an arc on
the network and a decision variable in the LP formulation), and the unit shipping costs are given
in an upper right hand box in the cell.
To solve the transportation problem by its special purpose algorithm, the sum of the supplies at
the origins must equal the sum of the demands at the destinations (Balanced transportation
problem).
• If the total supply is greater than the total demand, a dummy destination is added with demand
equal to the excess supply, and the shipping costs from all origins to it are zero.
• Similarly, if total supply is less than total demand, a dummy origin is added. The supply at the
dummy origin is equal to the difference of the total supply and the total demand. The costs
associated with the dummy origin are equal to zero.
When solving a transportation problem by its special purpose algorithm, unacceptable shipping
routes are given a cost of +M (a large number).
1. Starting from the northwest corner of the transportation tableau, allocate as much quantity as
possible to cell (1,1) from Origin 1 to Destination 1, within the supply constraint of source 1 and
the demand constraint of destination 1.
2. The first allocation will satisfy either the supply capacity of Source 1 or the destination
requirement of Destination 1.
• If the demand requirement for Destination 1 is satisfied but the supply capacity for Source 1 is
not exhausted, move on to cell (1,2) for the next allocation.
• If the demand requirement for Destination 1 is not satisfied but the supply capacity for Source 1
is exhausted, move to cell (2,1).
• If the demand requirement for Destination 1 is satisfied and the supply capacity for Source 1 is
also exhausted, move on to cell (2,2).
3. Continue the allocation in the same manner toward the southeast corner of the transportation
tableau until the supply capacities of all sources are exhausted and the demands of all
destinations are satisfied.
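The three steps above can be sketched as follows. The supplies and demands used in the call are those of the balanced Acme example treated later in this chapter (two 50-ton plants; demands 25, 45 and 10 plus a 20-ton dummy):

```python
def northwest_corner(supply, demand):
    """Initial basic feasible solution by the North-West Corner Rule.
    Assumes a balanced problem (total supply == total demand)."""
    supply, demand = list(supply), list(demand)
    alloc = [[0] * len(demand) for _ in supply]
    i = j = 0
    while i < len(supply) and j < len(demand):
        q = min(supply[i], demand[j])   # ship as much as possible
        alloc[i][j] = q
        supply[i] -= q
        demand[j] -= q
        if supply[i] == 0:              # source exhausted: move down
            i += 1
        if demand[j] == 0:              # destination satisfied: move right
            j += 1
    return alloc

print(northwest_corner([50, 50], [25, 45, 10, 20]))
# [[25, 25, 0, 0], [0, 20, 10, 20]]
```

When a cell exhausts both its row and its column at once, the sketch moves diagonally, matching the third case of step 2.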
Although the North-West Corner Rule is the easiest, it is not the most attractive, because the
objective function (cost) plays no part in the allocation process.
1. Select the cell with the minimum cell cost in the tableau and allocate as much to this cell as
possible, but within the supply and demand constraints.
2. Select the cell with the next minimum cell-cost and allocate as much to this cell as possible
within the demand and supply constraints.
3. Continue the procedure until all of the supply and demand requirements are satisfied. In a case
of tied minimum cell-costs between two or more cells, the tie can be broken by selecting the cell
that can accommodate the greater quantity.
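A sketch of this procedure follows. The cost matrix passed in the call reproduces the unit costs of the Acme example worked in this chapter (24, 30, 40 and 30, 40, 42, with a zero-cost dummy column), and the resulting allocation matches its four iterations:

```python
def least_cost(cost, supply, demand):
    """Initial solution by the Minimum Cell Cost method. Ties are broken
    in favour of the cell that can take the larger quantity."""
    supply, demand = list(supply), list(demand)
    alloc = [[0] * len(demand) for _ in supply]
    cells = [(i, j) for i in range(len(supply)) for j in range(len(demand))]
    while any(supply) and any(demand):
        open_cells = [(i, j) for i, j in cells
                      if supply[i] > 0 and demand[j] > 0]
        # lowest cost first; among ties, the larger shippable quantity
        i, j = min(open_cells,
                   key=lambda c: (cost[c[0]][c[1]],
                                  -min(supply[c[0]], demand[c[1]])))
        q = min(supply[i], demand[j])
        alloc[i][j] = q
        supply[i] -= q
        demand[j] -= q
    return alloc

print(least_cost([[24, 30, 40, 0], [30, 40, 42, 0]],
                 [50, 50], [25, 45, 10, 20]))
# [[25, 5, 0, 20], [0, 40, 10, 0]]
```

Any remaining tie (such as the two zero-cost dummy cells at the start) is resolved arbitrarily in row order, just as the worked example arbitrarily selects x14.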
Associate a number ui with each row and a number vj with each column.
• Step 1: Set u1 = 0.
• Step 2: Calculate the remaining ui's and vj's by solving the relationship cij = ui + vj for each
occupied cell (i,j).
• Step 3: For each unoccupied cell (i,j), the reduced cost = cij – ui – vj.
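These steps amount to solving a triangular system over the occupied cells. The sketch below applies them to the initial Acme solution of this chapter (occupied cells x11, x12, x14, x22, x23, shown 0-indexed; costs as in that example):

```python
def modi_duals(cost, basis):
    """Solve u_i + v_j = c_ij over the occupied (basic) cells, with
    u_0 = 0. `basis` must form a spanning tree, as a (non-degenerate)
    basic feasible solution does."""
    m = 1 + max(i for i, _ in basis)
    n = 1 + max(j for _, j in basis)
    u, v = [None] * m, [None] * n
    u[0] = 0
    while None in u or None in v:       # propagate values along the tree
        for i, j in basis:
            if u[i] is not None and v[j] is None:
                v[j] = cost[i][j] - u[i]
            elif v[j] is not None and u[i] is None:
                u[i] = cost[i][j] - v[j]
    return u, v

cost = [[24, 30, 40, 0],                # Acme unit costs, dummy column last
        [30, 40, 42, 0]]
basis = {(0, 0), (0, 1), (0, 3), (1, 1), (1, 2)}   # initial occupied cells
u, v = modi_duals(cost, basis)
print(u, v)                             # [0, 10] [24, 30, 32, 0]

# Reduced costs of the unoccupied cells
rc = {(i, j): cost[i][j] - u[i] - v[j]
      for i in range(2) for j in range(4) if (i, j) not in basis}
print(rc)   # (1, 3) is the most negative, so x24 enters next
```

The values agree with the hand computation in this chapter's worked example (v3 = 32, u2 = 10).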
Step 1: For each unoccupied cell, calculate the reduced cost by the MODI method. Select the
unoccupied cell with the most negative reduced cost. (For maximization problems select the
unoccupied cell with the largest reduced cost.) If none, STOP.
Step 2: For this unoccupied cell, generate a stepping stone path by forming a closed loop with
this cell and occupied cells by drawing connecting alternating horizontal and vertical lines
between them. Determine the minimum allocation where a subtraction is to be made along this
path.
Step 3: Add this allocation to all cells where additions are to be made, and subtract this
allocation from all cells where subtractions are to be made along the stepping stone path. (Note:
An occupied cell on the stepping stone path now becomes 0 (unoccupied). If more than one cell
becomes 0, make only one unoccupied; make the others occupied with 0's.)
GO TO STEP 1.
Acme Block Company has orders for 80 tons of concrete blocks at three suburban locations as
follows: Northwood — 25 tons, Westwood — 45 tons, and Eastwood — 10 tons. Acme has two
plants, each of which can produce 50 tons per week. Delivery cost per ton from each plant to
each suburban location is shown below.
How should end of week shipments be made to fill the above orders?
Since total supply = 100 and total demand = 80, a dummy destination is created with demand of
20 and 0 unit costs.
Iteration 1: Tie for least cost (0), arbitrarily select x14. Allocate 20. Reduce s1 by 20 to 30 and
delete the Dummy column.
Iteration 2: Of the remaining cells the least cost is 24 for x11. Allocate 25. Reduce s1 by 25 to 5
and eliminate the Northwood column.
Iteration 3: Of the remaining cells the least cost is 30 for x12. Allocate 5. Reduce the Westwood
column to 40 and eliminate the Plant 1 row. Iteration 4: Since there is only one row with two
cells left, make the final allocations of 40 and 10 to x22 and x23, respectively.
1. Set u1 = 0
2. Since u1 + vj = c1j for occupied cells in row 1, then v1 = 24, v2 = 30, v4 = 0.
3. Since ui + v2 = ci2 for occupied cells in column 2, then u2 + 30 = 40, hence u2 = 10.
4. Since u2 + vj = c2j for occupied cells in row 2, then 10 + v3 = 42, hence v3 = 32.
Calculate the reduced costs (circled numbers on the previous slide) by cij – ui – vj.
Iteration 1:
The stepping stone path for cell (2,4) is (2,4), (1,4), (1,2), (2,2). The allocations in the subtraction
cells are 20 and 40, respectively. The minimum is 20, and hence reallocate 20 along this path.
Thus for the next tableau:
x24 = 0 + 20 = 20 (0 is its current allocation) x14 = 20 – 20 = 0 (blank for the next tableau) x12 =
5 + 20 = 25
x22 = 40 – 20 = 20
1. Set u1 = 0 and calculate the remaining ui's and vj's from the occupied cells as before; in
particular, u2 + v4 = c24 gives 10 + v4 = 0, so v4 = -10.
Iteration 2
Calculate the reduced costs (circled numbers on the previous slide) by cij – ui – vj.
The most negative reduced cost is -4, determined by x21. The stepping stone path for this cell is
(2,1),(1,1),(1,2),(2,2). The allocations in the subtraction cells are 25 and 20 respectively. Thus
the new solution is obtained by reallocating 20 on the stepping stone path. Thus for the next
tableau:
x11 = 25 – 20 = 5
x12 = 25 + 20 = 45
1. Set u1 = 0
Calculate the reduced costs (circled numbers on the previous slide) by cij – ui – vj.
(1,3) 40 – 0 – 36 = 4
(1,4) 0 – 0 – (-6) = 6
(2,2) 40 – 6 – 30 = 4
Since all the reduced costs are non-negative, this is the optimal tableau.
Degeneracy
If the basic feasible solution of a transportation problem with m origins and n destinations has
fewer than m + n – 1 positive xij (occupied cells), the problem is said to be a degenerate
transportation problem.
Chapter 6
Assignment
Assignment Problem
The assignment problem is one of the fundamental combinatorial optimization problems in the
branch of optimization or operations research in mathematics. It consists of finding a maximum
weight matching (or minimum weight perfect matching) in a weighted bipartite graph.
The problem instance has a number of agents and a number of tasks. Any agent can be assigned
to perform any task, incurring some cost that may vary depending on the agent-task assignment.
It is required to perform all tasks by assigning exactly one agent to each task and exactly one task
to each agent in such a way that the total cost of the assignment is minimized.
If the numbers of agents and tasks are equal and the total cost of the assignment for all tasks is
equal to the sum of the costs for each agent (or the sum of the costs for each task, which is the
same thing in this case), then the problem is called the linear assignment problem. Commonly,
when speaking of the assignment problem without any additional qualification, then the linear
assignment problem is meant.
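For small instances the linear assignment problem can be solved by brute force over all permutations, which makes the definition concrete; the 4×4 cost matrix below is a hypothetical example:

```python
from itertools import permutations

def brute_force_assignment(cost):
    """Minimum-cost assignment by enumerating all n! permutations.
    Only sensible for small n; the Hungarian method scales far better."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return best, sum(cost[i][best[i]] for i in range(n))

# Hypothetical cost matrix: cost[i][j] = cost of agent i doing task j
cost = [[9, 2, 7, 8],
        [6, 4, 3, 7],
        [5, 8, 1, 8],
        [7, 6, 9, 4]]
print(brute_force_assignment(cost))  # ((1, 0, 2, 3), 13)
```

The returned tuple maps each agent to a task; with n! growing so quickly, this enumeration is only a check on small examples, not a practical algorithm.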
Minimization problem
In this example we have to assign 4 workers to 4 machines. Each worker causes a different cost
on each machine. The goal is to minimize the total cost, subject to the condition that each
machine is assigned to exactly one person and each person works at exactly one machine. For
comprehension: Worker 1 causes a cost of 6 for machine 1, and so on.
Step 1 – Subtract the row minimum from each row.
Step 2 – Subtract the column minimum from each column of the reduced matrix.
The idea behind these 2 steps is to simplify the matrix since the solution of the reduced matrix
will be exactly the same as of the original matrix.
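The reduction can be sketched in a few lines; the 4×4 cost matrix used here is a hypothetical example, not the one from the worked problem:

```python
def reduce_matrix(cost):
    """Row and column reduction of the Hungarian method: subtract each
    row's minimum from the row, then each column's minimum from the
    column of the row-reduced matrix."""
    step1 = [[x - min(row) for x in row] for row in cost]
    mins = [min(col) for col in zip(*step1)]
    return [[x - m for x, m in zip(row, mins)] for row in step1]

# Hypothetical 4x4 cost matrix
cost = [[6, 2, 5, 8],
        [3, 7, 9, 4],
        [4, 6, 2, 5],
        [8, 3, 6, 7]]
print(reduce_matrix(cost))
# [[4, 0, 3, 5], [0, 4, 6, 0], [2, 4, 0, 2], [5, 0, 3, 3]]
```

Every row and every column of the result contains at least one zero, which is what makes the zero-assignment step possible.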
Step 3 – Assign one “0” to each row & column.
Now that we have simplified the matrix we can assign each worker with the minimal cost to each
machine which is represented by a “0”.
- In the first row we have one assignable “0” therefore we assign it to worker 3.
- In the second row we also only have one assignable “0” therefore we assign it to worker 4.
- In the third row we have two assignable “0”. We leave it as it is for now.
- In the fourth row we have one assignable “0” therefore we assign it. Consider that we can only
assign each worker to each machine hence we can’t allocate any other “0” in the first column.
- Now we go back to the third row which now only has one assignable “0” for worker 2.
As soon as we can assign each worker to one machine, we have the optimal solution. In this case
there is no need to proceed with any further steps. Remember also, if we decide on an arbitrary order
in which we start allocating the “0”s then we may get into a situation where we have 3
assignments as against the possible 4. If we assign a “0” in the third row to worker 1 we
wouldn’t be able to allocate any “0”s in column one and row two.
- If there is only one assignable “0” in any row or any column, assign it.
Now there are also cases where you won’t get an optimal solution for a reduced matrix after one
iteration. The following example will explain it.
Step 1 – Subtract the row minimum from each row.
Step 2 – Subtract the column minimum from each column of the reduced matrix.
Step 3 – Assign one “0” to each row & column.
Now we have to assign the “0”s for every row respectively to the rule that we described earlier in
example 1.
- In the first row we have one assignable “0” therefore we assign it and no other allocation in
column 2 is possible.
- In the second row we have one assignable “0” therefore we assign it.
- In the third row we have several assignable “0”s. We leave it as it is for now and proceed.
Now we proceed with the allocations of the “0”s for each column.
- In the first column we have one assignable “0” therefore we assign it. No other “0”s in row 3
are assignable anymore.
Now we are unable to proceed because all the “0”s have either been assigned or crossed out. The
crosses indicate that those cells are not available for assignment because assignments have
already been made in their rows or columns.
We realize that we have 3 assignments for this 5x5 matrix. In the earlier example we were able
to get 4 assignments for a 4x4 matrix. Now we have to follow another procedure to get the
remaining 2 assignments (“0”).
Step 4 – Tick all rows that do not have any assignment.
Step 5 – If a row is ticked and has a “0”, then tick the corresponding column (if the column
is not yet ticked).
Step 6 – If a column is ticked and has an assignment, then tick the corresponding row (if
the row is not yet ticked).
In this case there is no more ticking possible and we proceed with the next step.
Step 8 – Draw lines through unticked rows and ticked columns. The number of lines
represents the maximum number of assignments possible.
Step 9 – Find out the smallest number which does not have any line passing through it. We
call it Theta. Subtract theta from all the numbers that do not have any lines passing
through them and add theta to all those numbers that have two lines passing through them.
Keep the rest of them the same.
With the new assignment matrix we start to assign the “0”s according to the rules explained above.
Nevertheless we have 4 assignments against the required 5 for an optimal solution. Therefore we
have to repeat steps 4 – 9.
Iteration 2:
Step 5 – If a row is ticked and has a “0”, then tick the corresponding column (if the column is not
yet ticked).
Step 6 – If a column is ticked and has an assignment, then tick the corresponding row (if the row
is not yet ticked).
Step 8 – Draw lines through unticked rows and ticked columns. The number of lines represents
the maximum number of assignments possible.
Step 9 – Find out the smallest number which does not have any line passing through it. We call it
Theta. Subtract theta from all the numbers that do not have any lines passing through them and
add theta to all those numbers that have two lines passing through them. Keep the rest of them
the same.
Iteration 3:
Step 5 – If a row is ticked and has a “0”, then tick the corresponding column (if the column is not
yet ticked).
Step 6 – If a column is ticked and has an assignment, then tick the corresponding row (if the row
is not yet ticked).
Step 9 – Find out the smallest number which does not have any line passing through it. We call it
Theta. Subtract theta from all the numbers that do not have any lines passing through them and
add theta to all those numbers that have two lines passing through them. Keep the rest of them
the same.
Iteration 4:
Step 5 – If a row is ticked and has a “0”, then tick the corresponding column (if the column is not
yet ticked).
Step 6 – If a column is ticked and has an assignment, then tick the corresponding row (if the row
is not yet ticked).
Step 8 – Draw lines through unticked rows and ticked columns. The number of lines represents
the maximum number of assignments possible.
Step 9 – Find out the smallest number which does not have any line passing through it. We call it
Theta. Subtract theta from all the numbers that do not have any lines passing through them and
add theta to all those numbers that have two lines passing through them. Keep the rest of them
the same.
After the fourth iteration we assign the “0”s again and now we have an optimal solution with 5
assignments.
The solution:
From now on we proceed as usual with the steps to get to an optimal solution.
Step 2 – Subtract the column minimum from each column of the reduced matrix.
With the determined optimal solution we can compute the maximal profit:
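As a cross-check on the hand computation, the same kind of maximal-profit assignment can be obtained with scipy.optimize.linear_sum_assignment, which implements the Hungarian method; the 5x5 profit matrix below is illustrative, not necessarily the one from the worked example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative 5x5 profit matrix: profit[i][j] is the profit of
# assigning worker i to job j.
profit = np.array([[32, 38, 40, 28, 40],
                   [40, 24, 28, 21, 36],
                   [41, 27, 33, 30, 37],
                   [22, 38, 41, 36, 36],
                   [29, 33, 40, 35, 39]])

# maximize=True handles the maximization case directly, so no manual
# conversion to a minimization problem is needed.
rows, cols = linear_sum_assignment(profit, maximize=True)
total = profit[rows, cols].sum()
```

Each row and each column is used exactly once, and `total` is the maximal profit over all 5! possible assignments.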
Unbalanced Problem
Unbalanced problems are typically encountered in transportation problems in operations research,
where total supply does not equal total demand. The main objective of a transportation problem
is to determine the transportation schedule that minimizes the overall transportation cost while
meeting the supply and demand requirements. In reality, however, many of the problems we
encounter are unbalanced: supply and demand are not equal.
Let us consider a problem where a company has three warehouses – warehouse 1, warehouse 2,
and warehouse 3. The company supplies two retailers, Retailer A and Retailer B. The supply and
demand for the warehouses and retailers are shown below. The table shows the supply from each
warehouse, the demand of each retailer, and the distance between each warehouse and retailer.
The cost of transportation can be taken as proportional to the distance between the warehouse
and the retailer.
The problem can be simplified by introducing a dummy retailer or dummy supplier, who will
either consume or supply the excess. The above problem then reduces to a balanced one.
The cost of transportation to or from the dummy is taken as zero, because the dummy supplier
or retailer does not exist in reality. Now, algorithms or linear programming formulations can be
used to find a solution.
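The dummy-column trick can be expressed as a linear program. The sketch below uses scipy.optimize.linprog with hypothetical supplies, demands, and unit costs (the text's actual table is not reproduced here): the excess supply is absorbed by a dummy retailer whose transport cost is zero.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: supply exceeds demand by 50 units.
supply = np.array([100, 80, 70])          # warehouses 1..3
demand = np.array([110, 90])              # retailers A, B
cost = np.array([[4.0, 6.0],
                 [5.0, 3.0],
                 [8.0, 7.0]])             # per-unit transport cost

# Balance the problem: add a dummy retailer at zero cost.
excess = supply.sum() - demand.sum()
demand_b = np.append(demand, excess)
cost_b = np.hstack([cost, np.zeros((3, 1))])

m, n = cost_b.shape
c = cost_b.ravel()                        # x[i*n + j] = amount i -> j
A_eq, b_eq = [], []
for i in range(m):                        # each warehouse ships its supply
    row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1
    A_eq.append(row); b_eq.append(supply[i])
for j in range(n):                        # each retailer receives its demand
    col = np.zeros(m * n); col[j::n] = 1
    A_eq.append(col); b_eq.append(demand_b[j])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
plan = res.x.reshape(m, n)                # plan[:, -1] goes to the dummy
```

Units shipped to the dummy column are simply left in the warehouses; they incur no cost and do not affect the real schedule.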
Chapter 7
Linear Programming
Introduction
Linear Programming (LP) is a mathematical modelling technique useful for allocating limited
resources, such as materials and machines, to several competing activities, such as projects and
services. A typical linear programming problem consists of a linear objective function which is
to be maximized or minimized subject to a finite number of linear constraints.
The founders of LP are George B. Dantzig, who published the simplex method in 1947; John von
Neumann, who developed the theory of duality in the same year; and Leonid Kantorovich, a
Russian mathematician who used similar techniques in economics before Dantzig and won the
Nobel Memorial Prize in Economics in 1975. The linear programming problem was first shown
to be solvable in polynomial time by Leonid Khachiyan in 1979, but a major theoretical and
practical breakthrough in the field came in 1984 when Narendra Karmarkar introduced a new
interior point method for solving linear programming problems.
Consider a chemical company which produces two salts: X and Y. Suppose an acid and a base
are the only two chemicals required by the company to produce these two salts. Further suppose
that by past experience the owner of the company has obtained the following data:
To make 1 unit of the salt X, 6 units of the acid and 1 unit of the base is required. To make 1 unit
of the salt Y, 4 units of the acid and 2 units of the base are required. At most 24 units of the acid
and 6 units of the base are available daily. 1 unit of the salt X gets him a profit of 5 (in whatever
currency) and 1 unit of the salt Y gets him a profit of 4. In addition, due to a market survey, he
knows that the daily demand for the salt Y cannot exceed that for X by more than 1 unit, and that
the maximum daily demand for the salt Y is 2 units.
The company owner wants the best product mix. That is to say, he wants to know the amount of
X and Y he should make daily so that he gets the highest profit.
To formulate the LP model for this problem we first need to identify the decision variables.
These are the variables that represent the entities about which we actually have to make a
decision. In our problem it is clear that the entities are the salts X and Y, so the variables should
represent the amount of each salt to be made.
A linear programming problem gives us a structure in which to solve a business problem.
Building this structure helps us make the best use of our resources.
Step 1
Determine the objective.
Either we want to maximize our profit or minimize our cost.
This is called Z in mathematical language. Fix the profit or cost rate for each of your two
products; these rates are called c1 and c2, and they multiply x1 and x2, the quantities of the two
products.
This gives our objective function:
Z = c1 x1 + c2 x2
Step 2
Determine the conditions under which the objective must be achieved.
Conditions mean that we can achieve our objective only within our limits: we may have limited
money, limited labour, limited machine capacity, or limited time. Identify these limits and write
them down as constraints.
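The salt problem above can be formulated and solved numerically along these lines. A sketch using scipy.optimize.linprog, which minimizes by convention, so the profit coefficients are negated:

```python
from scipy.optimize import linprog

# Objective: max Z = 5X + 4Y  ->  min -(5X + 4Y)
c = [-5, -4]

# Constraints from the problem statement:
A_ub = [[6, 4],    # acid:   6X + 4Y <= 24 units available daily
        [1, 2],    # base:    X + 2Y <= 6 units available daily
        [-1, 1],   # demand:  Y - X  <= 1 (Y cannot exceed X by more than 1)
        [0, 1]]    # demand:  Y      <= 2 (maximum daily demand for Y)
b_ub = [24, 6, 1, 2]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
x, y = res.x
profit = -res.fun          # undo the sign flip
```

The solver returns the best daily product mix X = 3, Y = 1.5 with a maximal profit of 21, which can be checked by hand against the corner points of the feasible region.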
Assumption
Now that you have seen how some simple problems can be formulated and solved as linear
programs, it is useful to reconsider the question of when a problem can be realistically
represented as a linear programming problem. Linear programming rests on several assumptions,
which must hold for such a representation to be realistic:
1. Proportionality: The basic assumption underlying linear programming is that any change in
the constraint inequalities produces a proportional change in the objective function. This
means that if a product contributes Rs 20 towards the profit, then the total contribution would
be equal to 20x1, where x1 is the number of units of the product.
For example, if there are 5 units of the product, then the contribution would be Rs 100
and in the case of 10 units, it would be Rs 200. Thus, if the output (sales) is doubled, the
profit would also be doubled.
2. Additivity: The assumption of additivity asserts that the total profit of the objective function is
determined by the sum of profit contributed by each product separately. Similarly, the total
amount of resources used is determined by the sum of resources used by each product
separately. This implies that there is no interaction between the decision variables.
3. Continuity: Another assumption of linear programming is that the decision variables are
continuous. This means a combination of outputs can be used with the fractional values along
with the integer values.
For example, suppose 5/3 units of product A and 10/3 units of product B are to be
produced in a week. In this case, the fractional amount of production is taken as work-in-
progress and the remaining part of the production is completed in the following week.
Therefore, a production of 5 units of product A and 10 units of product B over a three-week
period implies 5/3 units of product A and 10/3 units of product B per week.
4. Certainty: Another underlying assumption of linear programming is certainty, i.e. the
coefficients of the objective function and of the constraint inequalities are known with
certainty. Quantities such as profit per unit of product, availability of material and labour per
unit, and requirement of material and labour per unit are known and given in the linear
programming problem.
5. Finite Choices: This assumption implies that the decision maker has a finite number of
choices and that the decision variables take non-negative values. The non-negativity
assumption is reasonable because the output in a production problem cannot be negative. Thus,
this assumption is considered feasible.
Thus, while solving for the linear programming problem, these assumptions should be kept in
mind such that the best alternative is chosen.
Mathematical models can take many forms, including dynamical systems, statistical models,
differential equations, or game theoretic models. These and other types of models can overlap,
with a given model involving a variety of abstract structures. In general, mathematical models
may include logical models. In many cases, the quality of a scientific field depends on how well
the mathematical models developed on the theoretical side agree with results of repeatable
experiments. Lack of agreement between theoretical mathematical models and experimental
measurements often leads to important advances as better theories are developed.
Linear vs. nonlinear: If all the operators in a mathematical model exhibit linearity, the resulting
mathematical model is defined as linear. A model is considered to be nonlinear otherwise. The
definition of linearity and nonlinearity is dependent on context, and linear models may have
nonlinear expressions in them. For example, in a statistical linear model, it is assumed that a
relationship is linear in the parameters, but it may be nonlinear in the predictor variables.
Similarly, a differential equation is said to be linear if it can be written with linear differential
operators, but it can still have nonlinear expressions in it. In a mathematical programming
model, if the objective functions and constraints are represented entirely by linear equations,
then the model is regarded as a linear model. If one or more of the objective functions or
constraints are represented with a nonlinear equation, then the model is known as a nonlinear
model.
Nonlinearity, even in fairly simple systems, is often associated with phenomena such as chaos
and irreversibility. Although there are exceptions, nonlinear systems and models tend to be
more difficult to study than linear ones. A common approach to nonlinear problems is
linearization, but this can be problematic if one is trying to study aspects such as irreversibility,
which are strongly tied to nonlinearity.
Static vs. dynamic: A dynamic model accounts for time-dependent changes in the state of the
system, while a static (or steady-state) model calculates the system in equilibrium, and thus is
time-invariant. Dynamic models typically are represented by differential equations.
Explicit vs. implicit: If all of the input parameters of the overall model are known, and the
output parameters can be calculated by a finite series of computations, the model is said to be
explicit. But sometimes it is the output parameters that are known, and the corresponding
inputs must be solved for by an iterative procedure, such as Newton's method or Broyden's
method. For example, a jet engine's physical properties such
as turbine and nozzle throat areas can be explicitly calculated given a design thermodynamic
cycle (air and fuel flow rates, pressures, and temperatures) at a specific flight condition and
power setting, but the engine's operating cycles at other flight conditions and power settings
cannot be explicitly calculated from the constant physical properties.
Discrete vs. continuous: A discrete model treats objects as discrete, such as the particles in a
molecular model or the states in a statistical model; while a continuous model represents the
objects in a continuous manner, such as the velocity field of fluid in pipe flows, the temperatures
and stresses in a solid, and the electric field that applies continuously over the entire model due
to a point charge.
Deterministic vs. probabilistic (stochastic): A deterministic model is one in which every set of
variable states is uniquely determined by parameters in the model and by sets of previous states
of these variables; therefore, a deterministic model always performs the same way for a given
set of initial conditions. Conversely, in a stochastic model—usually called a "statistical model"—
randomness is present, and variable states are not described by unique values, but rather by
probability distributions.
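The deterministic/stochastic distinction can be made concrete with a toy state-update model; the recurrence below is purely illustrative.

```python
import numpy as np

def deterministic_model(x0, steps):
    """Same initial condition -> same trajectory, every run."""
    xs = [x0]
    for _ in range(steps):
        xs.append(0.9 * xs[-1] + 1.0)
    return xs

def stochastic_model(x0, steps, rng):
    """States are draws from a distribution, not unique values."""
    xs = [x0]
    for _ in range(steps):
        xs.append(0.9 * xs[-1] + 1.0 + rng.normal(0.0, 0.5))
    return xs

a = deterministic_model(2.0, 5)
b = deterministic_model(2.0, 5)   # identical to a: no randomness
r1 = stochastic_model(2.0, 5, np.random.default_rng(1))
r2 = stochastic_model(2.0, 5, np.random.default_rng(2))  # differs from r1
```

Repeated runs of the deterministic model coincide exactly, while repeated runs of the stochastic model trace out a distribution of trajectories.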
Deductive, inductive, or floating: A deductive model is a logical structure based on a theory. An
inductive model arises from empirical findings and generalization from them. The floating model
rests on neither theory nor observation, but is merely the invocation of expected structure.
Application of mathematics in social sciences outside of economics has been criticized for
unfounded models. Application of catastrophe theory in science has been characterized as a
floating model.
Throughout history, more and more accurate mathematical models have been developed.
Newton's laws accurately describe many everyday phenomena, but at certain limits relativity
theory and quantum mechanics must be used; even these do not apply to all situations and need
further refinement. It is possible to obtain less accurate models in appropriate limits, for
example relativistic mechanics reduces to Newtonian mechanics at speeds much less than the
speed of light. Quantum mechanics reduces to classical physics when the quantum numbers are
high. For example, the de Broglie wavelength of a tennis ball is insignificantly small, so classical
physics is a good approximation to use in this case.
It is common to use idealized models in physics to simplify things. Massless ropes, point
particles, ideal gases and the particle in a box are among the many simplified models used in
physics. The laws of physics are represented with simple equations such as Newton's laws,
Maxwell's equations and the Schrödinger equation. These laws serve as a basis for making
mathematical models of real situations. Many real situations are very complex and are thus
modeled approximately on a computer: a model that is computationally feasible is built from
the basic laws or from approximate models derived from the basic laws. For example, molecules
can be modeled by molecular orbital models that are approximate solutions to the Schrödinger
equation. In engineering, physics models are often made by mathematical methods such as finite
element analysis.
Different mathematical models use different geometries that are not necessarily accurate
descriptions of the geometry of the universe. Euclidean geometry is much used in classical
physics, while special relativity and general relativity are examples of theories that use
geometries which are not Euclidean.
Some applications
Since prehistoric times, simple models such as maps and diagrams have been used.
Often when engineers analyze a system to be controlled or optimized, they use a mathematical
model. In analysis, engineers can build a descriptive model of the system as a hypothesis of how
the system could work, or try to estimate how an unforeseeable event could affect the system.
Similarly, in control of a system, engineers can try out different control approaches in
simulations.
A mathematical model usually describes a system by a set of variables and a set of equations that
establish relationships between the variables. Variables may be of many types; real or integer
numbers, boolean values or strings, for example. The variables represent some properties of the
system, for example, measured system outputs often in the form of signals, timing data, counters,
and event occurrence (yes/no). The actual model is the set of functions that describe the relations
between the different variables.
Building blocks
In business and engineering, mathematical models may be used to maximize a certain output.
The system under consideration will require certain inputs. The system relating inputs to outputs
depends on other variables too: decision variables, state variables, exogenous variables, and
random variables.
Decision variables are sometimes known as independent variables. Exogenous variables are
sometimes known as parameters or constants. The variables are not independent of each other as
the state variables are dependent on the decision, input, random, and exogenous variables.
Furthermore, the output variables are dependent on the state of the system (represented by the
state variables).
Objectives and constraints of the system and its users can be represented as functions of the
output variables or state variables. The objective functions will depend on the perspective of the
model's user. Depending on the context, an objective function is also known as an index of
performance, as it is some measure of interest to the user. Although there is no limit to the
number of objective functions and constraints a model can have, using or optimizing the model
becomes more involved (computationally) as the number increases.
For example, in economics students often apply linear algebra when using input-output models.
Complicated mathematical models that have many variables may be consolidated by use of
vectors where one symbol represents several variables.
A priori information
Mathematical modeling problems are often classified into black box or white box models,
according to how much a priori information on the system is available. A black-box model is a
system of which there is no a priori information available. A white-box model (also called glass
box or clear box) is a system where all necessary information is available. Practically all systems
are somewhere between the black-box and white-box models, so this concept is useful only as an
intuitive guide for deciding which approach to take.
Usually it is preferable to use as much a priori information as possible to make the model more
accurate. Therefore, the white-box models are usually considered easier, because if you have
used the information correctly, then the model will behave correctly. Often the a priori
information comes in forms of knowing the type of functions relating different variables. For
example, if we make a model of how a medicine works in a human system, we know that usually
the amount of medicine in the blood is an exponentially decaying function. But we are still left
with several unknown parameters; how rapidly does the medicine amount decay, and what is the
initial amount of medicine in blood? This example is therefore not a completely white-box
model. These parameters have to be estimated through some means before one can use the
model.
In black-box models one tries to estimate both the functional form of relations between variables
and the numerical parameters in those functions. Using a priori information we could end up, for
example, with a set of functions that probably could describe the system adequately. If there is
no a priori information we would try to use functions as general as possible to cover all different
models. An often-used approach for black-box models is neural networks, which usually do not
make assumptions about incoming data. Alternatively, the NARMAX (Nonlinear AutoRegressive
Moving Average model with eXogenous inputs) algorithms which were developed as part of
nonlinear system identification can be used to select the model terms, determine the model
structure, and estimate the unknown parameters in the presence of correlated and nonlinear
noise. The advantage of NARMAX models compared to neural networks is that NARMAX
produces models that can be written down and related to the underlying process, whereas neural
networks produce an approximation that is opaque.
Subjective information
Sometimes it is useful to incorporate subjective information into a mathematical model. This can
be done based on intuition, experience, or expert opinion, or based on convenience of
mathematical form. Bayesian statistics provides a theoretical framework for incorporating such
subjectivity into a rigorous analysis: we specify a prior probability distribution (which can be
subjective), and then update this distribution based on empirical data.
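The prior-plus-data update can be illustrated with the standard Beta-Binomial conjugate pair; the numbers below are an invented example, not from the text.

```python
# A subjective Beta(a, b) prior on a success probability is updated
# with observed successes and failures; conjugacy makes the posterior
# another Beta distribution.
def update_beta(a, b, successes, failures):
    return a + successes, b + failures

def beta_mean(a, b):
    return a / (a + b)

# Expert opinion encoded as Beta(8, 2): prior mean 0.8.
a0, b0 = 8, 2
# Empirical data then arrives: 3 successes, 7 failures.
a1, b1 = update_beta(a0, b0, 3, 7)
posterior_mean = beta_mean(a1, b1)   # pulled down toward the data
```

The posterior mean 11/20 = 0.55 lies between the optimistic prior (0.8) and the observed frequency (0.3), weighted by their relative strengths.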
Complexity
In general, model complexity involves a trade-off between simplicity and accuracy of the model.
Occam's razor is a principle particularly relevant to modeling; the essential idea being that
among models with roughly equal predictive power, the simplest one is the most desirable.
While added complexity usually improves the realism of a model, it can make the model difficult
to understand and analyze, and can also pose computational problems, including numerical
instability. Thomas Kuhn argues that as science progresses, explanations tend to become more
complex before a paradigm shift offers radical simplification.
For example, when modeling the flight of an aircraft, we could embed each mechanical part of
the aircraft into our model and would thus acquire an almost white-box model of the system.
However, the computational cost of adding such a huge amount of detail would effectively
inhibit the usage of such a model. Additionally, the uncertainty would increase due to an overly
complex system, because each separate part induces some amount of variance into the model. It
is therefore usually appropriate to make some approximations to reduce the model to a sensible
size. Engineers often can accept some approximations in order to get a more robust and simple
model. For example, Newton's classical mechanics is an approximated model of the real world.
Still, Newton's model is quite sufficient for most ordinary-life situations, that is, as long as
particle speeds are well below the speed of light, and we study macro-particles only.
Training
Any model which is not pure white-box contains some parameters that can be used to fit the
model to the system it is intended to describe. If the modeling is done by a neural network, the
optimization of parameters is called training. In more conventional modeling through explicitly
given mathematical functions, parameters are determined by curve fitting.
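A minimal sketch of parameter determination by curve fitting, using synthetic data from a hypothetical "true" line:

```python
import numpy as np

# Noisy samples from an assumed underlying relationship y = 2x + 1.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, size=x.size)

# Curve fitting determines the model's free parameters
# (slope and intercept) from the data by least squares.
slope, intercept = np.polyfit(x, y, deg=1)
```

The fitted slope and intercept recover the underlying parameters up to the noise level, which is exactly what "fitting the model to the system" means here.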
Model evaluation
A crucial part of the modeling process is the evaluation of whether or not a given mathematical
model describes a system accurately. This question can be difficult to answer as it involves
several different types of evaluation.
Usually the easiest part of model evaluation is checking whether a model fits experimental
measurements or other empirical data. In models with parameters, a common approach to test
this fit is to split the data into two disjoint subsets: training data and verification data. The
training data are used to estimate the model parameters. An accurate model will closely match
the verification data even though these data were not used to set the model's parameters. This
practice is referred to as cross-validation in statistics.
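The train/verify split described above can be sketched as follows; the data set is synthetic and the even/odd split is just one simple way to form two disjoint subsets.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 1, 40)
y = 3.0 * x - 1.0 + rng.normal(0, 0.05, size=x.size)

# Two disjoint subsets: even indices train, odd indices verify.
x_tr, y_tr = x[::2], y[::2]
x_ve, y_ve = x[1::2], y[1::2]

coeffs = np.polyfit(x_tr, y_tr, deg=1)       # parameters from training data only
pred = np.polyval(coeffs, x_ve)
rmse = np.sqrt(np.mean((pred - y_ve) ** 2))  # error on data never used in fitting
```

A small verification error here indicates the model generalizes beyond the points used to set its parameters, which is the point of cross-validation.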
Defining a metric to measure distances between observed and predicted data is a useful tool of
assessing model fit. In statistics, decision theory, and some economic models, a loss function
plays a similar role.
Assessing the scope of a model, that is, determining what situations the model is applicable to,
can be less straightforward. If the model was constructed based on a set of data, one must
determine for which systems or situations the known data is a "typical" set of data.
The question of whether the model describes well the properties of the system between data
points is called interpolation, and the same question for events or data points outside the
observed data is called extrapolation.
Philosophical considerations
Many types of modeling implicitly involve claims about causality. This is usually (but not
always) true of models involving differential equations. As the purpose of modeling is to
increase our understanding of the world, the validity of a model rests not only on its fit to
empirical observations, but also on its ability to extrapolate to situations or data beyond those
originally described in the model. One can think of this as the differentiation between qualitative
and quantitative predictions. One can also argue that a model is worthless unless it provides
some insight which goes beyond what is already known from direct investigation of the
phenomenon being studied.
An example of such criticism is the argument that the mathematical models of Optimal foraging
theory do not offer insight that goes beyond the common-sense conclusions of evolution and
other basic principles of ecology.
Graphical Method
Linear programming (LP) is an application of matrix algebra used to solve a broad class of
problems that can be represented by a system of linear equations. A linear equation is an
algebraic equation whose variable quantity or quantities are in the first power only and whose
graph is a straight line. LP problems are characterized by an objective function that is to be
maximized or minimized, subject to a number of constraints. Both the objective function and the
constraints must be formulated in terms of a linear equality or inequality. Typically, the objective
function will be to maximize profits (e.g., contribution margin) or to minimize costs (e.g.,
variable costs). The following assumptions must be satisfied to justify the use of linear
programming:
Linearity. All functions, such as costs, prices, and technological requirements, must be
linear in nature.
Certainty. All parameters are assumed to be known with certainty.
Nonnegativity. Negative values of decision variables are unacceptable.
Two methods are commonly used to solve LP problems:
Graphical method
Simplex method
The graphical method is limited to LP problems involving two decision variables and a limited
number of constraints due to the difficulty of graphing and evaluating more than two decision
variables. This restriction severely limits the use of the graphical method for real-world
problems. The graphical method is presented first here, however, because it is simple and easy to
understand and it is a very good learning tool.
The computer-based simplex method is much more powerful than the graphical method and
provides the optimal solution to LP problems containing thousands of decision variables and
constraints. It uses an iterative algorithm to solve for the optimal solution. Moreover, the simplex
method provides information on slack variables (unused resources) and shadow prices
(opportunity costs) that is useful in performing sensitivity analysis. Excel uses a special version
of the simplex method, as will be discussed later.
Constructing the Linear Programming Problem for Maximization of the Objective Function
STEP 1: DEFINE THE DECISION VARIABLES. In any LP problem, the decision variables
should completely describe the decisions to be made. Bridgeway must decide how many printers
and keyboards should be manufactured each week. With this in mind, the decision variables are
defined as follows:
STEP 2: DEFINE THE OBJECTIVE FUNCTION. The objective function represents the goal
that management is trying to achieve. The goal in Bridgeway's case is to maximize (max) total
contribution margin. For each printer that is sold, $30 in contribution margin will be realized. For
each keyboard that is sold, $20 in contribution margin will be realized. Thus, the total
contribution margin for Bridgeway can be expressed by the following objective function
equation:
where the variable Z denotes the objective function value of any LP problem. In the Bridgeway
case, Z equals the total contribution margin that will be realized when an optimal mix of
products X (printer) and Y (keyboard) is manufactured and sold.
Constraint 1. Each week, no more than 1,000 hours of soldering time may be used. Thus,
constraint l may be expressed by:
2X + Y ≤ 1,000
because it takes 2 hours of soldering to produce one printer and 1 hour of soldering to produce
one keyboard. The inequality sign means that the total soldering time for both products X and Y
cannot exceed the 1,000 soldering hours available, but could be less than the available hours.
Constraint 2. Each week, no more than 800 hours of assembling time may be used. Thus,
constraint 2 may be expressed by:
X + Y ≤ 800
Constraint 3. Because of limited demand, at most 350 printers should be produced each week.
This constraint can be expressed as follows:
X ≤ 350
X ≥ 0
Y ≥ 0
These four steps and the formulation of the LP problem for Bridgeway are summarized in
Exhibit 16-1. This LP problem provides the necessary data to develop a graphical solution.
The following are two of the most basic concepts associated with LP:
Feasible region
Optimal solution
Step 1. Graphically determine the feasible region.
Step 2. Search for the optimal solution.
For a point (X, Y) to be in the feasible region, (X, Y) must satisfy all the above inequalities. A
graph containing these constraint equations is shown in Exhibit 16-2. Note that the only points
satisfying the nonnegativity constraints are the points in the first quadrant of the X, Y plane. This
is indicated by the arrows pointing to the right from the y-axis and upward from the x-axis. Thus,
any point that is outside the first quadrant cannot be in the feasible region.
Feasible Region for the Bridgeway Problem
In plotting equation 2X + Y ≤ 1,000 on the graph, the following questions are asked: How
much of product X could be produced if all resources were allocated to it? In this equation, a
total of 1,000 hours of soldering time is available. If all 1,000 hours are allocated to product X,
500 printers can be produced each week. On the other hand, how much of product Y could be
produced if all resources were allocated to it? If all 1,000 soldering hours are allocated to
produce Y, then 1,000 keyboards can be produced each week. Thus, the line on the graph
expressing the soldering time constraint equation extends from the 500-unit point A on the x-axis
to the 1,000-unit point B on the y-axis.
The equation associated with the assembling capacity constraint has been plotted on the graph in
a similar manner. If 800 assembling hours are allocated to product X, then 800 printers can be
produced. If, on the other hand, 800 assembling hours are allocated to product Y, then 800
keyboards can be produced. This analysis results in line CD.
Since equation X ≤ 350 concerns only product X, the line expressing the equation on the graph
does not touch the y-axis at all. It extends from the 350-unit point E on the x-axis and runs
parallel to the y-axis, thereby signifying that regardless of the number of units of X produced, no
more than 350 units of X can ever be sold.
Exhibit 16-2 shows that the set of points in the quadrant that satisfies all constraints is bounded
by the five-sided polygon HDGFE. Any point on this polygon or in its interior is in the feasible
region. Any other point fails to satisfy at least one of the inequalities and thus falls outside the
feasible region.
STEP 2: SEARCH FOR THE OPTIMAL SOLUTION. Having identified the feasible region for
the Bridgeway case, we now search for the optimal solution, which will be the point in the
feasible region that maximizes the objective function. In Bridgeway's case, this is:
A graph showing the isoprofit lines for Bridgeway Company appears in Exhibit 16-3. The
isoprofit lines are broken to differentiate them from the lines that form the feasible region. To
draw an isoprofit line, any Z-value is chosen, then the x- and y-intercepts are calculated. For
example, a contribution margin value of $6,000 gives a line with intercepts at 200 printers and
300 keyboards.
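The graphical search can be checked numerically by enumerating the corner points of the feasible region, since an optimum of a linear program always occurs at a vertex. A minimal Python sketch follows; note that the objective coefficients ($30 per printer, $20 per keyboard) are an inference from the $6,000 isoprofit intercepts, not stated data:

```python
from itertools import combinations

# Bridgeway constraints a*X + b*Y <= c (X = printers, Y = keyboards),
# including the non-negativity conditions.
cons = [
    (2, 1, 1000),   # soldering hours
    (1, 1, 800),    # assembling hours
    (1, 0, 350),    # sales ceiling on X
    (-1, 0, 0),     # X >= 0
    (0, -1, 0),     # Y >= 0
]

def feasible(x, y, eps=1e-9):
    return all(a * x + b * y <= c + eps for a, b, c in cons)

# Corner points: pairwise intersections of the constraint boundary lines.
vertices = []
for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
    det = a1 * b2 - a2 * b1
    if det == 0:
        continue                       # parallel boundaries never intersect
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y):
        vertices.append((x, y))

# Objective inferred from the $6,000 isoprofit intercepts (200 printers,
# 300 keyboards): Z = 30*X + 20*Y.
best = max(vertices, key=lambda v: 30 * v[0] + 20 * v[1])
z_best = 30 * best[0] + 20 * best[1]
print(best, z_best)                    # (200.0, 600.0) 18000.0
```

Under these assumed margins, the optimum sits at the vertex where the soldering and assembling constraints intersect, consistent with the polygon HDGFE described above.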
In practice, the simplex algorithm is quite efficient and can be guaranteed to find the global
optimum if certain precautions against cycling are taken. The simplex algorithm has been proved
to solve "random" problems efficiently, i.e. in a cubic number of steps, which is similar to its
behavior on practical problems.
However, the simplex algorithm has poor worst-case behavior: Klee and Minty constructed a
family of linear programming problems for which the simplex method takes a number of steps
exponential in the problem size. In fact, for some time it was not known whether the linear
programming problem was solvable in polynomial time, i.e. of complexity class P.
Criss-cross algorithm
Like the simplex algorithm of Dantzig, the criss-cross algorithm is a basis-exchange algorithm
that pivots between bases. However, the criss-cross algorithm need not maintain feasibility, but
can pivot rather from a feasible basis to an infeasible basis. The criss-cross algorithm does not
have polynomial time-complexity for linear programming. Both algorithms visit all 2^D corners of
a (perturbed) cube in dimension D, the Klee–Minty cube, in the worst case.
Interior point
In contrast to the simplex algorithm, which finds an optimal solution by traversing the edges
between vertices on a polyhedral set, interior-point methods move through the interior of the
feasible region.
Ellipsoid algorithm, following Khachiyan
This is the first worst-case polynomial-time algorithm for linear programming. To solve a
problem which has n variables and can be encoded in L input bits, this algorithm uses O(n^4 L)
pseudo-arithmetic operations on numbers with O(L) digits. The long-standing question of whether
linear programming is solvable in polynomial time was resolved by Leonid Khachiyan in 1979
with the introduction of the ellipsoid method. The convergence analysis has (real-number)
predecessors, notably the iterative methods developed by Naum Z. Shor and the approximation
algorithms by Arkadi Nemirovski and D. Yudin.
However, Khachiyan's algorithm inspired new lines of research in linear programming. In 1984,
N. Karmarkar proposed a projective method for linear programming. Karmarkar's algorithm
improved on Khachiyan's worst-case polynomial bound (giving O(n^3.5 L)). Karmarkar claimed that his
algorithm was much faster in practical LP than the simplex method, a claim that created great
interest in interior-point methods. Since Karmarkar's discovery, many interior-point methods
have been proposed and analyzed.
Affine scaling
Main article: Affine scaling
Affine scaling is one of the oldest interior point methods to be developed. It was developed in the
Soviet Union in the mid-1960s, but didn't receive much attention until the discovery of
Karmarkar's algorithm, after which affine scaling was reinvented multiple times and presented as
a simplified version of Karmarkar's. Affine scaling amounts to doing gradient descent steps
within the feasible region, while rescaling the problem to make sure the steps move toward the
optimum faster.
Path-following algorithms
For both theoretical and practical purposes, barrier function or path-following methods have
been the most popular interior point methods since the 1990s.
The current opinion is that the efficiency of good implementations of simplex-based methods
and interior point methods is similar for routine applications of linear programming. However,
for specific types of LP problems, one type of solver may be better than the other (sometimes
much better), and the structure of the solutions generated by interior point methods versus
simplex-based methods can differ significantly, with the support set of active variables being
typically smaller for the latter.
LP solvers are in widespread use for optimization of various problems in industry, such as
optimization of flow in transportation networks.
Covering and packing LPs can be solved approximately in nearly-linear time. That is, if matrix A
is of dimension n×m and has N non-zero entries, then there exist algorithms that run in time
O(N·(log N)/ε) and produce (1 ± ε)-approximate solutions to given covering and packing LPs.
The best known sequential algorithm of this kind runs in time O(N + (log N)·(n+m)/ε), and the
best known parallel algorithm of this kind runs in O((log N)/ε) iterations, each requiring only a
matrix-vector multiplication which is highly parallelizable.
x becomes X1
y becomes X2
Since the independent terms (right-hand sides) of all restrictions are positive, no further action
is required. Otherwise, both sides of the inequality would be multiplied by "-1" (noting that this
operation also reverses the direction of the restriction).
Normalize restrictions.
The inequalities become equations by adding slack, surplus and artificial variables, as shown in
the following table:
≥ - surplus + artificial
= + artificial
≤ + slack
In this case, a slack variable (X3, X4 and X5) is introduced in each of the restrictions of ≤ type
to convert them into equalities, resulting in the system of linear equations:
2·X1 + X2 + X3 = 18
2·X1 + 3·X2 + X4 = 42
3·X1 + X2 + X5 = 24
The initial tableau of the Simplex method consists of all the coefficients of the decision
variables of the original problem and of the slack, surplus and artificial variables added in the
second step (in columns, with P0 as the constant-term column and Pi as the column of
coefficients of variable Xi), and the constraints (in rows). The Cb column contains the
coefficients of the variables that are in the base.
The first row consists of the objective function coefficients, while the last row contains the
objective function value and the reduced costs Zj - Cj.
The last row is calculated as follows: Zj = Σ(Cbi·Pj) for i = 1..m, where if j = 0 then P0 = bi and
C0 = 0, otherwise Pj = aij. Since this is the first tableau of the Simplex method and all Cb are
null, the calculation simplifies, and at this point Zj = -Cj.
3 2 0 0 0
Base Cb P0 P1 P2 P3 P4 P5
P3 0 18 2 1 1 0 0
P4 0 42 2 3 0 1 0
P5 0 24 3 1 0 0 1
Z 0 -3 -2 0 0 0
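As a check, the last row of this first tableau can be reproduced directly from the definition Zj = Σ(Cbi·Pj) − Cj; a minimal sketch:

```python
# Data of the initial tableau above: Cb (all slack variables, hence 0),
# the Pj columns, and the objective coefficients Cj.
Cb = [0, 0, 0]
P = {"P1": [2, 2, 3], "P2": [1, 3, 1], "P3": [1, 0, 0], "P4": [0, 1, 0], "P5": [0, 0, 1]}
C = {"P1": 3, "P2": 2, "P3": 0, "P4": 0, "P5": 0}

# Zj - Cj = sum(Cbi * aij) - Cj; with all Cb = 0 this reduces to -Cj.
z_row = {j: sum(cb * a for cb, a in zip(Cb, P[j])) - C[j] for j in P}
print(z_row)   # {'P1': -3, 'P2': -2, 'P3': 0, 'P4': 0, 'P5': 0}
```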
Stopping condition.
If the objective is to maximize, the stop condition is reached when there is no negative value
among the reduced costs in the last row (the indicator row).
In that case, the algorithm ends, as no further improvement is possible. The Z value (P0
column) is the optimal value of the problem.
Another possible scenario is that all values in the column of the entering base variable are
negative or zero. This indicates that the problem is unbounded and the solution can always be
improved.
First, the entering (input) base variable is determined. The column whose value in the Z row is
the most negative is chosen. In this example it is the variable X1 (P1), with -3 as coefficient.
If two or more coefficients tie for the most negative value (case of tie), any of them may be
chosen.
The column of the entering base variable is called the pivot column.
Once the entering base variable is known, the leaving (output) base variable is determined. The
decision is based on a simple calculation: divide each independent term (P0 column) by the
corresponding value in the pivot column, provided both values are strictly positive (greater than
zero). The row with the minimum quotient is chosen.
If a pivot-column value is less than or equal to zero, that quotient is not computed. If no value
in the pivot column is strictly positive, the stop condition is reached and the problem has an
unbounded solution (see Simplex method theory).
The row of the pivot column that gave the least positive quotient in the previous division
indicates the slack variable leaving the base. In this example it is X5 (P5), with 3 as
coefficient. This row is called the pivot row.
If two or more quotients tie (case of tie), any of the tied rows may be chosen (wherever
possible).
The intersection of pivot column and pivot row marks the pivot value, in this example, 3.
Update tableau.
The pivot row is divided by the pivot so that the pivot value becomes 1, and the other entries of
the pivot column are cancelled (analogous to the Gauss-Jordan method). Every other row is
updated as:
New value = Previous value - (Previous value in pivot column × New value in pivot row)
For example, the new P4 row is obtained from the previous P4 row (42 2 3 0 1 0) by subtracting
2 (its entry in the pivot column) times the new pivot row (8 1 1/3 0 0 1/3), giving
26 0 7/3 0 1 -2/3.
3 2 0 0 0
Base Cb P0 P1 P2 P3 P4 P5
P3 0 2 0 1/3 1 0 -2/3
P4 0 26 0 7/3 0 1 -2/3
P1 3 8 1 1/3 0 0 1/3
Z 24 0 -1 0 0 1
Checking the stop condition, it is observed that it is not fulfilled, since there is one negative
value in the last row, -1. So, iteration steps 6 and 7 are repeated.
6.1. The input base variable is X2 (P2), since it is the variable that corresponds to the column
where the coefficient is -1.
6.2. To calculate the output base variable, the constant terms (P0 column) are divided by the
terms of the new pivot column: 2 / (1/3) [= 6], 26 / (7/3) [= 78/7] and 8 / (1/3) [= 24]. As the
least positive quotient is 6, the output base variable is X3 (P3).
6.3. The new pivot is 1/3.
7. Updating the tableau values again, we obtain:
3 2 0 0 0
Base Cb P0 P1 P2 P3 P4 P5
P2 2 6 0 1 3 0 -2
P4 0 12 0 0 -7 1 4
P1 3 6 1 0 -1 0 1
Z 30 0 0 3 0 -1
Checking the stop condition again reveals that the last row has one negative value, -1. This means
the optimal solution has not yet been reached and we must continue iterating (steps 6 and 7):
6.1. The input base variable is X5 (P5), since it is the variable that corresponds to the column
where the coefficient is -1.
6.2. To calculate the output base variable, the constant terms (P0) are divided by the terms of
the new pivot column: 6/(-2) is discarded (its pivot-column entry is not strictly positive),
12/4 [= 3], and 6/1 [= 6]. As the least positive quotient is 3, the output base variable is X4 (P4).
6.3. The new pivot is 4.
7. Updating the tableau values again, we obtain:
3 2 0 0 0
Base Cb P0 P1 P2 P3 P4 P5
P2 2 12 0 1 -1/2 1/2 0
P5 0 3 0 0 -7/4 1/4 1
P1 3 3 1 0 3/4 -1/4 0
Z 33 0 0 5/4 1/4 0
End of algorithm.
It is noted that in the last row all the coefficients are non-negative, so the stop condition is
fulfilled. The optimal solution is given by the value of Z in the constant-terms column (P0
column), in the example: 33. The point where this optimum is reached is read from the same
column, in the rows of the basic decision variables: X1 = 3 and X2 = 12.
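The full procedure just worked through (entering variable by most-negative reduced cost, ratio test, Gauss-Jordan pivot) can be sketched as a small Python tableau implementation; exact fractions keep the intermediate tableaus identical to those above:

```python
from fractions import Fraction

def simplex_max(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0, assuming b >= 0."""
    m, n = len(A), len(c)
    # Rows: constraints with slack variables appended; last column is P0 (RHS).
    T = [[Fraction(v) for v in A[i]] +
         [Fraction(int(i == k)) for k in range(m)] +
         [Fraction(b[i])] for i in range(m)]
    # Z row holds the reduced costs Zj - Cj; initially -Cj since all Cb are 0.
    T.append([Fraction(-cj) for cj in c] + [Fraction(0)] * (m + 1))
    basis = list(range(n, n + m))               # slack variables start in the base
    while True:
        col = min(range(n + m), key=lambda j: T[-1][j])
        if T[-1][col] >= 0:                     # stop: no negative reduced cost
            break
        # Ratio test over strictly positive pivot-column entries only.
        ratios = [(T[i][-1] / T[i][col], i) for i in range(m) if T[i][col] > 0]
        if not ratios:
            raise ValueError("unbounded problem")
        _, row = min(ratios)
        basis[row] = col
        piv = T[row][col]
        T[row] = [v / piv for v in T[row]]      # normalize the pivot row
        for i in range(m + 1):                  # cancel the rest of the pivot column
            if i != row and T[i][col] != 0:
                f = T[i][col]
                T[i] = [a - f * p for a, p in zip(T[i], T[row])]
    x = [Fraction(0)] * n
    for i, bv in enumerate(basis):
        if bv < n:
            x[bv] = T[i][-1]
    return x, T[-1][-1]

x, z = simplex_max([3, 2], [[2, 1], [2, 3], [3, 1]], [18, 42, 24])
print([int(v) for v in x], int(z))             # [3, 12] 33
```

The pivots reproduce the worked example exactly: X1 enters first (reduced cost -3), X5 leaves (least quotient 24/3 = 8), and the final tableau yields X1 = 3, X2 = 12, Z = 33.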
Chapter 8
Network Analysis
The program (or project) evaluation and review technique, commonly abbreviated PERT, is a statistical
tool, used in project management, which was designed to analyze and represent the tasks involved in
completing a given project. First developed by the United States Navy in the 1950s, it is commonly used
in conjunction with the critical path method (CPM).
Fig: PERT network chart for a seven-month project with five milestones (10 through 50) and six
activities (A through F).
History
Program evaluation and review technique
The Navy's Special Projects Office, charged with developing the Polaris-Submarine weapon
system and the Fleet Ballistic Missile capability, has developed a statistical technique for
measuring and forecasting progress in research and development programs. This program
evaluation and review technique (code-named PERT) is applied as a decision-making tool
designed to save time in achieving end-objectives, and is of particular interest to those engaged
in research and development programs for which time is a critical factor.
The new technique takes recognition of three factors that influence successful achievement of
research and development program objectives: time, resources, and technical performance
specifications. PERT employs time as the variable that reflects planned resource-applications and
performance specifications. With units of time as a common denominator, PERT quantifies
knowledge about the uncertainties involved in developmental programs requiring effort at the
edge of, or beyond, current knowledge of the subject – effort for which little or no previous
experience exists.
Through an electronic computer, the PERT technique processes data representing the major,
finite accomplishments (events) essential to achieve end-objectives; the inter-dependence of
those events; and estimates of time and range of time necessary to complete each activity
between two successive events. Such time expectations include estimates of "most likely time",
"optimistic time", and "pessimistic time" for each activity. The technique is a management
control tool that sizes up the outlook for meeting objectives on time; highlights danger signals
requiring management decisions; reveals and defines both methodicalness and slack in the flow
plan or the network of sequential activities that must be performed to meet objectives; compares
current expectations with scheduled completion dates and computes the probability for meeting
scheduled dates; and simulates the effects of options for decision – before decision.
The concept of PERT was developed by an operations research team staffed with representatives
from the Operations Research Department of Booz, Allen and Hamilton; the Evaluation Office of the
Lockheed Missile Systems Division; and the Program Evaluation Branch, Special Projects Office, of the
Department of the Navy.
— Willard Fazar (Head, Program Evaluation Branch, Special Projects Office, U. S. Navy), The American
Statistician, April 1959.
Overview
PERT is a method to analyze the involved tasks in completing a given project, especially the
time needed to complete each task, and to identify the minimum time needed to complete the
total project.
PERT was developed primarily to simplify the planning and scheduling of large and complex
projects. It was developed for the U.S. Navy Special Projects Office in 1957 to support the U.S.
Navy's Polaris nuclear submarine project. It was able to incorporate uncertainty by making it
possible to schedule a project while not knowing precisely the details and durations of all the
activities. It is more of an event-oriented technique rather than start- and completion-oriented,
and is used more in projects where time is the major factor rather than cost. It is applied to very
large-scale, one-time, complex, non-routine infrastructure and Research and Development
projects. An example of this was for the 1968 Winter Olympics in Grenoble which applied PERT
from 1965 until the opening of the 1968 Games.
This project model was the first of its kind, a revival of scientific management, founded by
Frederick Taylor (Taylorism) and later refined by Henry Ford (Fordism). DuPont's critical path
method was invented at roughly the same time as PERT.
Implementation
The first step to scheduling the project is to determine the tasks that the project requires and the
order in which they must be completed. The order may be easy to record for some tasks (e.g.
When building a house, the land must be graded before the foundation can be laid) while
difficult for others (There are two areas that need to be graded, but there are only enough
bulldozers to do one). Additionally, the time estimates usually reflect the normal, non-rushed
time. Many times, the time required to execute the task can be reduced for an additional cost or a
reduction in the quality.
In the following example there are seven tasks, labeled A through G. Some tasks can be done
concurrently (A and B) while others cannot be done until their predecessor task is complete (C
cannot begin until A is complete). Additionally, each task has three time estimates: the optimistic
time estimate (o), the most likely or normal time estimate (m), and the pessimistic time estimate
(p). The expected time (te) is computed using the formula (o + 4m + p) ÷ 6.
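The expected-time formula can be applied mechanically; a short sketch, with hypothetical three-point estimates chosen only so that the results match the rounded durations (4 and 5.33 work days) used for tasks a and b later in this example:

```python
def expected_time(o, m, p):
    """PERT expected duration: te = (o + 4m + p) / 6."""
    return (o + 4 * m + p) / 6

# Hypothetical (o, m, p) estimates -- not taken from the original table.
estimates = {"a": (2, 4, 6), "b": (3, 5, 9)}
te = {task: expected_time(*omp) for task, omp in estimates.items()}
print(te["a"], round(te["b"], 2))   # 4.0 5.33
```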
A network diagram can be created by hand or by using diagram software. There are two types of
network diagrams, activity on arrow (AOA) and activity on node (AON). Activity on node diagrams are
generally easier to create and interpret. To create an AON diagram, it is recommended (but not
required) to start with a node named start. This "activity" has a duration of zero (0). Then you draw each
activity that does not have a predecessor activity (a and b in this example) and connect them with an
arrow from start to each node. Next, since both c and d list a as a predecessor activity, their nodes are
drawn with arrows coming from a. Activity e is listed with b and c as predecessor activities, so node e is
drawn with arrows coming from both b and c, signifying that e cannot begin until both b and c have
been completed. Activity f has d as a predecessor activity, so an arrow is drawn connecting the activities.
Likewise, an arrow is drawn from e to g. Since there are no activities that come after f or g, it is
recommended (but again not required) to connect them to a node labeled finish.
By itself, the network diagram pictured above does not give much more information than a Gantt
chart; however, it can be expanded to display more information. The most common information
shown is each activity's expected duration, early start (ES), early finish (EF), late start (LS),
late finish (LF), and slack.
In order to determine this information it is assumed that the activities and normal duration times
are given. The first step is to determine the ES and EF. The ES is defined as the maximum EF of
all predecessor activities, unless the activity in question is the first activity, for which the ES is
zero (0). The EF is the ES plus the task duration (EF = ES + duration).
The ES for start is zero since it is the first activity. Since the duration is zero, the EF is also zero.
This EF is used as the ES for a and b.
The ES for a is zero. The duration (4 work days) is added to the ES to get an EF of four. This EF is
used as the ES for c and d.
The ES for b is zero. The duration (5.33 work days) is added to the ES to get an EF of 5.33.
The ES for c is four. The duration (5.17 work days) is added to the ES to get an EF of 9.17.
The ES for d is four. The duration (6.33 work days) is added to the ES to get an EF of 10.33. This
EF is used as the ES for f.
The ES for e is the greatest EF of its predecessor activities (b and c). Since b has an EF of 5.33 and
c has an EF of 9.17, the ES of e is 9.17. The duration (5.17 work days) is added to the ES to get an
EF of 14.34. This EF is used as the ES for g.
The ES for f is 10.33. The duration (4.5 work days) is added to the ES to get an EF of 14.83.
The ES for g is 14.34. The duration (5.17 work days) is added to the ES to get an EF of 19.51.
The ES for finish is the greatest EF of its predecessor activities (f and g). Since f has an EF of
14.83 and g has an EF of 19.51, the ES of finish is 19.51. Finish is a milestone (and therefore has
a duration of zero), so the EF is also 19.51.
Barring any unforeseen events, the project should take 19.51 work days to complete. The next
step is to determine the late start (LS) and late finish (LF) of each activity. This will eventually
show if there are activities that have slack. The LF is defined as the minimum LS of all successor
activities, unless the activity is the last activity, for which the LF equals the EF. The LS is the LF
minus the task duration (LS = LF − duration).
The LF for finish is equal to the EF (19.51 work days) since it is the last activity in the project.
Since the duration is zero, the LS is also 19.51 work days. This will be used as the LF for f and g.
The LF for g is 19.51 work days. The duration (5.17 work days) is subtracted from the LF to get
an LS of 14.34 work days. This will be used as the LF for e.
The LF for f is 19.51 work days. The duration (4.5 work days) is subtracted from the LF to get an
LS of 15.01 work days. This will be used as the LF for d.
The LF for e is 14.34 work days. The duration (5.17 work days) is subtracted from the LF to get
an LS of 9.17 work days. This will be used as the LF for b and c.
The LF for d is 15.01 work days. The duration (6.33 work days) is subtracted from the LF to get
an LS of 8.68 work days.
The LF for c is 9.17 work days. The duration (5.17 work days) is subtracted from the LF to get an
LS of 4 work days.
The LF for b is 9.17 work days. The duration (5.33 work days) is subtracted from the LF to get an
LS of 3.84 work days.
The LF for a is the minimum LS of its successor activities. Since c has an LS of 4 work days and d
has an LS of 8.68 work days, the LF for a is 4 work days. The duration (4 work days) is subtracted
from the LF to get an LS of 0 work days.
The LF for start is the minimum LS of its successor activities. Since a has an LS of 0 work days and
b has an LS of 3.84 work days, the LS is 0 work days.
The next step is to determine the critical path and if any activities have slack. The critical path is
the path that takes the longest to complete. To determine the path times, add the task durations
for all available paths. Activities that have slack can be delayed without changing the overall
time of the project. Slack is computed in one of two ways, slack = LF − EF or slack = LS − ES.
Activities that are on the critical path have a slack of zero (0).
The critical path is aceg and the critical time is 19.51 work days. It is important to note that there
can be more than one critical path (in a project more complex than this example) or that the
critical path can change. For example, let's say that activities d and f take their pessimistic (p)
times to complete instead of their expected (te) times. The critical path is now adf and the
critical time is 22 work days. On the other hand, if activity c can be reduced to one work day, the
path time for aceg is reduced to 15.34 work days, which is slightly less than the time of the new
critical path, beg (15.67 work days).
Assuming these scenarios do not happen, the slack for each activity can now be determined.
Start and finish are milestones and by definition have no duration, therefore they can have no
slack (0 work days).
The activities on the critical path by definition have a slack of zero; however, it is always a good
idea to check the math anyway when drawing by hand.
o LFa – EFa = 4 − 4 = 0
o LFc – EFc = 9.17 − 9.17 = 0
o LFe – EFe = 14.34 − 14.34 = 0
o LFg – EFg = 19.51 − 19.51 = 0
Activity b has an LF of 9.17 and an EF of 5.33, so the slack is 3.84 work days.
Activity d has an LF of 15.01 and an EF of 10.33, so the slack is 4.68 work days.
Activity f has an LF of 19.51 and an EF of 14.83, so the slack is 4.68 work days.
Therefore, activity b can be delayed almost 4 work days without delaying the project. Likewise,
activity d or activity f can be delayed 4.68 work days without delaying the project (alternatively,
d and f can be delayed 2.34 work days each).
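The forward pass, backward pass and slack computation above can be sketched in Python, using this example's durations and precedence relationships:

```python
# Durations (te, work days) and predecessors taken from the example above.
dur = {"a": 4, "b": 5.33, "c": 5.17, "d": 6.33, "e": 5.17, "f": 4.5, "g": 5.17}
pred = {"a": [], "b": [], "c": ["a"], "d": ["a"],
        "e": ["b", "c"], "f": ["d"], "g": ["e"]}
order = list(dur)                                   # already in topological order
succ = {t: [s for s in order if t in pred[s]] for t in order}

# Forward pass: ES = max EF of predecessors, EF = ES + duration.
ES, EF = {}, {}
for t in order:
    ES[t] = max((EF[p] for p in pred[t]), default=0)
    EF[t] = ES[t] + dur[t]
project = max(EF[t] for t in order if not succ[t])  # f and g have no successors

# Backward pass: LF = min LS of successors, LS = LF - duration.
LS, LF = {}, {}
for t in reversed(order):
    LF[t] = min((LS[s] for s in succ[t]), default=project)
    LS[t] = LF[t] - dur[t]

# Slack = LS - ES (equivalently LF - EF); zero slack marks the critical path.
slack = {t: round(LS[t] - ES[t], 2) for t in order}
critical = [t for t in order if slack[t] == 0]
print(round(project, 2), critical)                  # 19.51 ['a', 'c', 'e', 'g']
```

Running this reproduces the figures above: a 19.51-day project, critical path aceg, and slacks of 3.84 for b and 4.68 for d and f.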
Advantages
A PERT chart explicitly defines and makes visible the dependencies (precedence relationships)
between the work breakdown structure (commonly WBS) elements.
PERT facilitates identification of the critical path and makes this visible.
PERT facilitates identification of early start, late start, and slack for each activity.
PERT provides for potentially reduced project duration due to better understanding of
dependencies leading to improved overlapping of activities and tasks where feasible.
The large amount of project data can be organized and presented in a diagram for use in decision
making.
PERT can provide a probability of completing before a given time.
Disadvantages
There can be potentially hundreds or thousands of activities and individual dependency
relationships.
PERT is not easily scalable for smaller projects.
The network charts tend to be large and unwieldy requiring several pages to print and requiring
specially sized paper.
The lack of a timeframe on most PERT/CPM charts makes it harder to show status although
colours can help (e.g., specific colour for completed nodes).
One possible method to maximize solution robustness is to include safety in the baseline
schedule in order to absorb the anticipated disruptions. This is called proactive scheduling. A
pure proactive scheduling is a utopia; incorporating safety in a baseline schedule which allows
for every possible disruption would lead to a baseline schedule with a very large make-span. A
second approach, termed reactive scheduling, consists of defining a procedure to react to
disruptions that cannot be absorbed by the baseline schedule.
Both CPM and PERT are derivatives of the Gantt chart and, as a result, are very similar. There
were originally two primary differences between CPM and PERT. With CPM a single estimate
for activity time was used that did not allow for any variation in activity times--activity times
were treated as if they were known for certain, or "deterministic." With PERT, multiple time
estimates were used for each activity that allowed for variation in activity times--activity times
were treated as "probabilistic." The other difference was related to the mechanics of drawing the
project network. In PERT activities were represented as arcs, or arrowed lines, between two
nodes, or circles, whereas in CPM activities were represented as the nodes or circles. However,
over time CPM and PERT have been effectively merged into a single technique conventionally
referred to as CPM/PERT.
The advantage of CPM/PERT over the Gantt chart is in the use of a network to depict the
precedence relationships between activities. The Gantt chart does not clearly show precedence
relationships, which is a disadvantage that limited its use to small projects. The CPM/PERT
network is a more efficient and direct means of displaying precedence relationships. In other
words, in a network it is visually easier to see the precedence relationships, which makes
CPM/PERT popular with managers and other users, especially for large projects with many
activities.
Critical Path Analysis
Many larger businesses get involved in projects that are complex and involve significant
investment and risk. As the complexity and risk increases it becomes even more necessary to
identify the relationships between the activities involved and to work out the most efficient way
of completing the project.
The essential technique for using CPA is to construct a model of the project that includes the
following: a list of all activities required to complete the project, the time (duration) that each
activity will take, and the dependencies between the activities.
This process determines which activities are "critical" (i.e., on the longest path) and which have
"total float" (i.e. can be delayed without making the project longer).
The critical path is the sequence of project activities which adds up to the longest overall duration.
The critical path determines the shortest time possible to complete the project.
Any delay of an activity on the critical path directly impacts the planned project completion date
(i.e. there is no float on the critical path).
Illustration of CPA
Here is a worked example to illustrate how the critical path for a project is determined.
Consider the following series of activities in a business planning to launch a new product:
Laid out in the correct sequence of activities, the network diagram would look like this before we
calculate the EST and LFT for each activity:
For example:
The EST for task B is 2 months – the time taken to conduct market research (task A)
To calculate the EST for task C, we add the 2 months for task A to the 4 months for designing
the product concept (task B) = 6 months
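The EST arithmetic above can be sketched directly (only the two durations stated in the text are used):

```python
dur_A, dur_B = 2, 4      # months: market research (A) and product design (B)
est_B = dur_A            # B can start once A is complete
est_C = dur_A + dur_B    # C can start once A and B are complete
print(est_B, est_C)      # 2 6
```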
Evaluating CPA
The main advantages and disadvantages of a business using CPA can be summarised as follows:
Advantages of CPA
Most importantly – helps reduce the risk and costs of complex projects
Encourages careful assessment of the requirements of each activity in a project
Helps spot which activities have some slack ("float") and could therefore transfer resources
where needed, giving a better allocation of resources
A decision-making tool and a planning tool – all in one!
Provides managers with a useful overview of a complex project
Links well with other aspects of business planning, including cash flow forecasting and
budgeting
Disadvantages of CPA
A central weakness of both PERT and CPM is the inability to deal with resource
dependencies. As discussed in chapter 1, resource dependencies are those that concern the
availability of resources, whether they are human, mechanical or fiscal (PERT/CPM considers
only causal dependencies, i.e. the completion of a prior task). PERT/CPM also assumes that
additional resources can be shifted to a project as required. Because, in the real world, all
projects have finite resources to draw on, the estimates and expectations are frequently skewed.
Because of this weakness, a significant portion of the PM community believes that PERT/CPM
creates unrealistic expectations, at best. As a result, management of projects using only
PERT/CPM can be difficult and frustrating for workers, Project Managers and stakeholders alike.
A newly emerging (within the last 10 years) methodology is Critical Chain Project
Management (CCPM), based on the Theory of Constraints. In essence, CCPM focuses
on managing constraints: the relationships between tasks within a project and resources within
the project. By actively managing these “hotspots” it is believed that CCPM decreases project
conflict and tension and provides a more balanced expectation. Though an interesting theory,
CCPM is largely unproven and appears to be most applicable in projects concerning highly
dynamic tasks that can be grouped in modules. Module structure groups tasks where the
completion of a module delivers some degree of function that can be used regardless of the status
of the remainder of the project. An example would be software development, where a subroutine
that is common to many applications can be completed and be useful before the entire project is
completed. Because the relationship between modules is not as critical, the modules themselves
can be re-planned and re-scheduled as necessary, adding a degree of efficiency and decreasing
conflict within a project or between projects. CCPM also focuses on overall project progress
instead of individual task progress. A perceived strength of CCPM is that it is based on an
absence of multi-tasking; a single resource is only assigned to a single task/project. A relatively
humanistic approach, CCPM calculations also account for the inconsistent nature of human
performance (good days, bad days, sick time, training needed, etc). CCPM estimates are much
broader (50% probability, 90% probability, etc) and deal exclusively with a single “normal”
completion date of the project as a whole. As such, it is believed that by identifying and grouping
tasks and limiting constraints the project becomes more manageable while providing incremental
value. Critics of CCPM argue that its assumptions (absence of multitasking, tasks may be
grouped into semi-independent yet value-filled groups) create unrealistic expectations. In any
event, CCPM seems applicable only in those industries where incremental progress can deliver
incremental value or function. Clearly, only completing one wing of an airplane, 2 walls of a
house or 1/3 of a city-wide traffic risk assessment would provide little value, so CCPM has
found little acceptance outside of very specific hi-tech business areas.
The second method in use is a variation of PERT called the Earned Value Method,
introduced by the Department of Defense in the mid-1960s. In the business world this method is
synonymous with ROI (Return On Investment). Simply put, it examines the relationship between
the cost of doing something and the value received by doing it. Earned value does not
concentrate on probability of completion at a specific time, nor does it deal with a specific time
or range of times, though a by-product of the analysis is a constantly moving completion
projection. It tracks tasks and the project as a whole in terms of money, by analysis that answers
three specific questions:
1) How does the cost of work performed compare to the value of the work
performed?
3) How does the amount of money spent so far on a project compare to what should
have been spent?
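The three questions above correspond to the standard earned-value quantities: planned value (PV), earned value (EV), and actual cost (AC). A minimal sketch in Python; the dollar figures are invented for illustration:

```python
# Standard earned-value calculations behind the three questions above.
# PV (planned value), EV (earned value), and AC (actual cost) are
# illustrative figures, not from the text.

def earned_value_metrics(pv, ev, ac):
    """Return the basic earned-value variances and performance indices."""
    return {
        "cost_variance": ev - ac,               # value of work done vs. its cost (question 1)
        "schedule_variance": ev - pv,           # work done vs. work scheduled (question 2)
        "cost_performance_index": ev / ac,      # CPI < 1 means over budget (question 3)
        "schedule_performance_index": ev / pv,  # SPI < 1 means behind schedule
    }

m = earned_value_metrics(pv=100_000, ev=80_000, ac=125_000)
print(m["cost_variance"])      # -45000: the project is over budget
print(m["schedule_variance"])  # -20000: the project is behind schedule
```

Dividing instead of subtracting (CPI, SPI) gives the "constantly moving completion projection" the text mentions: budget at completion divided by CPI is a common cost forecast.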
By far, the most common method used is PERT/CPM. The remainder of this unit will
focus on introducing basic methods and calculations in use. As discussed in Unit 1, PERT is
based on a beta distribution that is useful in real-world planning because it accounts for a degree
of randomness (that all humans bring to the table). Based on its theoretical model, PERT delivers
a task or project completion estimate based on pessimistic, optimistic and most likely estimates
provided by the user. PERT also provides a probability of completion on any date selected by the
user. PERT calculations are simple and straightforward, but tend to get lengthy when many tasks
are involved. Before the task calculations can be made, however, two steps must be taken in any
project planning:
1) Define the goal of the project and the tasks required to complete it
2) Place tasks in a logical order and determine the critical path (it is helpful to
diagram the tasks)
a. The critical path is the longest time path through the network of tasks
When these steps are complete, generate a set of duration estimates for each task; each set
should contain a pessimistic, most likely and optimistic estimate. To keep the estimates straight,
it is useful to label pessimistic estimates as TP, optimistic estimates as TO, and most likely
estimates as TL (any labeling system can be used, but these are fairly intuitive). For each task,
calculate the PERT-derived expected duration (TE) based on the formula TE = (TP + 4 TL + TO) / 6
1) Read this formula as the sum of pessimistic plus 4 times likely plus optimistic,
divided by 6, equals the expected duration
2) Complete this calculation for all tasks, making sure to group tasks on the critical
path separately
a. The critical path is the longest time path through the network of tasks
b. The sum of duration of tasks on the critical path will determine the project
duration
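As a quick check of the formula, the expected duration can be computed directly; the estimates below are invented for illustration:

```python
def expected_duration(tp, tl, to):
    """PERT expected duration: (pessimistic + 4 * likely + optimistic) / 6."""
    return (tp + 4 * tl + to) / 6

# A hypothetical task estimated at 10 days pessimistic, 6 likely, 4 optimistic:
print(expected_duration(tp=10, tl=6, to=4))  # ~6.33 days
```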
A second set of calculations is necessary to determine information that will be useful
later in the process. These calculations yield the Standard Deviation (SD) and Variance (V)
for each task duration. The SD is the average deviation from the estimated time; as a general
rule, the higher the SD, the greater the amount of uncertainty. The V reflects the spread of a
value over a normal distribution. The SD and V will be useful in determining the probability of
the project meeting a desired completion date. The formulae for calculating SD and V are:
1) SD = (TP - TO) / 6 {read as (pessimistic - optimistic) / 6}
2) V = SD² {read as the standard deviation squared}
3) Complete these calculations for all tasks, making sure to group tasks on the critical
path separately
a. The critical path is the longest time path through the network of tasks
b. The sum of the durations of tasks on the critical path will determine the project
duration
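Putting the pieces together, the per-task TE, SD, and V calculations and the critical-path totals can be sketched as follows (the three tasks and their estimates are invented):

```python
# Apply the TE, SD, and V formulas to a few hypothetical critical-path
# tasks, then sum TE (for project duration) and V (for project variance).
import math

tasks = {
    # name: (TP, TL, TO) = (pessimistic, most likely, optimistic) in days
    "design": (12, 8, 6),
    "build":  (20, 14, 10),
    "test":   (9, 5, 4),
}

project_te = 0.0
project_v = 0.0
for name, (tp, tl, to) in tasks.items():
    te = (tp + 4 * tl + to) / 6   # expected duration
    sd = (tp - to) / 6            # standard deviation
    v = sd ** 2                   # variance
    project_te += te
    project_v += v

# The SD of the whole critical path is the square root of the summed variances.
project_sd = math.sqrt(project_v)
print(round(project_te, 2), round(project_sd, 2))  # → 28.17 2.11
```

The project-level SD computed this way is what feeds the probability-of-completion calculation mentioned above.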
Theory of Crashing
Crashing means shifting or adding resources to shorten tasks on the critical path so that the critical path
is as short as possible. Crashing always raises project costs and is typically disruptive, so a project
should be crashed with caution. The goal of crashing a project is to reduce the duration as much as
possible, largely regardless of cost. It is the opposite of relaxing a project.
Chapter 9
Correlation
Coefficient of variation
A coefficient of variation (CV) can be calculated and interpreted in two different settings:
analyzing a single variable and interpreting a model. The standard formulation of the CV, the
ratio of the standard deviation to the mean, applies in the single variable setting. In the modeling
setting, the CV is calculated as the ratio of the root mean squared error (RMSE) to the mean of
the dependent variable. In both settings, the CV is often presented as the given ratio multiplied
by 100. The CV for a single variable aims to describe the dispersion of the variable in a way that
does not depend on the variable's measurement unit. The higher the CV, the greater the
dispersion in the variable. The CV for a model aims to describe the model fit in terms of the
relative sizes of the squared residuals and outcome values. The lower the CV, the smaller the
residuals relative to the predicted value. This is suggestive of a good model fit.
The CV for a variable can easily be calculated using the information from a typical variable
summary (and sometimes the CV will be returned by default in the variable summary).
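Both versions of the CV described above can be computed in a few lines; a sketch in Python with invented data:

```python
# Compute the CV in both settings described above (data invented).
import statistics

# Variable CV: standard deviation over mean, often multiplied by 100.
values = [12.0, 15.0, 9.0, 14.0, 10.0]
cv_variable = statistics.stdev(values) / statistics.mean(values) * 100

# Model CV: RMSE over the mean of the dependent variable.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.5, 14.5, 15.0]
rmse = (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5
cv_model = rmse / statistics.mean(actual) * 100

print(round(cv_variable, 1), round(cv_model, 1))  # → 21.2 6.1
```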
Advantages
The advantage of the CV is that it is unitless. This allows CVs to be compared to each other in
ways that other measures, like standard deviations or root mean squared residuals, cannot be.
In the variable CV setting: The standard deviations of two variables, while both measure
dispersion in their respective variables, cannot be compared to each other in a meaningful way to
determine which variable has greater dispersion because they may vary greatly in their units and
the means about which they occur. The standard deviation and mean of a variable are expressed
in the same units, so taking the ratio of these two allows the units to cancel. This ratio can then
be compared to other such ratios in a meaningful way: between two variables (that meet the
assumptions outlined below), the variable with the smaller CV is less dispersed than the variable
with the larger CV.
In the model CV setting: Similarly, the RMSE of two models both measure the magnitude of the
residuals, but they cannot be compared to each other in a meaningful way to determine which
model provides better predictions of an outcome. The model RMSE and mean of the predicted
variable are expressed in the same units, so taking the ratio of these two allows the units to
cancel. This ratio can then be compared to other such ratios in a meaningful way: between two
models (where the outcome variable meets the assumptions outlined below), the model with the
smaller CV has predicted values that are closer to the actual values. It is interesting to note the
differences between a model's CV and R-squared values. Both are unitless measures that are
indicative of model fit, but they define model fit in two different ways: CV evaluates the relative
closeness of the predictions to the actual values while R-squared evaluates how much of the
variability in the actual values is explained by the model.
There are some requirements that must be met in order for the CV to be interpreted in the ways
we have described. The most obvious problem arises when the mean of a variable is zero. In
this case, the CV cannot be calculated. Even if the mean of a variable is not zero, but the
variable contains both positive and negative values and the mean is close to zero, then the CV
can be misleading. The CV of a variable or the CV of a prediction model for a variable can be
considered as a reasonable measure if the variable contains only positive values. This is a
definite disadvantage of CVs.
The Pearson correlation evaluates the linear relationship between two continuous
variables. A relationship is linear when a change in one variable is associated with a
proportional change in the other variable.
For example, you might use a Pearson correlation to evaluate whether increases in
temperature at your production facility are associated with decreasing thickness of your
chocolate coating.
Also called Spearman's rho, the Spearman correlation evaluates the monotonic
relationship between two continuous or ordinal variables. In a monotonic relationship, the
variables tend to change together, but not necessarily at a constant rate. The Spearman
correlation coefficient is based on the ranked values for each variable rather than the raw
data.
It is always a good idea to examine the relationship between variables with a scatterplot.
Correlation coefficients only measure linear (Pearson) or monotonic (Spearman) relationships.
Other relationships are possible.
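The distinction can be made concrete with a perfectly monotonic but nonlinear relationship; a plain-Python sketch (ties in the ranks are not handled here):

```python
# y = x**3 is perfectly monotonic in x but not linear, so Spearman's rho
# is exactly 1 while Pearson's r is high but below 1.
import statistics

def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def ranks(v):
    # rank 1 for the smallest value; ties are not handled in this sketch
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranked values
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]
print(round(pearson(x, y), 3))   # below 1: the relationship is not linear
print(round(spearman(x, y), 3))  # 1.0: the relationship is perfectly monotonic
```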
The Windows version of Excel supports programming through Microsoft's Visual Basic for
Applications (VBA), which is a dialect of Visual Basic. Programming with VBA allows
spreadsheet manipulation that is awkward or impossible with standard spreadsheet techniques.
Programmers may write code directly in the Visual Basic Editor (VBE), which provides windows for
writing code, debugging code, and organizing code modules. The user
can implement numerical methods, as well as automate tasks such as formatting or data
organization, in VBA, and guide the calculation using any desired intermediate results reported
back to the spreadsheet.
VBA was removed from Mac Excel 2008, as the developers did not believe that a timely release
would allow porting the VBA engine natively to Mac OS X. VBA was restored in the next
version, Mac Excel 2011, although the build lacks support for ActiveX objects, impacting some
high level developer tools.
A common and easy way to generate VBA code is by using the Macro Recorder. The Macro
Recorder records actions of the user and generates VBA code in the form of a macro. These
actions can then be repeated automatically by running the macro. The macros can also be linked
to different trigger types like keyboard shortcuts, a command button or a graphic. The actions in
the macro can be executed from these trigger types or from the generic toolbar options. The
VBA code of the macro can also be edited in the VBE. Certain features, such as loops and
screen prompts, and some graphical display items, cannot be
recorded but must be entered into the VBA module directly by the programmer. Advanced users
can employ user prompts to create an interactive program, or react to events such as sheets being
loaded or changed.
Macro-recorded code may not be compatible between Excel versions. Some code that is used in
Excel 2010 cannot be used in Excel 2003; for example, a macro that changes cell colors or
modifies other cell properties may not be backward compatible.
VBA code interacts with the spreadsheet through the Excel Object Model, a vocabulary
identifying spreadsheet objects, and a set of supplied functions or methods that enable reading
and writing to the spreadsheet and interaction with its users (for example, through custom
toolbars or command bars and message boxes). User-created VBA subroutines execute these
actions and operate like macros generated using the macro recorder, but are more flexible and
efficient.
MINITAB
Linear regression: Oldest type of regression, designed 250 years ago; computations (on small
data) could easily be carried out by a human being, by design. Can be used for interpolation, but
is not well suited for predictive analytics; it has many drawbacks when applied to modern data, e.g.
sensitivity to both outliers and cross-correlations (both in the variable and observation domains),
and it is subject to over-fitting. A better solution is piecewise-linear regression, in particular for time
series.
Logistic regression: Used extensively in clinical trials, scoring, and fraud detection, when the
response is binary (chance of succeeding or failing, e.g. for a newly tested drug or a credit card
transaction). Suffers the same drawbacks as linear regression (not robust, model-dependent), and
computing the regression coefficients involves a complex, iterative, numerically unstable
algorithm. Can be well approximated by linear regression after transforming the response (logit
transform). Some versions (Poisson or Cox regression) have been designed for a non-binary
response: for categorical data (classification), ordered integer response (age groups), and even
continuous response (regression trees).
Ridge regression: A more robust version of linear regression, putting constraints on the regression
coefficients to make them much more natural, less subject to over-fitting, and easier to
interpret.
Lasso regression: Similar to ridge regression, but automatically performs variable reduction
(allowing regression coefficients to be zero).
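The shrinkage idea behind ridge regression can be sketched in one variable: for a single centered predictor, the ridge slope has the closed form b = Σxy / (Σx² + λ), so λ = 0 recovers ordinary least squares and larger λ pulls the coefficient toward zero. The data below are invented; real applications would use a library implementation:

```python
# One-variable sketch of ridge shrinkage with a single centered predictor.

def ridge_slope(x, y, lam):
    """Closed-form ridge slope: sum(x*y) / (sum(x*x) + lambda)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]          # centered predictor
y = [-4.1, -1.9, 0.2, 2.1, 3.9]          # roughly y = 2x, invented data
ols = ridge_slope(x, y, lam=0.0)         # lambda = 0 is ordinary least squares
shrunk = ridge_slope(x, y, lam=5.0)      # the estimate is pulled toward 0
print(round(ols, 3), round(shrunk, 3))   # → 2.0 1.333
```

Lasso differs in that its L1 penalty can drive coefficients exactly to zero, which is what performs the variable reduction mentioned above.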
Ecologic regression: Consists of performing one regression per stratum, if your data is segmented
into several rather large core strata, groups, or bins. Beware of the curse of big data in this
context: if you perform millions of regressions, some will be totally wrong, and the best ones will
be overshadowed by noisy ones with great but artificial goodness-of-fit; this is a big concern if you try
to identify extreme events and causal relationships (global warming, rare diseases, or extreme
flood modeling).
Regression in unusual spaces: Used, for example, to detect whether meteorite fragments
come from the same celestial body, or to reverse-engineer the Coca-Cola formula.
Logic regression: Used when all variables are binary, typically in scoring algorithms. It is a
specialized, more robust form of logistic regression (useful for fraud detection where each
variable is a 0/1 rule), where all variables have been binned into binary variables.
Bayesian regression: A kind of penalized likelihood estimator, and thus
somewhat similar to ridge regression: more flexible and stable than traditional linear regression.
It assumes that you have some prior knowledge about the regression coefficients and the error
term, relaxing the assumption that the error must have a normal distribution (the error must
still be independent across observations). However, in practice, the prior knowledge is
translated into artificial (conjugate) priors, a weakness of this technique.
Quantile regression: Used in connection with extreme events; see Common Errors in
Statistics, page 238, for details.
LAD regression: Similar to linear regression, but using absolute values (the L1 norm) rather than
squares (the L2 norm). More robust; an L1 metric can also be used to assess goodness-of-fit (arguably
better than R²), along with an L1 variance (one version of which is scale-invariant).
Jackknife regression: A newer type of regression, also used as a general clustering and data
reduction technique. It addresses many of the drawbacks of traditional regression. It provides an
approximate, yet very accurate, robust solution to regression problems, and works well with
"independent" variables that are correlated and/or non-normal (for instance, data distributed
according to a mixture model with several modes). Ideal for black-box predictive algorithms. It
approximates linear regression quite well, but it is much more robust, and works when the
assumptions of traditional regression (non-correlated variables, normal data, homoscedasticity)
are violated.
In statistics, simple linear regression is the least squares estimator of a linear regression model
with a single explanatory variable. In other words, simple linear regression fits a straight line
through the set of n points in such a way that makes the sum of squared residuals of the model
(that is, vertical distances between the points of the data set and the fitted line) as small as
possible.
The adjective simple refers to the fact that the outcome variable is related to a single predictor.
The slope of the fitted line is equal to the correlation between y and x corrected by the ratio of
standard deviations of these variables. The intercept of the fitted line is such that it passes
through the center of mass (x̄, ȳ) of the data points.
Other regression methods besides the simple ordinary least squares (OLS) also exist. In
particular, when one wants to do regression by eye, one usually tends to draw a slightly steeper
line, closer to the one produced by the total least squares method. This occurs because it is more
natural for one's mind to consider the orthogonal distances from the observations to the
regression line, rather than the vertical ones as OLS method does.
In a cause and effect relationship, the independent variable is the cause, and the dependent
variable is the effect. Least squares linear regression is a method for predicting the value of a
dependent variable Y, based on the value of an independent variable X.
In this tutorial, we focus on the case where there is only one independent variable. This is called
simple regression (as opposed to multiple regression, which handles two or more independent
variables).
Tip: The next lesson presents a simple regression example that shows how to apply the material
covered in this lesson. Since this lesson is a little dense, you may benefit by also reading the next
lesson.
Prerequisites for Regression
Simple linear regression is appropriate when the following conditions are satisfied.
The dependent variable Y has a linear relationship to the independent variable X. To check this,
make sure that the XY scatterplot is linear and that the residual plot shows a random pattern.
(Don't worry. We'll cover residual plots in a future lesson.)
For each value of X, the probability distribution of Y has the same standard deviation σ. When
this condition is satisfied, the variability of the residuals will be relatively constant across all
values of X, which is easily checked in a residual plot.
o The Y values are independent, as indicated by a random pattern on the residual plot.
o The Y values are roughly normally distributed (i.e., symmetric and unimodal). A little
skewness is ok if the sample size is large. A histogram or a dotplot will show the shape of
the distribution.
In the population, the regression line is: Y = Β0 + Β1X
Given a random sample of observations, the population regression line is estimated by:
ŷ = b0 + b1x
where b0 is a constant, b1 is the regression coefficient, x is the value of the independent variable,
and ŷ is the predicted value of the dependent variable.
In the unlikely event that you find yourself on a desert island without a computer or a graphing
calculator, you can solve for b0 and b1 "by hand". Here are the equations.
b1 = Σ [ (xi - x̄)(yi - ȳ) ] / Σ [ (xi - x̄)² ]
b1 = r * (sy / sx)
b0 = ȳ - b1 * x̄
where b0 is the constant in the regression equation, b1 is the regression coefficient, r is the
correlation between x and y, xi is the X value of observation i, yi is the Y value of observation i, x̄
is the mean of X, ȳ is the mean of Y, sx is the standard deviation of X, and sy is the standard
deviation of Y.
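The two forms of the slope formula can be verified to agree on a small data set (invented for illustration):

```python
# Compute b0 and b1 "by hand" both ways and check that they agree.
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]  # roughly y = 2x, invented data

mx, my = statistics.mean(x), statistics.mean(y)

# Form 1: b1 = sum[(xi - x̄)(yi - ȳ)] / sum[(xi - x̄)²]
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx

# Form 2: b1 = r * (sy / sx), computing r from its definition
num = sum((a - mx) * (b - my) for a, b in zip(x, y))
den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
r = num / den
b1_alt = r * (statistics.stdev(y) / statistics.stdev(x))

print(round(b1, 3), round(b0, 3))  # → 1.95 0.15
print(round(b1_alt, 3))            # → 1.95 (the two forms agree)
```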
The line minimizes the sum of squared differences between observed values (the y values) and
predicted values (the ŷ values computed from the regression equation).
The regression line passes through the mean of the X values (x) and through the mean of the Y
values (y).
The regression constant (b0) is equal to the y intercept of the regression line.
The regression coefficient (b1) is the average change in the dependent variable (Y) for a 1-unit
change in the independent variable (X). It is the slope of the regression line.
The least squares regression line is the only straight line that has all of these properties.
The coefficient of determination (R²) for a linear regression model with one independent variable
is the square of the correlation between x and y: R² = r².
The general purpose of multiple regression (the term was first used by Pearson, 1908) is to learn
more about the relationship between several independent or predictor variables and a dependent
or criterion variable. For example, a real estate agent might record for each listing the size of the
house (in square feet), the number of bedrooms, the average income in the respective
neighborhood according to census data, and a subjective rating of appeal of the house. Once this
information has been compiled for various houses it would be interesting to see whether and how
these measures relate to the price for which a house is sold. For example, you might learn that
the number of bedrooms is a better predictor of the price for which a house sells in a particular
neighborhood than how "pretty" the house is (subjective rating). You may also detect "outliers,"
that is, houses that should really sell for more, given their location and characteristics.
For example, personnel analysts customarily use regression to determine equitable compensation,
predicting salary from measures of job responsibilities. Once this so-called regression line has been
determined, the analyst can easily construct a graph of the expected (predicted) salaries and the
actual salaries of job incumbents in his or her company. Thus, the analyst is able to determine
which positions are underpaid (below the regression line), overpaid (above the regression line),
or paid equitably.
In the social and natural sciences multiple regression procedures are very widely used in
research. In general, multiple regression allows the researcher to ask (and hopefully answer) the
general question "what is the best predictor of ...". For example, educational researchers might
want to learn what are the best predictors of success in high-school. Psychologists may want to
determine which personality variable best predicts social adjustment. Sociologists may want to
find out which of the multiple social indicators best predict whether or not a new immigrant
group will adapt and be absorbed into society.
Computational Approach
The general computational problem that needs to be solved in multiple regression analysis is to
fit a straight line to a number of points.
In the simplest case - one dependent and one independent variable - you can visualize this in a
scatterplot.
Least Squares
A line in a two dimensional or two-variable space is defined by the equation Y=a+b*X; in full
text: the Y variable can be expressed in terms of a constant (a) and a slope (b) times the X
variable. The constant is also referred to as the intercept, and the slope as the regression
coefficient or B coefficient. For example, GPA may best be predicted as 1+.02*IQ. Thus,
knowing that a student has an IQ of 130 would lead us to predict that her GPA would be 3.6
(since, 1+.02*130=3.6).
For example, a two-dimensional regression equation can be plotted together with
three different confidence intervals (90%, 95%, and 99%).
In the multivariate case, when there is more than one independent variable, the regression line
cannot be visualized in the two dimensional space, but can be computed just as easily. For
example, if in addition to IQ we had additional predictors of achievement (e.g., Motivation, Self-
discipline), we could construct a linear equation containing all those variables. In general, then,
multiple regression procedures will estimate a linear equation of the form:
Y = a + b1*X1 + b2*X2 + ... + bk*Xk
Note that in this equation, the regression coefficients (or B coefficients) represent the
independent contributions of each independent variable to the prediction of the dependent
variable. Another way to express this fact is to say that, for example, variable X1 is correlated
with the Y variable, after controlling for all other independent variables. This type of correlation
is also referred to as a partial correlation (this term was first used by Yule, 1907). Perhaps the
following example will clarify this issue. You would probably find a significant negative
correlation between hair length and height in the population (i.e., short people have longer hair).
At first this may seem odd; however, if we were to add the variable Gender into the multiple
regression equation, this correlation would probably disappear. This is because women, on the
average, have longer hair than men; they also are shorter on the average than men. Thus, after we
remove this gender difference by entering Gender into the equation, the relationship between
hair length and height disappears because hair length does not make any unique contribution to
the prediction of height, above and beyond what it shares in the prediction with variable Gender.
Put another way, after controlling for the variable Gender, the partial correlation between hair
length and height is zero.
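The hair-length example can be made concrete with the standard formula for a first-order partial correlation; the three correlation values below are invented to mirror the story (gender drives both variables):

```python
# First-order partial correlation: the correlation between x and y
# after controlling for z.
import math

def partial_corr(r_xy, r_xz, r_yz):
    """r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2))"""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

r_hair_height = -0.25   # raw: taller people have shorter hair (invented)
r_hair_gender = 0.50    # women (coded 1) have longer hair (invented)
r_height_gender = -0.50 # women are shorter on average (invented)

# Controlling for gender, the hair-height correlation vanishes:
print(partial_corr(r_hair_height, r_hair_gender, r_height_gender))  # → 0.0
```

The values were chosen so that r_xy exactly equals r_xz * r_yz, which is precisely the situation the text describes: the raw correlation is entirely accounted for by the third variable.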
The regression line expresses the best prediction of the dependent variable (Y), given the
independent variables (X). However, nature is rarely (if ever) perfectly predictable, and usually
there is substantial variation of the observed points around the fitted regression line (as in the
scatterplot shown earlier). The deviation of a particular point from the regression line (its
predicted value) is called the residual value.
Customarily, the degree to which two or more predictors (independent or X variables) are related
to the dependent (Y) variable is expressed in the correlation coefficient R, which is the square
root of R-square. In multiple regression, R can assume values between 0 and 1. To interpret the
direction of the relationship between variables, look at the signs (plus or minus) of the regression
or B coefficients. If a B coefficient is positive, then the relationship of this variable with the
dependent variable is positive (e.g., the greater the IQ the better the grade point average); if the B
coefficient is negative then the relationship is negative (e.g., the lower the class size the better
the average test scores). Of course, if the B coefficient is equal to 0 then there is no relationship
between the variables.
Assumption of Linearity
First of all, as is evident in the name multiple linear regression, it is assumed that the relationship
between variables is linear. In practice this assumption can virtually never be confirmed;
fortunately, multiple regression procedures are not greatly affected by minor deviations from this
assumption. However, as a rule it is prudent to always look at bivariate scatterplots of the
variables of interest. If curvature in the relationships is evident, you may consider either
transforming the variables, or explicitly allowing for nonlinear components.
Normality Assumption
It is assumed in multiple regression that the residuals (predicted minus observed values) are
distributed normally (i.e., follow the normal distribution). Again, even though most tests
(specifically the F-test) are quite robust with regard to violations of this assumption, it is always
a good idea, before drawing final conclusions, to review the distributions of the major variables
of interest. You can produce histograms for the residuals as well as normal probability plots, in
order to inspect the distribution of the residual values.
Limitations
The major conceptual limitation of all regression techniques is that you can only ascertain
relationships, but never be sure about the underlying causal mechanism. For example, you would
find a strong positive relationship (correlation) between the damage that a fire does and the
number of firemen involved in fighting the blaze. Do we conclude that the firemen cause the
damage? Of course, the most likely explanation of this correlation is that the size of the fire (an
external variable that we forgot to include in our study) caused the damage as well as the
involvement of a certain number of firemen (i.e., the bigger the fire, the more firemen are called
to fight the blaze). Even though this example is fairly obvious, in real correlation research,
alternative causal explanations are often not considered.
Multiple regression is a seductive technique: "plug in" as many predictor variables as you can
think of and usually at least a few of them will come out significant. This is because you
are capitalizing on chance when simply including as many variables as you can think of as
predictors of some other variable of interest. This problem is compounded when, in addition, the
number of observations is relatively low. Intuitively, it is clear that you can hardly draw
conclusions from an analysis of 100 questionnaire items based on 10 respondents. Most authors
recommend that you should have at least 10 to 20 times as many observations (cases,
respondents) as you have variables; otherwise the estimates of the regression line are probably
very unstable and unlikely to replicate if you were to conduct the study again.
This is a common problem in many correlation analyses. Imagine that you have two predictors
(X variables) of a person's height: (1) weight in pounds and (2) weight in ounces. Obviously, our
two predictors are completely redundant; weight is one and the same variable, regardless of
whether it is measured in pounds or ounces. Trying to decide which one of the two measures is a
better predictor of height would be rather silly; however, this is exactly what you would try to do
if you were to perform a multiple regression analysis with height as the dependent (Y) variable
and the two measures of weight as the independent (X) variables. When there are very many
variables involved, it is often not immediately apparent that this problem exists, and it may only
manifest itself after several variables have already been entered into the regression equation.
Nevertheless, when this problem occurs it means that at least one of the predictor variables is
(practically) completely redundant with other predictors. There are many statistical indicators of
this type of redundancy (tolerances, semi-partial R, etc.), as well as some remedies (e.g., ridge
regression).
Fitting Centered Polynomial Models
The fitting of higher-order polynomials of an independent variable with a mean not equal to zero
can create difficult multicollinearity problems. Specifically, the polynomials will be highly
correlated due to the mean of the primary independent variable. With large numbers (e.g., Julian
dates), this problem is very serious, and if proper protections are not put in place, can cause
wrong results. The solution is to "center" the independent variable (sometimes this procedure is
referred to as "centered polynomials"), i.e., to subtract the mean, and then to compute the
polynomials. See, for example, the classic text by Neter, Wasserman, & Kutner (1985, Chapter
9), for a detailed discussion of this issue (and analyses with polynomial models in general).
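A short sketch of the problem and the fix: with large raw values (such as Julian-date-like numbers), x and x² are almost perfectly correlated, and centering removes that correlation. The data are invented:

```python
# Demonstrate the multicollinearity between x and x**2 for large raw
# values, and how centering removes it.
import statistics

def corr(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

x = [2450000.0 + i for i in range(11)]      # large, Julian-date-like values
raw = corr(x, [v ** 2 for v in x])          # essentially 1: severe multicollinearity

m = statistics.mean(x)
xc = [v - m for v in x]                     # centered predictor
centered = corr(xc, [v ** 2 for v in xc])   # 0 for this symmetric data

print(round(raw, 4), round(centered, 4))    # → 1.0 0.0
```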
Even though most assumptions of multiple regression cannot be tested explicitly, gross
violations can be detected and should be dealt with appropriately. In particular outliers (i.e.,
extreme cases) can seriously bias the results by "pulling" or "pushing" the regression line in a
particular direction (see the animation below), thereby leading to biased regression coefficients.
Often, excluding just a single extreme case can yield a completely different set of results.
Chapter 11
Role of Statistics
Statistics is the science of collecting, analyzing, and making inferences from data. It is a particularly
useful branch of mathematics that is not only studied theoretically by advanced mathematicians but also
used by researchers in many fields to organize, analyze, and summarize data. Statistical methods
and analyses are often used to communicate research findings and to support hypotheses and give
credibility to research methodology and conclusions. It is important for researchers and also consumers
of research to understand statistics so that they can be informed, evaluate the credibility and usefulness
of information, and make appropriate decisions.
Here is a small set of data: The grades for 15 students. For our purposes, they range from 0
(failing) to 4 (an A), and go up in steps of .2.
John -- 3.0
Mary -- 2.8
George -- 2.8
Beth -- 2.4
Sam -- 3.2
Judy -- 2.8
Fritz -- 1.8
Kate -- 3.8
Dave -- 2.6
Jenny -- 3.4
Mike -- 2.4
Sue -- 4.0
Don -- 3.4
Ellen -- 3.2
Orville -- 2.2
Central tendency
Central tendency refers to the idea that there is one number that best summarizes the entire set of
measurements, a number that is in some way "central" to the set.
The mode. The mode is the measurement that has the greatest frequency: the one you found the
most of. Although it isn't used that much, it is useful when differences are rare or when the
differences are non-numerical. The prototypical example of something is usually the mode.
The mode for our example is 2.8. It is the grade with the most people (three: Mary, George, and Judy).
The median. The median is the number at which half your measurements are more than that
number and half are less than that number. The median is actually a better measure of centrality
than the mean if your data are skewed, meaning lopsided. If, for example, you have a dozen
ordinary folks and one millionaire, the distribution of their wealth would be lopsided towards the
ordinary people, and the millionaire would be an outlier, or highly deviant member of the group.
The millionaire would influence the mean a great deal, making it seem like all the members of
the group are doing quite well. The median would actually be closer to the mean of all the
people other than the millionaire.
The median for our example is 2.8: when the 15 grades are sorted, 2.8 is the 8th, or middle,
score.
The mean. The mean is just the average. It is the sum of all your measurements, divided by the
number of measurements. This is the most used measure of central tendency, because of its
mathematical qualities. It works best if the data is distributed very evenly across the range, or is
distributed in the form of a normal or bell-shaped curve (see below). One interesting thing about
the mean is that it represents the expected value if the distribution of measurements were
random! Here is what the formula looks like:
μ = ΣX / N
So 3.0 + 2.8 + 2.8 + 2.4 + 3.2 + 2.8 + 1.8 + 3.8 + 2.6 + 3.4 + 2.4 + 4.0 + 3.4 + 3.2 + 2.2 is 43.8.
Divide that by 15 and that is the mean or average for our example: 2.92.
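These three measures can be computed directly with Python's statistics module; a quick check using the grade data listed above:

```python
from statistics import mean, median, mode

# Grades for the 15 students listed above
grades = [3.0, 2.8, 2.8, 2.4, 3.2, 2.8, 1.8, 3.8,
          2.6, 3.4, 2.4, 4.0, 3.4, 3.2, 2.2]

print("mode:", mode(grades))             # most frequent grade
print("median:", median(grades))         # middle grade when sorted
print("mean:", round(mean(grades), 2))   # 43.8 / 15 = 2.92
```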
Statistical dispersion
Dispersion refers to the idea that there is a second number which tells us how "spread out" all the
measurements are from that central number.
The range. The range is the measure from the smallest measurement to the largest one. This is
the simplest measure of statistical dispersion or "spread."
The range for our example is 2.2, the distance from the lowest score, 1.8, to the highest, 4.0.
Interquartile range. A slightly more sophisticated measure is the interquartile range. If you
divide the data into quartiles, meaning that one fourth of the measurements are in quartile 1, one
fourth in 2, one fourth in 3, and one fourth in 4, you will get a number that divides 1 and 2 and a
number that divides 3 and 4. You then measure the distance between those two numbers, which
therefore contains half of the data. Notice that the number between quartile 2 and 3 is the
median!
The interquartile range for our example is .9, because the quartiles divide roughly at 2.45 and
3.35. The reason for the odd dividing lines is that there are 15 pieces of data, which, of course,
cannot be neatly divided into quartiles!
The standard deviation. The standard deviation is the "average" degree to which scores deviate
from the mean. More precisely, you measure how far all your measurements are from the mean,
square each one, and add them all up. The result is called the variance. Take the square root of
the variance, and you have the standard deviation. Like the mean, it is the "expected value" of
how far the scores deviate from the mean. Here is what the formula looks like:
σ = √( Σ(X − μ)² / N )
So, subtract the mean from each score, square each result, and sum: 5.024. Then divide by 15
and take the square root and you have the standard deviation for our example: .5787.... One
standard deviation above the mean is at about 3.5; one standard deviation below is at about 2.3.
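The dispersion measures can be checked with the same grade data; pvariance and pstdev divide by N, matching the population formulas used here:

```python
from statistics import pstdev, pvariance

# The same 15 grades as above
grades = [3.0, 2.8, 2.8, 2.4, 3.2, 2.8, 1.8, 3.8,
          2.6, 3.4, 2.4, 4.0, 3.4, 3.2, 2.2]

print("range:", max(grades) - min(grades))       # 4.0 - 1.8
print("variance:", round(pvariance(grades), 4))  # sum of squared deviations / 15
print("std dev:", round(pstdev(grades), 4))      # square root of the variance
```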
At its simplest, the central tendency and the measure of dispersion describe a rectangle that is a
summary of the set of data. On a more sophisticated level, these measures describe a curve, such
as the normal curve, that contains the data most efficiently.
This curve, also called the bell-shaped curve, represents a distribution that reflects certain
probabilistic events when extended to an infinite number of measurements. It is an idealized
version of what happens in many large sets of measurements: Most measurements fall in the
middle, and fewer fall at points farther away from the middle. A simple example is height: Very
few people are below 3 feet tall; very few are over 8 feet tall; most of us are somewhere between
5 and 6. The same applies to weight, IQs, and SATs! In the normal curve, the mean, median,
and mode are all the same.
One standard deviation below the mean contains 34.1% of the measures, as does one standard
deviation above the mean. From one to two below contains 13.6%, as does from one to two
above. From two to three standard deviations contains 2.1% on each end. Another way to look
at it: Between one standard deviation below and above, we have 68% of the data; from two
below to two above, we have 95%; and from three below to three above, we have 99.7%.
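The 68/95/99.7 figures can be checked numerically; a small sketch using Python's statistics.NormalDist (available since Python 3.8):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def within(k):
    """Proportion of the distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(round(within(1) * 100, 1))  # about 68.3%
print(round(within(2) * 100, 1))  # about 95.4%
print(round(within(3) * 100, 1))  # about 99.7%
```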
Because of its mathematical properties, especially its close ties to probability theory, the normal
curve is often used in statistics, with the assumption that the mean and standard deviation of a set
of measurements define the distribution. Hopefully, it is obvious that this assumption does not
hold in many cases. The best representation of your measurements is a diagram which includes all
the measurements, not just their mean and standard deviation! Our example above is a clear
example - a normal curve with a mean of 2.92 and a standard deviation of .58 is quite different
from the pattern of the original data. A good real life example is IQ and intelligence: IQ tests
are intentionally scored in such a way that they generate a normal curve, and because IQ tests are
what we use to measure intelligence, we often assume that intelligence is normally distributed,
which is not at all necessarily true!
It would be useful to have a measure of scatter that has the following properties:
1. The measure should be proportional to the scatter of the data (small when the data are
clustered together, and large when the data are widely scattered).
2. The measure should be independent of the number of values in the data set (otherwise,
simply by taking more measurements the value would increase even if the scatter of the
measurements was not increasing).
3. The measure should be independent of the mean (since now we are only interested in the
spread of the data, not its central tendency).
Both the variance and the standard deviation meet these three criteria for normally-distributed
(symmetric, "bell-curve") data sets.
The variance (σ²) is a measure of how far each value in the data set is from the mean. Here is how
it is defined:
1. Subtract the mean from each value in the data. This gives you a measure of the distance
of each value from the mean.
2. Square each of these distances (so that they are all positive values), and add all of the
squares together.
3. Divide the sum of the squares by the number of values in the data set.
The standard deviation (σ) is simply the (positive) square root of the variance.
Now that you know how the summation operator works, you can understand the equation that
defines the variance:
σ² = Σ(X − μ)² / N
The variance (σ²) is defined as the sum of the squared distances of each term in the distribution
from the mean (μ), divided by the number of terms in the distribution (N).
There's a more efficient way to calculate the standard deviation for a group of numbers, shown in
the following equation:
σ² = ΣX²/N − μ²
You take the sum of the squares of the terms in the distribution, and divide by the number of
terms in the distribution (N). From this, you subtract the square of the mean (μ²). It's a lot less
work to calculate the standard deviation this way.
It's easy to prove to yourself that the two equations are equivalent. Start with the definition for
the variance (Equation 1, below). Expand the expression for squaring the distance of a term from
the mean (Equation 2, below).
Equation 1: σ² = Σ(X − μ)² / N
Equation 2: σ² = Σ(X² − 2Xμ + μ²) / N
Now separate the individual terms of the equation (the summation operator distributes over the
terms in parentheses, giving Equation 3, below). In the final term, the sum of μ²/N, taken N times,
is just Nμ²/N.
Equation 3: σ² = ΣX²/N − 2μ(ΣX/N) + Nμ²/N
Next, we can simplify the second and third terms in Equation 3. In the second term, you can see
that ΣX/N is just another way of writing μ, the average of the terms. So the second term
simplifies to −2μ² (compare Equations 3 and 4). In the third term, N/N is equal to 1, so the
third term simplifies to μ² (compare Equations 3 and 4).
Equation 4: σ² = ΣX²/N − 2μ² + μ²
Finally, from Equation 4, you can see that the second and third terms can be combined, giving us
the result we were trying to prove in Equation 5.
Equation 5: σ² = ΣX²/N − μ²
As an example, let's go back to the two distributions we started our discussion with:
data set 1: 3, 4, 4, 5, 6, 8
data set 2: 1, 2, 4, 5, 7, 11.
What are the variance and standard deviation of each data set?
We'll construct a table to calculate the values. You can use a similar table to find the variance
and standard deviation for results from your experiments.
Data Set   N    ΣX    ΣX²    μ    μ²    σ²      σ
1          6    30    166    5    25    2.67    1.63
2          6    30    216    5    25    11.00   3.32
Although both data sets have the same mean (μ = 5), the variance (σ²) of the second data set,
11.00, is a little more than four times the variance of the first data set, 2.67. The standard
deviation (σ) is the square root of the variance, so the standard deviation of the second data set,
3.32, is just over two times the standard deviation of the first data set, 1.63.
The variance and the standard deviation give us a numerical measure of the scatter of a data set.
These measures are useful for making comparisons between data sets that go beyond simple
visual impressions.
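Both the defining formula and the shortcut can be verified on the two data sets; a small sketch in Python:

```python
from math import sqrt

def variance_def(xs):
    """Definition: average squared distance of each value from the mean."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def variance_shortcut(xs):
    """Shortcut: average of the squares minus the square of the mean."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

for xs in ([3, 4, 4, 5, 6, 8], [1, 2, 4, 5, 7, 11]):
    v = variance_def(xs)
    # both formulas agree; the standard deviation is the square root
    print(round(v, 2), round(variance_shortcut(xs), 2), round(sqrt(v), 2))
```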
Chapter 12
1. Compute from the observations the observed value t_obs of the test statistic T.
2. Calculate the p-value. This is the probability, under the null hypothesis, of sampling a test statistic at
least as extreme as that which was observed.
3. Reject the null hypothesis, in favor of the alternative hypothesis, if and only if the p-value is less than
the significance level (the selected probability) threshold.
The two processes are equivalent. The former process was advantageous in the past when only
tables of test statistics at common probability thresholds were available. It allowed a decision to be
made without the calculation of a probability. It was adequate for classwork and for operational use,
but it was deficient for reporting results.
The latter process relied on extensive tables or on computational support not always available. The
explicit calculation of a probability is useful for reporting. The calculations are now trivially performed
with appropriate software.
Consider the difference in the two processes applied to the Radioactive suitcase example
(below): the former report is adequate, while the latter gives a more detailed explanation of the
data and the reason why the suitcase is being checked.
It is important to note the difference between accepting the null hypothesis and simply failing to
reject it. The "fail to reject" terminology highlights the fact that the null hypothesis is assumed to be
true from the start of the test; if there is a lack of evidence against it, it simply continues to be
assumed true. The phrase "accept the null hypothesis" may suggest it has been proved simply
because it has not been disproved, a logical fallacy known as the argument from ignorance. Unless
a test with particularly high power is used, the idea of "accepting" the null hypothesis may be
dangerous. Nonetheless the terminology is prevalent throughout statistics, where its meaning is well
understood.
The processes described here are perfectly adequate for computation. They seriously neglect
design-of-experiments considerations.
It is particularly critical that appropriate sample sizes be estimated before conducting the experiment.
Interpretation
If the p-value is less than the required significance level (equivalently, if the observed test statistic is
in the critical region), then we say the null hypothesis is rejected at the given level of significance.
Rejection of the null hypothesis is a conclusion. This is like a "guilty" verdict in a criminal trial: the
evidence is sufficient to reject innocence, thus proving guilt. We might accept the alternative
hypothesis (and the research hypothesis).
If the p-value is not less than the required significance level (equivalently, if the observed test
statistic is outside the critical region), then the test has no result. The evidence is insufficient to
support a conclusion. (This is like a jury that fails to reach a verdict.) The researcher typically gives
extra consideration to those cases where the p-value is close to the significance level.
In the Lady tasting tea example (below), Fisher required the Lady to properly categorize all of the
cups of tea to justify the conclusion that the result was unlikely to result from chance. He defined the
critical region as that case alone. The region was defined by a probability (that the null hypothesis
was correct) of less than 5%.
Whether rejection of the null hypothesis truly justifies acceptance of the research hypothesis
depends on the structure of the hypotheses. Rejecting the hypothesis that a large paw print
originated from a bear does not immediately prove the existence of Bigfoot. Hypothesis testing
emphasizes the rejection, which is based on a probability, rather than the acceptance, which
requires extra steps of logic.
"The probability of rejecting the null hypothesis is a function of five factors: whether the test is
one- or two-tailed, the level of significance, the standard deviation, the amount of deviation from
the null hypothesis, and the number of observations." These factors are a source of criticism;
factors under the control of the experimenter/analyst give the results an appearance of subjectivity.
Use and importance
Statistics are helpful in analyzing most collections of data. This is equally true of hypothesis testing
which can justify conclusions even when no scientific theory exists. In the Lady tasting tea example,
it was "obvious" that no difference existed between (milk poured into tea) and (tea poured into milk).
The data contradicted the "obvious".
Statistical hypothesis testing plays an important role in the whole of statistics and in statistical
inference. For example, Lehmann (1992) in a review of the fundamental paper by Neyman and
Pearson (1933) says: "Nevertheless, despite their shortcomings, the new paradigm formulated in the
1933 paper, and the many developments carried out within its framework continue to play a central
role in both the theory and practice of statistics and can be expected to do so in the foreseeable
future".
Significance testing has been the favored statistical tool in some experimental social sciences (over
90% of articles in the Journal of Applied Psychology during the early 1990s). Other fields have
favored the estimation of parameters (e.g., effect size). Significance testing is used as a substitute
for the traditional comparison of predicted value and experimental result at the core of the
scientific method. When theory is only capable of predicting the sign of a relationship, a
directional (one-sided) hypothesis test can be configured so that only a statistically significant
result supports theory.
This form of theory appraisal is the most heavily criticized application of hypothesis testing.
Cautions
"If the government required statistical procedures to carry warning labels like those on drugs, most
inference methods would have long labels indeed." This caution applies to hypothesis tests and
alternatives to them.
The successful hypothesis test is associated with a probability and a type-I error rate. The
conclusion might be wrong.
The conclusion of the test is only as solid as the sample upon which it is based. The design of the
experiment is critical. A number of unexpected effects have been observed, including the following.
A statistical analysis of misleading data produces misleading conclusions. The issue of data quality
can be more subtle. In forecasting, for example, there is no agreement on a measure of forecast
accuracy. In the absence of a consensus measurement, no decision based on measurements will be
without controversy.
The book How to Lie with Statistics is the most popular book on statistics ever published. It does not
much consider hypothesis testing, but its cautions are applicable, including: Many claims are made
on the basis of samples too small to convince. If a report does not mention sample size, be doubtful.
Hypothesis testing acts as a filter of statistical conclusions; only those results meeting a probability
threshold are publishable. Economics also acts as a publication filter; only those results favorable to
the author and funding source may be submitted for publication. The impact of filtering on publication
is termed publication bias. A related problem is that of multiple testing (sometimes linked to data
mining), in which a variety of tests for a variety of possible effects are applied to a single data set
and only those yielding a significant result are reported. These are often dealt with by using
multiplicity correction procedures that control the familywise error rate (FWER) or the false
discovery rate (FDR).
Those making critical decisions based on the results of a hypothesis test are prudent to look at the
details rather than the conclusion alone. In the physical sciences most results are fully accepted only
when independently confirmed. The general advice concerning statistics is, "Figures never lie, but
liars figure" (anonymous).
t-Test
The t-test assesses whether the means of two groups are statistically different from each other.
This analysis is appropriate whenever you want to compare the means of two groups, and
especially appropriate as the analysis for the posttest-only two-group randomized experimental
design.
Figure 1. Idealized distributions for treated and comparison group posttest values.
Figure 1 shows the distributions for the treated (blue) and control (green) groups in a study.
Actually, the figure shows the idealized distribution -- the actual distribution would usually be
depicted with a histogram or bar graph. The figure indicates where the control and treatment
group means are located. The question the t-test addresses is whether the means are statistically
different.
What does it mean to say that the averages for two groups are statistically different? Consider the
three situations shown in Figure 2. The first thing to notice about the three situations is that the
difference between the means is the same in all three. But, you should also notice that the three
situations don't look the same -- they tell very different stories. The top example shows a case
with moderate variability of scores within each group. The second situation shows the high
variability case. The third shows the case with low variability. Clearly, we would conclude that
the two groups appear most different or distinct in the bottom or low-variability case. Why?
Because there is relatively little overlap between the two bell-shaped curves. In the high
variability case, the group difference appears least striking because the two bell-shaped
distributions overlap so much.
Figure 2. Three scenarios for differences between means.
This leads us to a very important conclusion: when we are looking at the differences between
scores for two groups, we have to judge the difference between their means relative to the spread
or variability of their scores. The t-test does just this.
The formula for the t-test is a ratio: the difference between the group means divided by the
variability of the groups. The top part of the formula is easy to compute -- just find the difference
between the means. The bottom part is called the standard error of the difference. To compute it,
we take the variance for each group and divide it by the number of people in that group. We add
these two values and then take their square root. The specific formula is given in Figure 4:
SE(mean1 − mean2) = √( variance1/n1 + variance2/n2 )
Figure 4. Formula for the standard error of the difference between the means.
Remember that the variance is simply the square of the standard deviation.
The t-value will be positive if the first mean is larger than the second and negative if it is smaller.
Once you compute the t-value you have to look it up in a table of significance to test whether the
ratio is large enough to say that the difference between the groups is not likely to have been a
chance finding. To test the significance, you need to set a risk level (called the alpha level). In
most social research, the "rule of thumb" is to set the alpha level at .05. This means that five
times out of a hundred you would find a statistically significant difference between the means
even if there was none (i.e., by "chance"). You also need to determine the degrees of freedom
(df) for the test. In the t-test, the degrees of freedom is the sum of the persons in both groups
minus 2. Given the alpha level, the df, and the t-value, you can look the t-value up in a standard
table of significance (available as an appendix in the back of most statistics texts) to determine
whether the t-value is large enough to be significant. If it is, you can conclude that the means of
the two groups are significantly different (even given the variability). Fortunately,
statistical computer programs routinely print the significance test results and save you the trouble
of looking them up in a table.
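As a sketch of the calculation described above, the t-value and degrees of freedom can be computed directly; the two samples here are hypothetical illustration data, not from the text:

```python
from math import sqrt

def two_sample_t(a, b):
    """t-value and degrees of freedom for two independent groups:
    difference between the means divided by the standard error of
    the difference, with df = n1 + n2 - 2 as described in the text."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)  # sample variance
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    se = sqrt(var_a / na + var_b / nb)  # standard error of the difference
    return (mean_a - mean_b) / se, na + nb - 2

treated = [5.1, 4.9, 6.0, 5.5, 5.8]   # hypothetical posttest scores
control = [4.2, 4.5, 4.0, 4.8, 4.3]
t, df = two_sample_t(treated, control)
print(round(t, 2), df)  # a positive t: the treated mean is larger
```

The resulting t-value would still be compared against a significance table (or software p-value) at the chosen alpha level and df.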
Z-test
A Z-test is any statistical test for which the distribution of the test statistic under the null
hypothesis can be approximated by a normal distribution. Because of the central limit theorem, many
test statistics are approximately normally distributed for large samples. For each significance level,
the Z-test has a single critical value (for example, 1.96 for 5% two-tailed) which makes it more
convenient than the Student's t-test which has separate critical values for each sample size.
Therefore, many statistical tests can be conveniently performed as approximate Z-tests if the sample
size is large or the population variance known. If the population variance is unknown (and therefore
has to be estimated from the sample itself) and the sample size is not large (n < 30), the Student's t-
test may be more appropriate.
If T is a statistic that is approximately normally distributed under the null hypothesis, the next step in
performing a Z-test is to estimate the expected value θ of T under the null hypothesis, and then
obtain an estimate s of the standard deviation of T. After that the standard score Z = (T − θ) / s is
calculated, from which one-tailed and two-tailed p-values can be calculated as Φ(−Z) (for upper-
tailed tests), Φ(Z) (for lower-tailed tests) and 2Φ(−|Z|) (for two-tailed tests) where Φ is the
standard normal cumulative distribution function.
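A minimal sketch of this procedure, using NormalDist for Φ; the numeric inputs in the usage line are illustrative assumptions:

```python
from statistics import NormalDist

def z_test(t, theta, s):
    """Standard score Z = (T - theta) / s and its p-values,
    where phi is the standard normal CDF."""
    phi = NormalDist().cdf
    z = (t - theta) / s
    return {
        "z": z,
        "upper": phi(-z),               # upper-tailed: Phi(-Z)
        "lower": phi(z),                # lower-tailed: Phi(Z)
        "two_sided": 2 * phi(-abs(z)),  # two-tailed: 2 * Phi(-|Z|)
    }

result = z_test(t=2.1, theta=0.0, s=1.0)  # illustrative values
print(round(result["two_sided"], 4))
```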
Other location tests that can be performed as Z-tests are the two-sample location test and the paired
difference test.
Conditions
For the Z-test to be applicable, certain conditions must be met.
Nuisance parameters should be known, or estimated with high accuracy (an example of a nuisance
parameter would be the standard deviation in a one-sample location test). Z-tests focus on a single
parameter, and treat all other unknown parameters as being fixed at their true values. In practice,
due to Slutsky's theorem, "plugging in" consistent estimates of nuisance parameters can be justified.
However if the sample size is not large enough for these estimates to be reasonably accurate, the Z-
test may not perform well.
The test statistic should follow a normal distribution. Generally, one appeals to the central limit
theorem to justify assuming that a test statistic varies normally. There is a great deal of statistical
research on the question of when a test statistic varies approximately normally. If the variation of the
test statistic is strongly non-normal, a Z-test should not be used.
In some situations, it is possible to devise a test that properly accounts for the variation in plug-in
estimates of nuisance parameters. In the case of one and two sample location problems, a t-
test does this.
Chi-squared test
A chi-squared test, also referred to as a χ² test (or chi-square test), is any statistical hypothesis
test wherein the sampling distribution of the test statistic is a chi-square distribution when the null
hypothesis is true. Chi-squared tests are often constructed from a sum of squared errors, or through
the sample variance. Test statistics that follow a chi-squared distribution arise from an assumption of
independent normally distributed data, which is valid in many cases due to the central limit theorem.
A chi-squared test can be used to attempt rejection of the null hypothesis that the data are
independent.
Also considered a chi-square test is a test in which this is asymptotically true, meaning that the
sampling distribution (if the null hypothesis is true) can be made to approximate a chi-square
distribution as closely as desired by making the sample size large enough. The chi-squared test is
used to determine whether there is a significant difference between the expected frequencies and
the observed frequencies in one or more categories. Does the number of individuals or objects that
fall in each category differ significantly from the number you would expect? Is this difference
between the expected and observed due to sampling variation, or is it a real difference?
To reduce the error in approximation, Frank Yates suggested a correction for continuity that adjusts
the formula for Pearson's chi-square test by subtracting 0.5 from the difference between each
observed value and its expected value in a 2 × 2 contingency table. This reduces the chi-square
value obtained and thus increases its p-value.
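The effect of the Yates correction can be seen in a small sketch of Pearson's chi-square for a 2×2 table (the counts are made up; for one degree of freedom the tail probability satisfies P(χ² > x) = 2(1 − Φ(√x))):

```python
from math import sqrt
from statistics import NormalDist

def chi2_2x2(table, yates=False):
    """Pearson chi-square statistic and p-value for a 2x2 table
    [[a, b], [c, d]], optionally with Yates's continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = [a + b, c + d], [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # subtract 0.5 from each |O - E|
            stat += diff ** 2 / expected
    # One degree of freedom: P(chi2 > stat) = 2 * (1 - Phi(sqrt(stat)))
    p = 2 * (1 - NormalDist().cdf(sqrt(stat)))
    return stat, p

observed = [[10, 20], [20, 10]]  # hypothetical counts
plain = chi2_2x2(observed)
corrected = chi2_2x2(observed, yates=True)
print(plain, corrected)  # the corrected statistic is smaller, its p-value larger
```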
Factor analysis
When trying to explain something by measuring a range of independent variables, Factor
Analysis helps reduce the number of reported variables by determining significant variables and
'combining' these into a single variable (or 'factor'). It may be used in this way either to
discover factors or to test a hypothesis that they exist.
The determination is statistical, but the output is a 'virtual' or unobserved variable which joins the
measured variables in a linear formula that combines the observed measurements with 'factor
loading' constant numbers.
Factor Analysis also can be used to help demonstrate how a complex measurement instrument is
really measuring one or a few bigger things.
Example
Perhaps the most well-known result of factor analysis is 'IQ', which is a 'virtual' variable based
on variable measurements of ability in mathematics, language and logic.
The variables in many personality assessments, such as 16PF, have been identified using factor
analysis.
Discussion
Factor Analysis originated in psychometrics by Charles Spearman (as in Spearman correlation)
and Raymond Cattell (as in 16PF) with the assessment of intelligence and personality, and has
since spread to many other fields, from marketing to operations research.
Factor Analysis differs from much research, which focuses on the relationships between
independent and dependent variables. In contrast, Factor Analysis focuses on the relationship
between multiple independent variables. 'Factor' basically means 'independent variable', although
in this case the 'factors' are the new 'virtual' variables.
The use of Factor Analysis and its results is often an imprecise science and hence subject to
debate (for example 'IQ' has been heavily criticized). Nevertheless it provides a simplification
process that in practice can be very useful.
Principal Component Analysis is a variant of Factor Analysis and is equivalent when model
'errors' have the same variance. The difference lies in how Principal Component Analysis uses
the total variance in the data and assumes linear variable combinations, whilst Common Factor
Analysis uses the common variance in the data and assumes latent variables.
Principal Component Analysis constructs as many components as there are original variables.
The first component takes into account the greatest amount of variance between the variables,
giving the weighting of these variables to form the single component. The second component is
constructed to account for as much variance as possible that is not accounted for by the first
component. The third component accounts for variance not accounted for by the first two
components, and so on.
Normally, the first component is much larger than the rest, with a rapid drop off through the
second and third components. If there is no useful way of reducing the matrix of variable
correlations into a smaller number of factors then all components will be approximately equal.
Eigenvalues
Eigenvalues represent the proportion of variance explained by a given factor. With five
variables, the sum of the eigenvalues will be 5. Sorting the factors by eigenvalue thus results in
the first factor having the greatest importance (explaining the greatest amount of variance).
Eigenvalues can be used to help identify the factors to select and carry forward for future use.
Orthogonal rotation
Factors selected through eigenvalue analysis may be refined further through orthogonal rotation.
'Orthogonal' factors are metaphorically at 'right angles' to one another, which means they do not
correlate with one another and are thus even more independent, making them useful measures.
For example if you were measuring types of intelligence, it would help if, say, creative
intelligence was completely different from mathematical intelligence, so if you were working on
one it would not 'leak' over into another area.
Conjoint Test
When faced with a critical product or pricing decision, conjoint analysis can help. Conjoint is a
powerful advanced research technique that predicts how the market will respond to product or
pricing changes by understanding the trade-offs that customers make during the decision-making
process. The technique can be used to answer the all-important question: What value does the market
place on products or services and their features?
Conjoint analysis allows you to understand your customers' underlying value systems and build a
market model based on those values.
The model allows you to test the potential impact of product improvements and new products on
market demand.
The resulting insights give you a clear competitive edge.
Conjoint analysis helps answer vital product development and pricing questions, such as:
What combination of features should a product or service offer? Which are the most critical
features, and which are less important?
How does price impact demand for the product or service?
How well will a new product or service compete in the market?
Which strategies will maximize market preference across your product portfolio?
Which strategies will minimize cannibalization of existing offerings, especially premium or higher
margin products or services?
How vulnerable are you to competitive response?
Multivariate Test
This technique examines the relationship between several categorical independent variables and
two or more metric dependent variables. Whereas analysis of variance (ANOVA) assesses the
differences between groups (by using T tests for two means and F tests between three or more
means), MANOVA examines the dependence relationship between a set of dependent measures
across a set of groups. Typically this analysis is used in experimental design, and usually a
hypothesized relationship between dependent measures is used. This technique is slightly
different in that the independent variables are categorical and the dependent variable is metric.
Sample size is an issue, with 15-20 observations needed per cell. However, with too many
observations per cell (over 30), the technique loses its practical significance. Cell sizes
should be roughly equal, with the largest cell having less than 1.5 times the observations of the
smallest cell. That is because, in this technique, normality of the dependent variables is
important. The model fit is determined by examining mean vector equivalents across groups. If
there is a significant difference in the means, the null hypothesis can be rejected and treatment
differences can be determined.
Analysis of variance
Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences
among group means and their associated procedures (such as "variation" among and between
groups), developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting,
the observed variance in a particular variable is partitioned into components attributable to different
sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not
the means of several groups are equal, and therefore generalizes the t-test to more than two groups.
ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical
significance. It is conceptually similar to multiple two-sample t-tests, but is less conservative (results
in less type I error) and is therefore suited to a wide range of practical problems.
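To make the group comparison concrete, the one-way ANOVA F statistic can be computed from first principles. This is a minimal sketch; the three small groups below are invented for illustration.

```python
# One-way ANOVA F statistic computed from its definition.
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA across the given groups."""
    grand = mean(x for g in groups for x in g)
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)     # total number of observations
    # Between-group sum of squares: variation of group means around the grand mean.
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: variation of observations around their group mean.
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(round(one_way_anova_f(groups), 3))  # → 13.0
```

A large F means the variation between group means is large relative to the variation within groups, which is exactly the partitioning of variance described above.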
History
While the analysis of variance reached fruition in the 20th century, antecedents extend centuries into
the past according to Stigler. These include hypothesis testing, the partitioning of sums of squares,
experimental techniques and the additive model. Laplace was performing hypothesis testing in the
1770s. The development of least-squares methods by Laplace and Gauss circa 1800 provided an
improved method of combining observations (over the existing practices of astronomy and geodesy).
It also initiated much study of the contributions to sums of squares. Laplace soon knew how to
estimate a variance from a residual (rather than a total) sum of squares. By 1827 Laplace was using
least squares methods to address ANOVA problems regarding measurements of atmospheric
tides. Before 1800 astronomers had isolated observational errors resulting from reaction times (the
"personal equation") and had developed methods of reducing the errors. The experimental methods
used in the study of the personal equation were later accepted by the emerging field of
psychology which developed strong (full factorial) experimental methods to which randomization
and blinding were soon added. An eloquent non-mathematical explanation of the additive effects
model was available in 1885.
Ronald Fisher introduced the term variance and proposed its formal analysis in a 1918 article The
Correlation Between Relatives on the Supposition of Mendelian Inheritance. His first application of
the analysis of variance was published in 1921. Analysis of variance became widely known after
being included in Fisher's 1925 book Statistical Methods for Research Workers.
Randomization models were developed by several researchers. The first was published in Polish
by Neyman in 1923.
One of the attributes of ANOVA which ensured its early popularity was computational elegance. The
structure of the additive model allows solution for the additive coefficients by simple algebra rather
than by matrix calculations. In the era of mechanical calculators this simplicity was critical. The
determination of statistical significance also required access to tables of the F function which were
supplied by early statistics texts.
Motivating example
The analysis of variance can be used as an exploratory tool to explain observations. A dog show
provides an example. A dog show is not a random sampling of the breed: it is typically limited to
dogs that are adult, pure-bred, and exemplary. A histogram of dog weights from a show might
plausibly be rather complex, like the yellow-orange distribution shown in the illustrations. Suppose
we wanted to predict the weight of a dog based on a certain set of characteristics of each dog.
Before we could do that, we would need to explain the distribution of weights by dividing the dog
population into groups based on those characteristics. A successful grouping will split dogs such that
(a) each group has a low variance of dog weights (meaning the group is relatively homogeneous)
and (b) the mean of each group is distinct (if two groups have the same mean, then it isn't
reasonable to conclude that the groups are, in fact, separate in any meaningful way).
In the illustrations to the right, each group is identified as X1, X2, etc. In the first illustration, we divide
the dogs according to the product (interaction) of two binary groupings: young vs old, and short-
haired vs long-haired (thus, group 1 is young, short-haired dogs, group 2 is young, long-haired dogs,
etc.). Since the distributions of dog weight within each of the groups (shown in blue) has a large
variance, and since the means are very close across groups, grouping dogs by these characteristics
does not produce an effective way to explain the variation in dog weights: knowing which group a
dog is in does not allow us to make any reasonable statements as to what that dog's weight is likely
to be. Thus, this grouping fails to fit the distribution we are trying to explain (yellow-orange).
An attempt to explain the weight distribution by grouping dogs as (pet vs working breed) and (less
athletic vs more athletic) would probably be somewhat more successful (fair fit). The heaviest show
dogs are likely to be big strong working breeds, while breeds kept as pets tend to be smaller and
thus lighter. As shown by the second illustration, the distributions have variances that are
considerably smaller than in the first case, and the means are more reasonably distinguishable.
However, the significant overlap of distributions, for example, means that we cannot reliably say
that X1 and X2 are truly distinct (i.e., it is perhaps reasonably likely that splitting dogs according to the
flip of a coin—by pure chance—might produce distributions that look similar).
An attempt to explain weight by breed is likely to produce a very good fit. All Chihuahuas are light
and all St Bernards are heavy. The difference in weights between Setters and Pointers does not
justify separate breeds. The analysis of variance provides the formal tools to justify these intuitive
judgments. A common use of the method is the analysis of experimental data or the development of
models. The method has some advantages over correlation: not all of the data must be numeric and
one result of the method is a judgment in the confidence in an explanatory relationship.
By construction, hypothesis testing limits the rate of Type I errors (false positives) to a significance
level. Experimenters also wish to limit Type II errors (false negatives). The rate of Type II errors
depends largely on sample size (the rate will increase for small numbers of samples), significance
level (when the standard of proof is high, the chances of overlooking a discovery are also high)
and effect size (a smaller effect size is more prone to Type II error).
The terminology of ANOVA is largely from the statistical design of experiments. The experimenter
adjusts factors and measures responses in an attempt to determine an effect. Factors are assigned
to experimental units by a combination of randomization and blocking to ensure the validity of the
results. Blinding keeps the weighing impartial. Responses show a variability that is partially the result
of the effect and is partially random error.
ANOVA is the synthesis of several ideas and it is used for multiple purposes. As a consequence, it is
difficult to define concisely or precisely.
In short, ANOVA is a statistical tool used in several ways to develop and confirm an explanation for
the observed data.
ANOVA "has long enjoyed the status of being the most used (some would say abused) statistical technique in psychological research," and it "is probably the most useful technique in the field of statistical inference."
ANOVA is difficult to teach, particularly for complex experiments, with split-plot designs being notorious. In some cases the proper application of the method is best determined by problem pattern recognition followed by the consultation of a classic authoritative text.
Design-of-experiments terms
(Condensed from the NIST Engineering Statistics handbook: Section 5.7. A Glossary of DOE
Terminology.)
Balanced design
An experimental design where all cells (i.e. treatment combinations) have the same number of
observations.
Blocking
A schedule for conducting treatment combinations in an experimental study such that any effects on
the experimental results due to a known change in raw materials, operators, machines, etc., become
concentrated in the levels of the blocking variable. The reason for blocking is to isolate a systematic
effect and prevent it from obscuring the main effects. Blocking is achieved by restricting
randomization.
Design
A set of experimental runs which allows the fit of a particular model and the estimate of effects.
DOE
Design of experiments. An approach to problem solving involving collection of data that will support
valid, defensible, and supportable conclusions.
Effect
How changing the settings of a factor changes the response. The effect of a single factor is also
called a main effect.
Error
Unexplained variation in a collection of observations. DOEs typically require understanding of both random error and lack-of-fit error.
Experimental unit
The entity to which a specific treatment combination is applied.
Factors
Process inputs an investigator manipulates to cause a change in the output.
Lack-of-fit error
Error that occurs when the analysis omits one or more important terms or factors from the process
model. Including replication in a DOE allows separation of experimental error into its components:
lack of fit and random (pure) error.
Model
Mathematical relationship which relates changes in a given response to changes in one or more
factors.
Random error
Error that occurs due to natural variation in the process. Random error is typically assumed to be
normally distributed with zero mean and a constant variance. Random error is also called
experimental error.
Randomization
A schedule for allocating treatment material and for conducting treatment combinations in a DOE
such that the conditions in one run neither depend on the conditions of the previous run nor predict
the conditions in the subsequent runs.
Replication
Performing the same treatment combination more than once. Including replication allows an
estimate of the random error independent of any lack of fit error.
Responses
The output(s) of a process. Sometimes called dependent variable(s).
Treatment
A treatment is a specific combination of factor levels whose effect is to be compared with other
treatments.
Classes of models
There are three classes of models used in the analysis of variance, and these are outlined here.
Fixed-effects models
Main article: Fixed effects model
The fixed-effects model (class I) of analysis of variance applies to situations in which the
experimenter applies one or more treatments to the subjects of the experiment to see whether
the response variable values change. This allows the experimenter to estimate the ranges of
response variable values that the treatment would generate in the population as a whole.
Random-effects models
Main article: Random effects model
Random effects model (class II) is used when the treatments are not fixed. This occurs when the
various factor levels are sampled from a larger population. Because the levels themselves
are random variables, some assumptions and the method of contrasting the treatments (a multi-
variable generalization of simple differences) differ from the fixed-effects model.
Mixed-effects models
Main article: Mixed model
A mixed-effects model (class III) contains experimental factors of both fixed and random-effects
types, with appropriately different interpretations and analysis for the two types.
Defining fixed and random effects has proven elusive, with competing definitions arguably leading
toward a linguistic quagmire.
Assumptions of ANOVA
The analysis of variance has been studied from several approaches, the most common of which
uses a linear model that relates the response to the treatments and blocks. Note that the model is
linear in parameters but may be nonlinear across factor levels. Interpretation is easy when data is
balanced across factors but much deeper understanding is needed for unbalanced data.
Correlation
Correlation is one of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship between two variables. Let's work through an example to show you how this statistic is computed.
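As a minimal sketch with invented paired data, the Pearson correlation coefficient can be computed directly from its definitional formula (the sum of cross-products of deviations, divided by the square root of the product of the sums of squared deviations):

```python
# Pearson correlation coefficient computed from its definition.
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Correlation between paired samples x and y; always in [-1, 1]."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))  # → 0.8
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship.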
Regression
Regression analysis is used when you want to predict a continuous dependent variable
from a number of independent variables. If the dependent variable is dichotomous,
then logistic regression should be used. (If the split between the two levels of the
dependent variable is close to 50-50, then both logistic and linear regression will end
up giving you similar results.) The independent variables used in regression can be
either continuous or dichotomous. Independent variables with more than two levels
can also be used in regression analyses, but they first must be converted into variables
that have only two levels. This is called dummy coding and will be discussed later.
Usually, regression analysis is used with naturally-occurring variables, as opposed to
experimentally manipulated variables, although you can use regression with
experimentally manipulated variables. One point to keep in mind with regression
analysis is that causal relationships among the variables cannot be determined. While
the terminology is such that we say that X "predicts" Y, we cannot say that X "causes"
Y.
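Two of the ideas above can be sketched in a few lines of Python with invented data: an ordinary least-squares fit of a single continuous predictor, and dummy coding of a three-level categorical variable into two two-level indicator variables.

```python
# Simple linear regression by ordinary least squares, plus dummy coding.
from statistics import mean

def ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
             sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

print(ols([1, 2, 3, 4], [3, 5, 7, 9]))  # → (1.0, 2.0)

def dummy_code(values, reference):
    """Turn a k-level categorical variable into k-1 two-level indicators."""
    levels = [lv for lv in sorted(set(values)) if lv != reference]
    return [[1 if v == lv else 0 for lv in levels] for v in values]

print(dummy_code(["a", "b", "c", "a"], reference="a"))
# → [[0, 0], [1, 0], [0, 1], [0, 0]]
```

The reference level is the one coded all zeros; each slope estimated for an indicator is then interpreted relative to that reference group.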
Assumptions of regression
Number of cases
When doing regression, the cases-to-Independent Variables (IVs) ratio should ideally
be 20:1; that is 20 cases for every IV in the model. The lowest your ratio should be is
5:1 (i.e., 5 cases for every IV in the model).
Accuracy of data
If you have entered the data (rather than using an established dataset), it is a good idea
to check the accuracy of the data entry. If you don't want to re-check each data point,
you should at least check the minimum and maximum value for each variable to
ensure that all values for each variable are "valid." For example, a variable that is
measured using a 1 to 5 scale should not have a value of 8.
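This minimum/maximum check is easy to automate. A sketch, where the variable name and entries are invented:

```python
# Flag data-entry errors: values outside the legal range of a scale.
def out_of_range(values, lo, hi):
    """Return (index, value) pairs that fall outside [lo, hi]."""
    return [(i, v) for i, v in enumerate(values) if not lo <= v <= hi]

satisfaction = [3, 5, 1, 8, 4, 2]   # an 8 has slipped into a 1-to-5 scale
print(out_of_range(satisfaction, 1, 5))  # → [(3, 8)]
```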
Missing data
You also want to look for missing data. If specific variables have a lot of missing
values, you may decide not to include those variables in your analyses. If only a few
cases have any missing values, then you might want to delete those cases. If there are
missing values for several cases on different variables, then you probably don't want
to delete those cases (because a lot of your data will be lost). If there is not too much missing data, and there does not seem to be any pattern in terms of what is missing, then you don't really need to worry. Just run your regression, and any cases that do not
have values for the variables used in that regression will not be included. Although
tempting, do not assume that there is no pattern; check for this. To do this, separate
the dataset into two groups: those cases missing values for a certain variable, and
those not missing a value for that variable. Using t-tests, you can determine if the two
groups differ on other variables included in the sample. For example, you might find
that the cases that are missing values for the "salary" variable are younger than those
cases that have values for salary. You would want to do t-tests for each variable with a
lot of missing values. If there is a systematic difference between the two groups (i.e.,
the group missing values vs. the group not missing values), then you would need to
keep this in mind when interpreting your findings and not overgeneralize.
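The missing-versus-not-missing comparison described above can be sketched as follows. The ages are invented, and a complete analysis would also convert the t statistic to a p-value using the appropriate (Welch-Satterthwaite) degrees of freedom:

```python
# Welch's t statistic comparing cases missing "salary" with cases not missing it,
# on another variable (age).
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) +
                                      variance(b) / len(b))

age_missing = [24, 26, 23, 25, 27]   # ages of cases missing "salary"
age_present = [38, 41, 35, 44, 39]   # ages of cases with "salary" present
t = welch_t(age_missing, age_present)
print(round(t, 2))  # a large |t| suggests a systematic difference between groups
```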
After examining your data, you may decide that you want to replace the missing
values with some other value. The easiest thing to use as the replacement value is the
mean of this variable. Some statistics programs have an option within regression
where you can replace the missing value with the mean. Alternatively, you may want
to substitute a group mean (e.g., the mean for females) rather than the overall mean.
The default option of statistics packages is to exclude cases that are missing values for
any variable that is included in regression. (But that case could be included in another
regression, as long as it was not missing values on any of the variables included in
that analysis.) You can change this option so that your regression analysis does not
exclude cases that are missing data for any variable included in the regression, but
then you might have a different number of cases for each variable.
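Mean substitution can be sketched as follows; None marks a missing entry, and the salary values are invented (substituting a group mean would work the same way, applied group by group):

```python
# Replace missing entries (None) with the mean of the observed entries.
from statistics import mean

def impute_mean(values):
    """Return a copy of values with None replaced by the observed mean."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

salaries = [40, 50, None, 60, None]
print(impute_mean(salaries))  # → [40, 50, 50, 60, 50]
```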
Outliers
You also need to check your data for outliers (i.e., an extreme value on a particular item). An outlier is often operationally defined as a value that is at least 3 standard
deviations above or below the mean. If you feel that the cases that produced the
outliers are not part of the same "population" as the other cases, then you might just
want to delete those cases. Alternatively, you might want to count those extreme
values as "missing," but retain the case for other variables. Alternatively, you could
retain the outlier, but reduce how extreme it is. Specifically, you might want to recode
the value so that it is the highest (or lowest) non-outlier value.
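The 3-standard-deviation rule combined with the recode-to-nearest-non-outlier option can be sketched as follows (the data are invented):

```python
# Detect values beyond 3 standard deviations and recode them to the most
# extreme non-outlier value.
from statistics import mean, stdev

def winsorize_3sd(values):
    """Clamp outliers (beyond mean +/- 3 SD) to the nearest inlier value."""
    m, s = mean(values), stdev(values)
    lo, hi = m - 3 * s, m + 3 * s
    inliers = [v for v in values if lo <= v <= hi]
    return [min(max(v, min(inliers)), max(inliers)) for v in values]

data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10,
        12, 9, 11, 10, 10, 9, 11, 10, 12, 60]
print(winsorize_3sd(data)[-1])  # → 12 (the 60 is recoded to the highest inlier)
```

Note that with very small samples a single outlier can never exceed 3 sample standard deviations, so this rule is only useful once the sample is reasonably large.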
Chapter 13
SPSS
The software name originally stood for Statistical Package for the Social Sciences (SPSS), reflecting the original market, although the software is now popular in other fields as well, including the health sciences and marketing.
Overview
SPSS is a widely used program for statistical analysis in social science. It is also used by market
researchers, health researchers, survey companies, government, education researchers, marketing
organizations, data miners, and others. The original SPSS manual (Nie, Bent & Hull, 1970) has been
described as one of "sociology's most influential books" for allowing ordinary researchers to do their
own statistical analysis. In addition to statistical analysis, data management (case selection, file
reshaping, creating derived data) and data documentation (a metadata dictionary was stored in
the datafile) are features of the base software.
The many features of SPSS Statistics are accessible via pull-down menus or can be programmed
with a proprietary 4GL command syntax language. Command syntax programming has the benefits
of reproducibility, simplifying repetitive tasks, and handling complex data manipulations and
analyses. Additionally, some complex applications can only be programmed in syntax and are not
accessible through the menu structure. The pull-down menu interface also generates command
syntax: this can be displayed in the output, although the default settings have to be changed to make
the syntax visible to the user. They can also be pasted into a syntax file using the "paste" button
present in each menu. Programs can be run interactively or unattended, using the supplied
Production Job Facility.
The graphical user interface has two views which can be toggled by clicking on one of the two tabs
in the bottom left of the SPSS Statistics window. The 'Data View' shows a spreadsheet view of the
cases (rows) and variables (columns). Unlike spreadsheets, the data cells can only contain numbers
or text, and formulas cannot be stored in these cells. The 'Variable View' displays the metadata
dictionary where each row represents a variable and shows the variable name, variable label, value
label(s), print width, measurement type, and a variety of other characteristics. Cells in both views
can be manually edited, defining the file structure and allowing data entry without using command
syntax. This may be sufficient for small datasets. Larger datasets such as statistical surveys are
more often created in data entry software, or entered during computer-assisted personal interviewing, by scanning and using optical character recognition and optical mark recognition software, or by
direct capture from online questionnaires. These datasets are then read into SPSS.
SPSS Statistics can read and write data from ASCII text files (including hierarchical files), other
statistics packages, spreadsheets and databases. SPSS Statistics can read and write to
external relational database tables via ODBC and SQL.
Statistical output is to a proprietary file format (*.spv file, supporting pivot tables) for which, in
addition to the in-package viewer, a stand-alone reader can be downloaded. The proprietary output
can be exported to text or Microsoft Word, PDF, Excel, and other formats. Alternatively, output can
be captured as data (using the OMS command), as text, tab-delimited text, PDF, XLS, HTML, XML,
SPSS dataset or a variety of graphic image formats (JPEG, PNG, BMP and EMF).
SPSS Statistics Server is a version of SPSS Statistics with a client/server architecture. It had some
features not available in the desktop version, such as scoring functions. (Scoring functions are
included in the desktop version from version 19.)
SPSS Statistics versions 16.0 and later run under Windows, Mac, and Linux. The graphical user
interface is written in Java. The Mac OS version is provided as a Universal binary, making it fully
compatible with both PowerPC and Intel-based Mac hardware.
Prior to SPSS 16.0, different versions of SPSS were available for Windows, Mac OS X and Unix.
The Windows version was updated more frequently and had more features than the versions for
other operating systems.
SPSS Statistics version 13.0 for Mac OS X was not compatible with Intel-based Macintosh
computers, due to the Rosetta emulation software causing errors in calculations. SPSS Statistics
15.0 for Windows needed a downloadable hotfix to be installed in order to be compatible
with Windows Vista.
SPSS Inc announced on July 28, 2009 that it was being acquired by IBM for US$1.2 billion. Because
of a dispute about ownership of the name "SPSS", between 2009 and 2010, the product was
referred to as PASW (Predictive Analytics SoftWare). As of January 2010, it became "SPSS: An IBM
Company". Complete transfer of business to IBM was done by October 1, 2010. By that date, SPSS:
An IBM Company ceased to exist. IBM SPSS is now fully integrated into the IBM Corporation, and is
one of the brands under IBM Software Group's Business Analytics Portfolio, together with IBM
Algorithmics, IBM Cognos and IBM OpenPages.
Microsoft Excel 2000 (version 9) provides a set of data analysis tools called the Analysis
ToolPak which you can use to save steps when you develop complex statistical analyses. You
provide the data and parameters for each analysis; the tool uses the appropriate statistical macro
functions and then displays the results in an output table. Some tools generate charts in addition
to output tables.
If the Data Analysis command is selectable on the Tools menu, then the Analysis ToolPak is
installed on your system. However, if the Data Analysis command is not on the Tools menu, you
need to install the Analysis ToolPak by doing the following:
Step 1: On the Tools menu, click Add-Ins.... If Analysis ToolPak is not listed in the Add-Ins
dialog box, click Browse and locate the drive, folder name, and file name for the Analysis
ToolPak Add-in — Analys32.xll — usually located in the Program Files\Microsoft Office\
Office\Library\Analysis folder. Once you find the file, select it and click OK.
Step 2: If you don't find the Analys32.xll file, then you must install it.
1. Insert your Microsoft Office 2000 Disk 1 into the CD ROM drive.
2. Select Run from the Windows Start menu.
3. Browse and select the drive for your CD. Select Setup.exe, click Open, and click
OK.
4. Click the Add or Remove Features button.
5. Click the + next to Microsoft Excel for Windows.
6. Click the + next to Add-ins.
7. Click the down arrow next to Analysis ToolPak.
8. Select Run from My Computer.
9. Select the Update Now button.
10. Excel will now update your system to include Analysis ToolPak.
11. Launch Excel.
12. On the Tools menu, click Add-Ins... - and select the Analysis ToolPak check box.
Step 3: The Analysis ToolPak Add-In is now installed and Data Analysis... will now be
selectable on the Tools menu.
Microsoft Excel is a powerful spreadsheet package available for Microsoft Windows and the
Apple Macintosh. Spreadsheet software is used to store information in columns and rows which
can then be organized and/or processed. Spreadsheets are designed to work well with numbers
but often include text. Excel organizes your work into workbooks; each workbook can contain
many worksheets; worksheets are used to list and analyze data.
Excel is available on all public-access PCs (e.g., those in the Library and PC Labs). It can be opened either by selecting Start - Programs - Microsoft Excel or by clicking on the Excel shortcut, which is either on your desktop or on the Office toolbar.
Saving a Document:
To save your document with its current filename, location, and file format, click File - Save. If you are saving for the first time, click File - Save, then choose or type a name for your document and click OK. Use File - Save As if you want to save to a different filename or location.
When you have finished working on a document you should close it. Go to the File menu and
click on Close. If you have made any changes since the file was last saved, you will be asked if
you wish to save them.
When you start Excel, a blank worksheet is displayed which consists of a multiple grid of cells
with numbered rows down the page and alphabetically-titled columns across the page. Each cell
is referenced by its coordinates (e.g., A3 is used to refer to the cell in column A and row 3;
B10:B20 is used to refer to the range of cells in column B and rows 10 through 20).
Your work is stored in an Excel file called a workbook. Each workbook may contain several
worksheets and/or charts - the current worksheet is called the active sheet. To view a different
worksheet in a workbook click the appropriate Sheet Tab.
You can access and execute commands directly from the main menu or you can point to one of
the toolbar buttons (the display box that appears below the button, when you place the cursor
over it, indicates the name/action of the button) and click once.
It is important to be able to move around the worksheet effectively because you can only enter or
change data at the position of the cursor. You can move the cursor by using the arrow keys or by
moving the mouse to the required cell and clicking. Once selected, the cell becomes the active cell and is identified by a thick border; only one cell can be active at a time.
To move from one worksheet to another click the sheet tabs. (If your workbook contains many
sheets, right-click the tab scrolling buttons then click the sheet you want.) The name of the active
sheet is shown in bold.
To move between cells on a worksheet, click any cell or use the arrow keys. To see a different
area of the sheet, use the scroll bars and click on the arrows or the area above/below the scroll
box in either the vertical or horizontal scroll bars.
Note that the size of a scroll box indicates the proportional amount of the used area of the sheet
that is visible in the window. The position of a scroll box indicates the relative location of the
visible area within the worksheet.
Entering Data
A new worksheet is a grid of rows and columns. The rows are labeled with numbers, and the
columns are labeled with letters. Each intersection of a row and a column is a cell. Each cell has
an address, which is the column letter and the row number. The arrow on the worksheet to the
right points to cell A1, which is currently highlighted, indicating that it is an active cell. A cell
must be active to enter information into it. To highlight (select) a cell, click on it.
Click on a cell (e.g., A1), then hold the Shift key while you click on another (e.g., D4) to select all cells between and including A1 and D4.
Click on a cell (e.g., A1) and drag the mouse across the desired range, releasing the mouse button on another cell (e.g., D4) to select all cells between and including A1 and D4.
To select several cells which are not adjacent, press "Control" and click on the cells you want to select. Click a number or letter labeling a row or column to select that entire row or column.
One worksheet can have up to 256 columns and 65,536 rows, so it'll be a while before you run
out of space.
Note that as you type information into the cell, the information you enter also displays in the
formula bar. You can also enter information into the formula bar, and the information will appear
in the selected cell.
Press "Enter" to move to the next cell below (in this case, A2)
Press "Tab" to move to the next cell to the right (in this case, B1)
Click in any cell to select it
Entering Labels
Unless the information you enter is formatted as a value or a formula, Excel will interpret it as a
label, and defaults to align the text on the left side of the cell.
If you are creating a long worksheet and you will be repeating the same label information in
many different cells, you can use the AutoComplete function. This function will look at other
entries in the same column and attempt to match a previous entry with your current entry. For
example, if you have already typed "Wesleyan" in another cell and you type "W" in a new cell,
Excel will automatically enter "Wesleyan." If you intended to type "Wesleyan" into the cell, your
task is done, and you can move on to the next cell. If you intended to type something else, e.g.
"Williams," into the cell, just continue typing to enter the term.
To turn on the AutoComplete function, click on "Tools" in the menu bar, then select "Options,"
then select "Edit," and click to put a check in the box beside "Enable AutoComplete for cell
values."
Another way to quickly enter repeated labels is to use the Pick List feature. Right click on a cell,
then select "Pick From List." This will give you a menu of all other entries in cells in that
column. Click on an item in the menu to enter it into the currently selected cell.
Entering Values
A value is a number, date, or time, plus a few symbols if necessary to further define the numbers
[such as: . , + - ( ) % $ / ].
Numbers are assumed to be positive; to enter a negative number, use a minus sign "-" or enclose
the number in parentheses "()".
Dates are stored as MM/DD/YYYY, but you do not have to enter them precisely in that format. If you enter "jan 9" or "jan-9", Excel will recognize it as January 9 of the current year, and store it as 1/9/2002. Enter the four-digit year for a year other than the current year (e.g., "jan 9, 1999").
To enter the current day's date, press "control" and ";" at the same time.
Times default to a 24 hour clock. Use "a" or "p" to indicate "am" or "pm" if you use a 12 hour
clock (e.g. "8:30 p" is interpreted as 8:30 PM). To enter the current time, press "control" and ":"
(shift-semicolon) at the same time.
An entry interpreted as a value (number, date, or time) is aligned to the right side of the cell.
Applying Colors to Maximum and Minimum Values: To apply colors to maximum and/or minimum values:
1. Select a cell in the region, and press Ctrl+Shift+* (in Excel 2003, press this or Ctrl+A) to select the Current Region.
2. From the Format menu, select Conditional Formatting.
3. In Condition 1, select Formula Is, and type =MAX($F:$F)=$F1.
4. Click Format, select the Font tab, select a color, and then click OK.
5. In Condition 2, select Formula Is, and type =MIN($F:$F)=$F1.
6. Repeat step 4, select a different color than you selected for Condition 1, and then click OK.
Note: Be sure to distinguish between absolute reference and relative reference when entering the formulas.
Rounding Numbers that Meet Specified Criteria: To keep values that end in exactly .5 unrounded while rounding all other values to the nearest integer, use the IF, MOD, and ROUND functions in the following formula:
=IF(MOD(A2,1)=0.5,A2,ROUND(A2,0))
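The formula above keeps a value whose fractional part is exactly 0.5 and rounds everything else to the nearest integer. A rough Python equivalent, sketched for positive values (note that Excel's ROUND rounds halves away from zero, unlike Python's built-in round, so Decimal with ROUND_HALF_UP is used):

```python
# Mirror =IF(MOD(A2,1)=0.5,A2,ROUND(A2,0)) for positive values.
from decimal import Decimal, ROUND_HALF_UP

def round_unless_half(x):
    """Keep values ending in exactly .5; round everything else to an integer."""
    d = Decimal(str(x))
    if d % 1 == Decimal("0.5"):
        return float(d)   # leave 2.5, 7.5, ... untouched, like the IF branch
    return float(d.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

print([round_unless_half(v) for v in [2.3, 2.5, 2.7, 3.5]])
# → [2.0, 2.5, 3.0, 3.5]
```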
1. Select the cells in the sheet by pressing Ctrl+A (in Excel 2003, select a cell in a
blank area before pressing Ctrl+A, or from a selected cell in a Current Region/List
range, press Ctrl+A+A).
OR
Click Select All at the top-left intersection of rows and columns.
2. Press Ctrl+C.
3. Press Ctrl+Page Down to select another sheet, then select cell A1.
4. Press Enter.
Copying the entire sheet means copying the cells, the page setup parameters, and the defined
range Names.
Sorting by Columns
The default setting for sorting in Ascending or Descending order is by row. To sort by columns:
Descriptive Statistics
The Data Analysis ToolPak has a Descriptive Statistics tool that provides you with an easy way
to calculate summary statistics for a set of sample data. Summary statistics includes Mean,
Standard Error, Median, Mode, Standard Deviation, Variance, Kurtosis, Skewness, Range,
Minimum, Maximum, Sum, and Count. This tool eliminates the need to type individual
functions to find each of these results.
Excel can be used to generate measures of location and variability for a variable. Suppose we
wish to find descriptive statistics for a sample data: 2, 4, 6, and 8.
Step 1. Select the Tools pull-down menu. If you see Data Analysis, click on this option;
otherwise, click on the Add-Ins... option to install the Analysis ToolPak.
Enter A1:A4 in the Input Range box. A1 is the value in column A and row 1; in this case this
value is 2. Using the same technique, extend the range until you reach the last value. If a
sample consists of 20 numbers, you can enter, for example, A1:A20 as the input range.
Step 5. Select an output range, in this case B1. Click on summary statistics to see the results.
Select OK.
When you click OK, you will see the result in the selected range.
As you will see, the mean of the sample is 5, the median is 5, the standard deviation is 2.581989,
the sample variance is 6.666667, the range is 6, and so on. Each of these quantities may be
needed in further statistical procedures.
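The same summary numbers can be checked outside Excel. As a sketch, Python's standard statistics module reproduces the values quoted above for the sample 2, 4, 6, 8:

```python
import statistics

data = [2, 4, 6, 8]

mean = statistics.mean(data)          # 5
median = statistics.median(data)      # 5
stdev = statistics.stdev(data)        # sample standard deviation, ~2.581989
variance = statistics.variance(data)  # sample variance, ~6.666667
value_range = max(data) - min(data)   # 6

print(mean, median, round(stdev, 6), round(variance, 6), value_range)
```

Note that stdev and variance are the sample (n − 1) versions, which is what the Descriptive Statistics tool reports.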
Normal Distribution
Consider the problem of finding the probability of getting less than a certain value under any
normal probability distribution. As an illustrative example, let us suppose the SAT scores
nationwide are normally distributed with a mean and standard deviation of 500 and 100,
respectively. Answer the following questions based on the given information:
A: What is the probability that a randomly selected student score will be less than 600 points?
B: What is the probability that a randomly selected student score will exceed 600 points?
C: What is the probability that a randomly selected student score will be between 400 and 600?
Hint: Using Excel you can find the probability of getting a value approximately less than or
equal to a given value. In a problem, when the mean and the standard deviation of the population
are given, you have to use common sense to find different probabilities based on the question
since you know the area under a normal curve is 1.
Solution:
Step 1. In the worksheet, select the cell where you want the answer to appear; suppose you
chose cell A1.
Steps 2-3. From the menus, select Insert, then click on the Function option.
Step 4. After clicking on the Function option, the Paste Function dialog appears. From the
Function Category box, choose Statistical, then choose NORMDIST from the Function Name
box and click OK.
Step 5. After clicking OK, the NORMDIST dialog box appears:
i. Enter 600 in X (the value box);
ii. Enter 500 in the Mean box;
iii. Enter 100 in the Standard deviation box;
iv. Type "true" in the cumulative box, then click OK.
As you see, the value 0.84134474 appears in A1, indicating the probability that a randomly
selected student's score is below 600 points. Using common sense, we can answer part "b" by
subtracting this value from 1. So the part "b" answer is 1 - 0.84134474 = 0.15865526. This is the
probability that a randomly selected student's score is greater than 600 points. To answer part
"c", use the same technique to find the probabilities (the areas to the left) of the values 600 and
400. Since these areas overlap, subtract the smaller probability from the larger one. The answer
equals 0.84134474 - 0.15865526, that is, 0.68269. The screen shot should look like the following:
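Outside Excel, all three answers can be sketched with Python's statistics.NormalDist, which plays the role of NORMDIST with cumulative = TRUE (mean 500 and standard deviation 100, as in the problem):

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

p_below_600 = sat.cdf(600)               # part a: ~0.84134
p_above_600 = 1 - sat.cdf(600)           # part b: ~0.15866
p_between = sat.cdf(600) - sat.cdf(400)  # part c: ~0.68269

print(round(p_below_600, 5), round(p_above_600, 5), round(p_between, 5))
```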
Inverse Case
Calculating the value of a random variable, often called the "x" value:
You can use NORMINV from the function box to calculate a value for the random variable - if
the probability to the left side of this variable is given. Actually, you should use this function to
calculate different percentiles. In this problem one could ask what is the score of a student whose
percentile is 90? This means approximately 90% of students' scores are less than this number. On
the other hand, if we were asked to do this problem by hand, we would have had to calculate the
x value using the normal distribution formula x = μ + zσ. Now let's use Excel to calculate P90.
In the Paste Function dialog, click on Statistical, then click on NORMINV. The screen shot
would look like the following:
At the end of this screen you will see the formula result which is approximately 628 points. This
means the top 10% of the students scored better than 628.
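As a cross-check, NORMINV's job (turning a left-tail probability back into a value) is done in Python by NormalDist.inv_cdf. A sketch for the 90th percentile with mean 500 and standard deviation 100:

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# the score below which approximately 90% of students fall
p90 = sat.inv_cdf(0.90)
print(round(p90))  # approximately 628
```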
Suppose we wish to estimate a confidence interval for the mean of a population. Depending
on the size of your sample, you may use one of the following cases:
Large Sample Size (n larger than, say, 30):
The general formula for developing a confidence interval for a population mean is:
x̄ ± z (s / √n)
In this formula, x̄ is the mean of the sample; z is the interval coefficient, which can be found
from the normal distribution table (for example, the interval coefficient for a 95% confidence
level is 1.96); s is the standard deviation of the sample; and n is the sample size.
Now we would like to show how Excel is used to develop a certain confidence interval of a
population mean based on sample information. As you see, in order to evaluate this formula you
need the mean of the sample, x̄, and the margin of error, z(s/√n); Excel will automatically
calculate these quantities for you.
Add the margin of error to the sample mean x̄ to find the upper limit of the interval, and
subtract the margin of error from the mean to find the lower limit of the interval. To
demonstrate how Excel finds these quantities we will use the data set, which contains the hourly
incomes of 36 work-study students here at the University of Baltimore. These numbers appear in
cells A1 to A36 on an Excel work sheet.
After entering the data, we followed the descriptive statistic procedure to calculate the unknown
quantities. The only additional step is to click on the confidence interval in the descriptive
statistics dialog box and enter the given confidence level, in this case 95%.
On the Descriptive Statistics dialog, click on Summary Statistics. After you have done that, click
on the confidence interval level and type 95% (or, in other problems, whatever confidence
level you desire). In the Output Range box, enter B1 or whatever location you desire.
Now click on OK. The screen shot would look like the following:
As you see, the spreadsheet shows that the mean of the sample is x̄ = 6.902777778 and the
margin of error is 0.231678109; both are based on the sample information. A 95% confidence
interval for the hourly income of the UB work-study
students has an upper limit of 6.902777778 + 0.231678109 and a lower limit of 6.902777778 -
0.231678109.
On the other hand, we can say that of all the intervals formed this way, 95% contain the mean of
the population. Or, for practical purposes, we can be 95% confident that the mean of the
population is between 6.902777778 - 0.231678109 and 6.902777778 + 0.231678109. We can be
at least 95% confident that the interval [$6.68, $7.13] contains the average hourly income of a
work-study student.
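The large-sample interval x̄ ± z(s/√n) is easy to sketch directly in Python. The helper below is our own (not an Excel feature), and the four-point data set is only an illustration, since the Baltimore sample is not reproduced here:

```python
import math
import statistics

def ci_large(data, z=1.96):
    """Large-sample confidence interval for the mean:
    x-bar +/- z * s / sqrt(n) (z = 1.96 gives a 95% interval)."""
    n = len(data)
    xbar = statistics.mean(data)
    margin = z * statistics.stdev(data) / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = ci_large([2, 4, 6, 8])
print(low, high)  # an interval centered on the sample mean, 5
```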
Small Sample Size (say, less than 30): If the sample size n is less than 30, we must use the small-
sample procedure to develop a confidence interval for the mean of a population. The general
formula for developing a confidence interval for the population mean based on a small sample is:
x̄ ± t(α/2) (s / √n)
In this formula, x̄ is the mean of the sample; t(α/2) is the interval coefficient providing an area of
α/2 in the upper tail of a t distribution with n - 1 degrees of freedom, which can be found from a t
distribution table (for example, the interval coefficient for a 90% confidence level is 1.833 if the
sample size is 10); s is the standard deviation of the sample and n is the sample size.
Now you would like to see how Excel is used to develop a certain confidence interval of a
population mean based on this small sample information.
As you see, to evaluate this formula you need the mean of the sample, x̄, and the margin of
error, t(α/2)(s/√n); Excel will automatically calculate these quantities the way it did for large
samples.
Again, the only things you have to do are: add the margin of error to the mean of
the sample, x̄, to find the upper limit of the interval, and subtract the margin of error from the
mean to find the lower limit of the interval.
To demonstrate how Excel finds these quantities we will use the data set, which contains the
hourly incomes of 10 work-study students here, at the University of Baltimore. These numbers
appear in cells A1 to A10 on an Excel work sheet.
After entering the data, we follow the descriptive statistics procedure to calculate the unknown
quantities (exactly the way we found the quantities for the large sample). Here is the
procedure in step-by-step form:
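The small-sample steps amount to evaluating x̄ ± t(s/√n). A minimal Python sketch, using the t value 1.833 quoted above for a 90% interval with n = 10; the ten income values are hypothetical stand-ins for the work-study data, which is not reproduced here:

```python
import math
import statistics

def ci_small(data, t):
    """Small-sample confidence interval for the mean:
    x-bar +/- t * s / sqrt(n), with t taken from a t table at n-1 df."""
    n = len(data)
    xbar = statistics.mean(data)
    margin = t * statistics.stdev(data) / math.sqrt(n)
    return xbar - margin, xbar + margin

# hypothetical hourly incomes for 10 students (illustration only)
incomes = [6.50, 7.25, 6.80, 7.00, 6.60, 7.10, 6.90, 7.30, 6.70, 7.05]
low, high = ci_small(incomes, t=1.833)  # 1.833 = t for 90% CI, 9 df
print(round(low, 2), round(high, 2))
```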
Preface
A common misconception held by many is that O.R. is a collection of mathematical tools. While it is true
that it uses a variety of mathematical techniques, operations research has a much broader scope. It is in
fact a systematic approach to solving problems, which uses one or more analytical tools in the process of
analysis. Perhaps the single biggest problem with O.R. is its name; to a layperson, the term "operations
research" does not conjure up any sort of meaningful image! This is an unfortunate consequence of the
fact that the name that A. P. Rowe is credited with first assigning to the field was somehow never altered
to something that is more indicative of the things that O.R. actually does. Sometimes O.R. is referred to as
Management Science (M.S.) in order to better reflect its role as a scientific approach to solving
management problems, but it appears that this terminology is more popular with business professionals
and people still quibble about the differences between O.R. and M.S.
The tools of operations research are not drawn from any one discipline; rather, Mathematics, Statistics,
Economics, Engineering, Psychology, etc. have all contributed to this newer discipline of knowledge. Today,
it has become a professional discipline that deals with the application of scientific methods to decision-
making, and especially to the allocation of scarce resources.
Employing techniques from other mathematical sciences, such as mathematical modeling, statistical
analysis, and mathematical optimization, operations research arrives at optimal or near-optimal solutions
to complex decision-making problems. Because of its emphasis on human-technology interaction and
because of its focus on practical applications, operations research has overlap with other disciplines,
notably industrial engineering and operations management, and draws on psychology and organization
science. Operations research is often concerned with determining the maximum (of profit, performance,
or yield) or minimum (of loss, risk, or cost) of some real-world objective. Originating in military efforts
before World War II, its techniques have grown to concern problems in a variety of industries.