Probability and the Unit Square
ALAN M. SYKES
Keywords: Teaching; Sample space; Distribution; Probability density function
Introduction
The mathematical structure of probability distributions that underlies Statistics is
quite complex. Students are bewildered by terms such as pdf and cdf, mean and
variance, and the relationships between the terms. Carefully chosen exercises help,
and it is particularly instructive if a variety of such exercises have a common starting
point which is easy to understand.
“Choosing a point at random in a square” is a simple, intuitive statistical
experiment. With micro-computers in the classroom it can be demonstrated easily.
We show in the succeeding sections of the paper just how much of the elementary
theory of probability distributions can be explored from within this basic
framework. At the same time, difficulties experienced by students are highlighted in
the hope that readers may respond with suggestions for overcoming them!
The Unit Square as a Sample Space
Consider the square with vertices at the points (0,0), (1,0), (0,1) and (1,1). A point
chosen at random from the square has (random) co-ordinates X, Y. By random we
understand that the probability of the random point lying in a particular region of
the square is proportional to the area of the region (and hence, since the total area
equals 1, is equal to the area).
[Figure: the unit square, with the vertices (0,0) and (1,0) labelled.]
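The statement that probability equals area can be tried out on a computer. The following is a minimal sketch, not a definitive implementation: the function names, the seed, and the choice of a disc of radius 1/4 as the test region are all assumptions made for illustration.

```python
import math
import random

def random_point(rng):
    """Return (x, y) chosen uniformly at random in the unit square."""
    return rng.random(), rng.random()

def estimate_probability(in_region, n_points=200_000, seed=1):
    """Fraction of n_points random points for which in_region(x, y) is true."""
    rng = random.Random(seed)
    hits = sum(in_region(*random_point(rng)) for _ in range(n_points))
    return hits / n_points

def in_disc(x, y):
    # Hypothetical test region: disc of radius 1/4 centred at (1/2, 1/2).
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 < 0.25 ** 2

disc_estimate = estimate_probability(in_disc)
disc_area = math.pi * 0.25 ** 2
print(f"estimated probability {disc_estimate:.4f}, area {disc_area:.4f}")
```

Any other region can be tested by passing a different predicate to `estimate_probability`.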
Why use such a sample space? If you require motivation for its study, why not
consider the old fair-ground game of rolling a coin of diameter d onto a plane of
squares of side l? Alternatively, consider the meeting of friends problem whereby
two friends agree to meet at a particular coffee house for lunch, arriving at random
within the lunch hour and they agree to wait only 15 minutes. Both contexts pose
obvious interesting questions, and in each case the unit square is the correct sample
space.
Probabilities and Random Variables
Our sample space immediately provides two random variables, X and Y. As a first
step students must answer probability questions about X and Y by determining
appropriate regions of the unit square and calculating areas.
The calculation of P(X < 1/2) (answer 1/2) involves realizing that the event X < 1/2 consists
of all points in the square to the left of the line x = 1/2. Note that, as with many
situations in statistics, functional relationships are used inversely. (The random
variable X is strictly a function which maps a point (x,y) onto x, but what is really
important is to be able to use this function in reverse, and determine the region of
points (x,y) such that X(x,y) < 1/2.) I suspect this aspect of functions receives
insufficient attention in A-level Mathematics courses, at least from a statistical point
of view.
A typical set of exercises might be
(i) P(X < 1/2)
(ii) P(Y > 1/2)
(iii) P(X < 1/2 AND Y > 1/2)
Students find the answers without too much difficulty and may be prompted to
enquire whether the fact that answer (iii) = answer (ii) × answer (i) is a freak or not.
(Hence to independence of random variables.)
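The product rule noticed by the students can be checked by simulation. This is a sketch under assumptions: the thresholds of 1/2, the seed and the sample size are illustrative choices, not values taken from the exercises.

```python
import random

rng = random.Random(2)
n = 200_000
points = [(rng.random(), rng.random()) for _ in range(n)]

# Relative frequencies of the three events (thresholds of 1/2 are assumed).
p_x = sum(x < 0.5 for x, y in points) / n
p_y = sum(y > 0.5 for x, y in points) / n
p_both = sum(x < 0.5 and y > 0.5 for x, y in points) / n

print(f"P(X<1/2)={p_x:.3f}  P(Y>1/2)={p_y:.3f}  "
      f"P(both)={p_both:.3f}  product={p_x * p_y:.3f}")
```

Changing either threshold should leave the joint frequency close to the product, which is what independence of X and Y means.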
The great thing about random variables is that they breed well! The usual
arithmetical processes can be applied to random variables on the same sample space
to give birth to more. Here are some examples and some suggested calculations, each
of which should be accompanied by a quick sketch of the appropriate region of the
sample space.
(a) S = X + Y          P(S < 1/2) = 1/8
(b) D = X - Y          P(D < 1/2) = 7/8
(c) M = XY             P(M < 1/2) = 1/2 + (1/2) log_e 2
(d) R = Y/X            P(R < 1/2) = 1/4
(e) L = min(X, Y)      P(L < 1/2)
(f) U = max(X, Y)      P(U < 1/2) = 1/4
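A sketch of each region is the real exercise, but a simulation gives a quick check on the answers. The code below is illustrative only: the dictionary of variables, the common threshold of 1/2 and the comparison values are assumptions made for this sketch, the comparison values being the areas of the corresponding regions for that threshold.

```python
import math
import random

# The six derived random variables, as functions of the point (x, y).
VARIABLES = {
    "S": lambda x, y: x + y,
    "D": lambda x, y: x - y,
    "M": lambda x, y: x * y,
    "R": lambda x, y: y / x if x > 0 else math.inf,  # guard against x == 0
    "L": min,
    "U": max,
}

def estimate_all(threshold=0.5, n_points=200_000, seed=3):
    """Estimate P(V < threshold) for every variable V by relative frequency."""
    rng = random.Random(seed)
    counts = dict.fromkeys(VARIABLES, 0)
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        for name, func in VARIABLES.items():
            if func(x, y) < threshold:
                counts[name] += 1
    return {name: count / n_points for name, count in counts.items()}

# Areas of the regions for a threshold of 1/2 (assumed for this sketch).
EXACT = {"S": 1 / 8, "D": 7 / 8, "M": 0.5 + 0.5 * math.log(2),
         "R": 1 / 4, "L": 3 / 4, "U": 1 / 4}

estimates = estimate_all()
for name in VARIABLES:
    print(f"P({name} < 1/2): estimate {estimates[name]:.3f}, area {EXACT[name]:.3f}")
```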
Each of these examples seems to cause difficulties. A general problem is the
reluctance to deal with inequalities. Despite answering (i), (ii) and (iii) above,
students baulk at determining the region such that X + Y < 1/2 and need to be
persuaded to look at the boundary X + Y = 1/2 and infer the region from that. Whilst
this may be successful, applying the same technique to (b) often leads to an incorrect
region! (I suggest a check by inserting a point into the inequality.)
Question (c) causes problems for similar reasons and, furthermore, the calculation
of the area involves splitting into separate parts, corresponding to the area of the
rectangle X < 1/2, and the area under the hyperbola y = 1/(2x) from x = 1/2 to x = 1.
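The split of the area for question (c) can be written out as follows. This is a worked sketch assuming the threshold 1/2, with the region XY < 1/2 divided at the line x = 1/2.

```latex
\[
P(M < \tfrac12)
  = \underbrace{\tfrac12}_{\text{rectangle } X < 1/2}
  + \int_{1/2}^{1} \frac{1}{2x}\,dx
  = \tfrac12 + \tfrac12\bigl[\log_e x\bigr]_{1/2}^{1}
  = \tfrac12 + \tfrac12\log_e 2 \approx 0.847 .
\]
```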
Question (d) should be easier than (c) but is not, because of a reluctance to
manipulate inequalities to turn Y/X < 1/2 into Y < X/2.
The last two questions appear to present particular difficulties, probably because
of lack of familiarity with the ideas of maximum and minimum as a binary
operation. They are, however, particularly fruitful questions to ask as they link
directly with the basic axiom of addition of probabilities. The probability P(L < 1/2) is
clearly (?) equal to P(X < 1/2 OR Y < 1/2), whilst P(U < 1/2) = P(X < 1/2 AND Y < 1/2). Hence
(f) can be answered easily using the independence of X and Y. Of course, (e) can also
be answered similarly if we concentrate first on the complementary event, since P(L > 1/2) =
P(X > 1/2 AND Y > 1/2). The two possible calculations of (e) are written below, together
with appropriate diagrams.
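\[
\begin{aligned}
P(L > \tfrac12) &= P(X > \tfrac12 \text{ AND } Y > \tfrac12) \\
                &= P(X > \tfrac12)\,P(Y > \tfrac12) = \tfrac14,
                \qquad\text{so}\quad P(L < \tfrac12) = \tfrac34 .
\end{aligned}
\]
\[
\begin{aligned}
P(L < \tfrac12) &= P(X < \tfrac12 \text{ OR } Y < \tfrac12) \\
                &= P(X < \tfrac12) + P(Y < \tfrac12) - P(X < \tfrac12 \text{ AND } Y < \tfrac12) \\
                &= \tfrac12 + \tfrac12 - \tfrac12 \times \tfrac12 = \tfrac34 .
\end{aligned}
\]
[Figures: the regions of the unit square used in each of the two calculations.]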
Distributions and Densities
When the sample space is explicit, the route to information concerning random
variables is as follows:
(i) Calculate F_W(s) = P(W ≤ s), for all relevant s;
(ii) Calculate f_W(s) = dF_W(s)/ds;
(iii) Calculate E(W) and Var(W) by integration involving f_W(s).
Parts (ii) and (iii) are mechanical and do not usually present problems. However, (i)
does!
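Once step (i) is done, steps (ii) and (iii) really are mechanical, to the point that a computer can do them numerically. The sketch below assumes the cdf of L = min(X, Y) that is derived later in this article; the helper names, step size and number of strips are illustrative choices.

```python
def cdf_L(s):
    """Step (i): cdf of L = min(X, Y) on 0 <= s <= 1."""
    return 1.0 - (1.0 - s) ** 2

def pdf_from_cdf(cdf, s, h=1e-5):
    """Step (ii): central-difference approximation to dF/ds."""
    return (cdf(s + h) - cdf(s - h)) / (2.0 * h)

def integrate(func, a, b, n=2000):
    """Midpoint rule on [a, b] with n strips."""
    width = (b - a) / n
    return width * sum(func(a + (i + 0.5) * width) for i in range(n))

def mean_and_variance(cdf, a, b):
    """Step (iii): E(W) and Var(W) by integration against the density."""
    pdf = lambda s: pdf_from_cdf(cdf, s)
    mean = integrate(lambda s: s * pdf(s), a, b)
    second_moment = integrate(lambda s: s * s * pdf(s), a, b)
    return mean, second_moment - mean ** 2

mean_L, var_L = mean_and_variance(cdf_L, 0.0, 1.0)
print(f"E(L) = {mean_L:.4f}, Var(L) = {var_L:.4f}")
```

Only `cdf_L` and the range of relevant s need changing to treat another random variable, which underlines that the thinking lies in step (i).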
The use of a dummy variable (here I have used s) seems to cause conceptual
problems. Students are happy to calculate P(X < s) for particular numerical values of s, but
become mildly disconcerted with calculating P(X < s) for 0 < s < 1.
A related difficulty is that they very often fail to note the phrase ‘for all relevant s’.
Take for example the distribution of S = X + Y. Here is a typical solution.
[Figure: a typical student solution for P(S ≤ s), treating a single case only.]
But X + Y can take values in the range 0 to 2, and moreover the nature of the
calculation changes, dependent on s being less than or greater than 1. Here is the full
calculation.
[Figure: the full calculation of P(S ≤ s), with separate cases for s less than and greater than 1.]
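For reference, the two cases can be written out as below. This is a reconstruction from the areas involved (a triangle cut from the corner (0,0) when s ≤ 1, and the square less a triangle at the corner (1,1) when s ≥ 1), not a copy of the original display.

```latex
\[
F_S(s) = P(X + Y \le s) =
\begin{cases}
\tfrac12 s^2, & 0 \le s \le 1, \\[4pt]
1 - \tfrac12 (2 - s)^2, & 1 \le s \le 2,
\end{cases}
\qquad
f_S(s) = \frac{dF_S(s)}{ds} =
\begin{cases}
s, & 0 \le s \le 1, \\[4pt]
2 - s, & 1 \le s \le 2 .
\end{cases}
\]
```

The two branches agree at s = 1, where both give 1/2.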
A typical reaction to this is “How can you have two answers to one question?”!
Moreover the disquiet continues when you suggest they should differentiate this cdf
to find the pdf and hence calculate the mean and variance of S = X + Y. The integral
has to be split into two integrals:
\[
E(X + Y) = \int_0^1 s \cdot s \, ds + \int_1^2 s\,(2 - s)\, ds = \tfrac13 + \tfrac23 = 1 .
\]
(Of course this has to be the answer as E(X + Y) = E(X) + E(Y) and to show that
E(X) = E(Y) = 1/2 is trivial!)
It is worth continuing the example to show that Var(X + Y) = 1/6 =
Var(X) + Var(Y). (The equality holds because X and Y are independent.)
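The variance calculation can be sketched as follows, assuming the triangular density of S (equal to s on 0 ≤ s ≤ 1 and 2 − s on 1 ≤ s ≤ 2) and E(S) = 1.

```latex
\[
E(S^2) = \int_0^1 s^2 \cdot s \, ds + \int_1^2 s^2 (2 - s)\, ds
       = \tfrac14 + \tfrac{11}{12} = \tfrac76 ,
\qquad
\operatorname{Var}(S) = \tfrac76 - 1^2 = \tfrac16 .
\]
\[
\operatorname{Var}(X) = \int_0^1 x^2 \, dx - \bigl(\tfrac12\bigr)^2
                      = \tfrac13 - \tfrac14 = \tfrac1{12},
\qquad
\operatorname{Var}(X) + \operatorname{Var}(Y) = \tfrac1{12} + \tfrac1{12} = \tfrac16 .
\]
```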
Working through similar calculations for the other random variables mentioned
is also instructive.
For example:
\[
P(L \le s) = 1 - (1 - s)^2, \qquad 0 \le s \le 1,
\]
\[
f_L(s) = 2(1 - s), \qquad 0 \le s \le 1
\]
(f_L(s) is the probability density function of L).
It is worth pointing out that the graph of f_L shows that small values of L are more
likely than large values, not surprising since L is the minimum of two random
quantities.
Furthermore, E(L) = 1/3 and so E(U) must equal 2/3. Why? Because L + U = X + Y, so
\[
E(L) + E(U) = E(L + U) = E(X + Y) = 1 .
\]
If time permits, make the student calculate Var(L) and Var(U). Then it is easy to
check that
\[
\operatorname{Var}(L) + \operatorname{Var}(U) \ne \operatorname{Var}(L + U) = \operatorname{Var}(X + Y) = \tfrac16 .
\]
Thus we have demonstrated that the law
\[
\operatorname{Var}(X_1 + X_2) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2)
\]
does not work for correlated variables L, U but does work for independent variables
X, Y.
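The failure of the addition law for L and U can be checked in exact arithmetic. The sketch below is illustrative: it assumes the density f_L(s) = 2(1 − s) given above and the density f_U(s) = 2s, which follows from P(U ≤ s) = s² but is not stated in the text; the coefficient-list representation and helper names are hypothetical conveniences.

```python
from fractions import Fraction

def moment(pdf_coeffs, n):
    """n-th moment over [0, 1] of the polynomial density sum(c_k * s**k)."""
    return sum(Fraction(c, k + n + 1) for k, c in enumerate(pdf_coeffs))

def variance(pdf_coeffs):
    """Variance as E(W^2) - E(W)^2."""
    return moment(pdf_coeffs, 2) - moment(pdf_coeffs, 1) ** 2

PDF_L = [2, -2]  # f_L(s) = 2(1 - s)
PDF_U = [0, 2]   # f_U(s) = 2s (assumed, from P(U <= s) = s**2)
PDF_X = [1]      # f_X(s) = 1, the uniform density of X (and of Y)

var_L = variance(PDF_L)
var_U = variance(PDF_U)
var_sum = 2 * variance(PDF_X)            # Var(X + Y), by independence
cov_LU = (var_sum - var_L - var_U) / 2   # since L + U = X + Y

print(f"Var(L) = {var_L}, Var(U) = {var_U}, Var(L + U) = {var_sum}")
print(f"Cov(L, U) = {cov_LU}")
```

The shortfall between Var(L) + Var(U) and Var(L + U) is exactly twice the covariance of L and U, which is positive because a large minimum forces a large maximum.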
What about M = XY? Calculation of the density of M is straightforward.
\[
P(M \le s) = s + \int_s^1 \frac{s}{x}\,dx = s - s\log_e s ,
\]
\[
f_M(s) = \frac{d}{ds}\bigl(s - s\log_e s\bigr) = -\log_e s \qquad (0 < s \le 1),
\]
and eventually E(M) = 1/4. This, of course, must be the answer as X and Y are
independent, hence E(XY) = E(X)E(Y).
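The step hidden in "eventually" is a single integration by parts. A sketch, assuming the density −log_e s on 0 < s ≤ 1:

```latex
\[
E(M) = \int_0^1 s\,(-\log_e s)\,ds
     = \Bigl[-\tfrac12 s^2 \log_e s\Bigr]_0^1 + \int_0^1 \tfrac12 s \, ds
     = 0 + \tfrac14 = \tfrac14
     = E(X)\,E(Y) = \tfrac12 \times \tfrac12 .
\]
```

The boundary term vanishes at s = 0 because s² log_e s tends to 0 as s tends to 0.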
This leaves the ratio R = Y/X. Does E(R) = 1? (Argument: E(R) “=” E(Y)/
E(X) = 1.) A most emphatic no!
In fact, the expected value of R is infinite, which is not difficult to prove providing
the correct cdf is calculated. (See the remarks for S = X+ Y!).
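A sketch of the argument, worked from the areas below the line y = sx: as with S, the cdf needs two cases, since the line leaves the square through the side x = 1 when s ≤ 1 and through the top y = 1 when s ≥ 1.

```latex
\[
F_R(s) = P(Y \le sX) =
\begin{cases}
\tfrac12 s, & 0 \le s \le 1, \\[4pt]
1 - \dfrac{1}{2s}, & s \ge 1,
\end{cases}
\qquad
f_R(s) =
\begin{cases}
\tfrac12, & 0 < s < 1, \\[4pt]
\dfrac{1}{2s^2}, & s > 1 ,
\end{cases}
\]
\[
E(R) = \int_0^1 \tfrac12 s \, ds + \int_1^\infty \frac{1}{2s}\,ds
     = \tfrac14 + \tfrac12 \bigl[\log_e s\bigr]_1^\infty = \infty .
\]
```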
This is a quick tour de force of the sort of calculations that are possible using fairly
obvious random variables generated in the unit square. The examples are worthy
exercises for student and teacher alike. As I hope I have demonstrated, many aspects
of the theory of distributions and expected values are illustrated, and particular
emphasis is placed on the path,
Probability → cdf → expected values,
which is possible because we start with a simple but rich sample space, the unit
square.
A Final Example
For the final example, return to the ‘meeting of friends’ problem. Let A and B be
the two friends, and suppose that A arrives at time X (measured in hours, 0 < X < 1)
and B arrives at time Y. The usual problem is to calculate the probability that they
actually meet. Assuming the 15 minute waiting time, this probability is simply given
by
\[
P\bigl(|X - Y| < \tfrac14\bigr) = 1 - \bigl(\tfrac34\bigr)^2 = \tfrac{7}{16} .
\]
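The meeting probability is easy to estimate by simulation. This is a minimal sketch assuming that the friends meet exactly when their arrival times differ by less than the waiting time; the function name, seed and number of trials are illustrative.

```python
import random

def meeting_probability(wait=0.25, n_trials=200_000, seed=4):
    """Relative frequency of |X - Y| < wait for random arrival times X, Y."""
    rng = random.Random(seed)
    meetings = sum(abs(rng.random() - rng.random()) < wait
                   for _ in range(n_trials))
    return meetings / n_trials

meeting_estimate = meeting_probability()
meeting_area = 1 - (1 - 0.25) ** 2  # unit square less two corner triangles
print(f"estimate {meeting_estimate:.4f}, area {meeting_area:.4f}")
```

Passing a different `wait` shows how quickly the chance of meeting grows with patience.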
A more interesting question is to calculate the distribution of the waiting time of
A (call this W) and hence evaluate E(W). A problem arises because W is neither a
discrete nor a continuous random variable: it takes the values 0 and 1/4 with
positive probability, whereas individual values between 0 and 1/4 are possible but
have zero probability. So such mixed-type random variables are not merely
mathematical artefacts; they do occur, in an uncontrived manner, in the simplest of
probability spaces, the unit square.
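The calculation of P(W ≤ s) is straightforward if care is taken! Here are the steps
with diagrams.
\[
P(W = 0) = P\bigl(X - \tfrac14 < Y < X\bigr) = \tfrac{7}{32}
\]
\[
P(W \le s) = P\bigl(X - \tfrac14 < Y < X + s\bigr)
           = 1 - \tfrac12\bigl(\tfrac34\bigr)^2 - \tfrac12 (1 - s)^2 ,
\qquad 0 \le s < \tfrac14
\]
\[
P\bigl(W \le \tfrac14\bigr) = 1
\]
[Figures: the regions of the unit square corresponding to each step.]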
Below is a sketch of the curve P(W ≤ s). A simple way to calculate E(W) is to
calculate the area of the region above this curve, bounded by y = 1 (see [1]); the
answer is 1/6.
[Figure: sketch of the cdf P(W ≤ s).]
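The mixed nature of W shows up clearly in a simulation. The sketch below rests on a stated reading of the problem: A waits nothing if B arrived within the previous 15 minutes, waits Y − X if B arrives within the next 15 minutes, and otherwise waits the full 15 minutes; the end of the lunch hour is ignored. The function names and seed are illustrative.

```python
import random

WAIT = 0.25  # 15 minutes, in hours

def waiting_time_of_A(x, y, wait=WAIT):
    """Waiting time of A, who arrives at x, when B arrives at y."""
    if x - wait < y <= x:
        return 0.0        # B is already there
    if x < y < x + wait:
        return y - x      # B arrives while A is waiting
    return wait           # B has already left, or arrives too late

def simulate(n_trials=200_000, seed=5):
    """Return estimates of P(W = 0), P(W = 1/4) and E(W)."""
    rng = random.Random(seed)
    times = [waiting_time_of_A(rng.random(), rng.random())
             for _ in range(n_trials)]
    p_zero = sum(t == 0.0 for t in times) / n_trials
    p_full = sum(t == WAIT for t in times) / n_trials
    return p_zero, p_full, sum(times) / n_trials

p_zero, p_full, mean_wait = simulate()
print(f"P(W=0) ~ {p_zero:.3f}, P(W=1/4) ~ {p_full:.3f}, E(W) ~ {mean_wait:.4f}")
```

Under these assumptions the two atoms should come out near 7/32 and 9/16, with the remaining probability spread continuously between 0 and 1/4.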
Anyone wishing to pursue these examples in the classroom may like to take
advantage of a BBC disk demonstrating the relationship between the cdf and the pdf
for the random variables S, M, D, L and U. Copies are available from the author [2]
at cost price.
University College of Swansea, Wales
References
[1] Sykes, A. M. (1981). An Alternative Approach to the Mean. Teaching Statistics, 3(3),
82-87.
[2] Department of Management Science and Statistics, University College of Swansea,
Singleton Park, Swansea.
Probability and Statistics Study Group: Looking back and Looking forward
JOAN GARFIELD and DAVID GREEN
The Past
For many people interested in and involved in Statistical Education the world’s
First International Conference on Teaching Statistics held in Sheffield in 1982 was a
watershed. It was clear proof that statistical education had come of age and was
assured a lively future. It brought together people from all round the globe who had
been working in isolation but who now found common interest with colleagues in
other places. This was particularly true for one of us (David Green). It was an
exciting time to meet with experts involved in research into probability concepts