Probability and the Unit Square

Alan Sykes

Keywords: Teaching; Sample space; Distribution; Probability density function

Introduction

The mathematical structure of probability distributions that underlies Statistics is quite complex. Students are bewildered by terms such as pdf and cdf, mean and variance, and the relationships between the terms. Carefully chosen exercises help, and it is particularly instructive if a variety of such exercises have a common starting point which is easy to understand. "Choosing a point at random in a square" is a simple, intuitive statistical experiment. With micro-computers in the classroom it can be demonstrated easily. We show in the succeeding sections of the paper just how much of the elementary theory of probability distributions can be explored from within this basic framework. At the same time, difficulties experienced by students are highlighted in the hope that readers may respond with suggestions for overcoming them!

The Unit Square as a Sample Space

Consider the square with vertices at the points (0,0), (1,0), (0,1) and (1,1). A point chosen at random from the square has (random) co-ordinates X, Y. By random we understand that the probability of the random point lying in a particular region of the square is proportional to the area of the region (and hence, since the total area equals 1, is equal to the area).

[Figure: the unit square, with the corners (0,0) and (1,0) labelled.]

Why use such a sample space? If you require motivation for its study, why not consider the old fair-ground game of rolling a coin of diameter $d$ onto a plane of squares of side $l$? Alternatively, consider the meeting of friends problem, whereby two friends agree to meet at a particular coffee house for lunch, each arriving at random within the lunch hour, and agree to wait only 15 minutes. Both contexts pose obvious interesting questions, and in each case the unit square is the correct sample space.
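For readers who wish to demonstrate the experiment on a computer, the following is a minimal sketch in Python, not a program from the article. It assumes that the standard library's pseudo-random numbers are an adequate stand-in for "choosing a point at random"; the function name `estimate_probability` and the choice of a quarter disc as the test region are illustrative only.

```python
import math
import random


def estimate_probability(in_region, n_points=200_000, seed=0):
    """Estimate P(random point lies in the region) as the fraction of hits."""
    rng = random.Random(seed)  # fixed seed so that classroom runs are repeatable
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()  # one point chosen at random in the square
        if in_region(x, y):
            hits += 1
    return hits / n_points


# Illustrative region: the quarter disc of radius 1 centred at (0, 0).
# Its area, and hence its probability, is pi/4.
quarter_disc = estimate_probability(lambda x, y: x * x + y * y < 1.0)
print(f"estimate {quarter_disc:.4f}  versus area {math.pi / 4:.4f}")
```

Any region can be tried by passing a different function of (x, y); the estimate should settle near the area of the region as the number of points grows.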
Probabilities and Random Variables

Our sample space immediately provides two random variables, X and Y. As a first step students must answer probability questions about X and Y by determining appropriate regions of the unit square and calculating areas. The calculation of $P(X < 1/2)$ (answer $1/2$) involves realizing that the event $X < 1/2$ consists of all points in the square to the left of the line $x = 1/2$. Note that, as with many situations in statistics, functional relationships are used inversely. (The random variable X is strictly a function which maps a point (x,y) onto x, but what is really important is to be able to use this function in reverse, and determine the region of points (x,y) such that $X(x,y) < 1/2$.) I suspect this aspect of functions receives insufficient attention in A-level Mathematics courses, at least from a statistical point of view.

A typical set of exercises might be

(i) $P(X < 1/2)$
(ii) $P(Y > 1/2)$
(iii) $P(X < 1/2 \text{ AND } Y > 1/2)$

Students find the answers without too much difficulty and may be prompted to enquire whether the fact that answer (iii) = answer (ii) x answer (i) is a freak or not. (Hence to independence of random variables.)

The great thing about random variables is that they breed well! The usual arithmetical processes can be applied to random variables on the same sample space to give birth to more. Here are some examples and some suggested calculations, each of which should be accompanied by a quick sketch of the appropriate region of the sample space.

(a) $S = X + Y$: $P(S < 1/2) = 1/8$
(b) $D = X - Y$: $P(D < 1/2) = 7/8$
(c) $M = XY$: $P(M < 1/2) = 1/2 + (1/2)\log_e 2$
(d) $R = Y/X$: $P(R < 1/2) = 1/4$
(e) $L = \min(X, Y)$: $P(L < 1/2) = 3/4$
(f) $U = \max(X, Y)$: $P(U < 1/2) = 1/4$

Each of these examples seems to cause difficulties. A general problem is the reluctance to deal with inequalities.
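A sketch of each region remains the real exercise, but the six answers can be checked by simulation. The following Python fragment is offered only as a rough check, under the assumption that every threshold in (a) to (f) is 1/2; the names `EXERCISES` and `simulate` are illustrative, and the third entry of each row is the area obtained from the corresponding sketch.

```python
import math
import random

# (label, event as a function of the point (x, y), area found from a sketch)
EXERCISES = [
    ("S = X + Y < 1/2", lambda x, y: x + y < 0.5, 1 / 8),
    ("D = X - Y < 1/2", lambda x, y: x - y < 0.5, 7 / 8),
    ("M = XY < 1/2", lambda x, y: x * y < 0.5, 0.5 + 0.5 * math.log(2)),
    # Y/X < 1/2 is rewritten as Y < X/2, which also avoids dividing by zero.
    ("R = Y/X < 1/2", lambda x, y: y < 0.5 * x, 1 / 4),
    ("L = min(X, Y) < 1/2", lambda x, y: min(x, y) < 0.5, 3 / 4),
    ("U = max(X, Y) < 1/2", lambda x, y: max(x, y) < 0.5, 1 / 4),
]


def simulate(n_points=200_000, seed=1):
    """Return {label: proportion of random points falling in the event}."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n_points)]
    return {
        label: sum(event(x, y) for x, y in points) / n_points
        for label, event, _ in EXERCISES
    }


estimates = simulate()
for label, _, area in EXERCISES:
    print(f"{label:22s} simulated {estimates[label]:.4f}   area {area:.4f}")
```

The same set of random points is used for all six events, so the comparison across rows is not disturbed by separate sampling noise.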
Despite answering (i), (ii) and (iii) above, students baulk at determining the region such that $X + Y < 1/2$ and need to be persuaded to look at the boundary $X + Y = 1/2$ and infer the region from that. Whilst this may be successful, applying the same technique to (b) often leads to an incorrect region! (I suggest a check by inserting a point into the inequality.) Question (c) causes problems for similar reasons and, furthermore, the calculation of the area involves splitting into separate parts, corresponding to the area of the rectangle $X < 1/2$ and the area under the hyperbola $y = 1/(2x)$ from $1/2$ to $1$. Question (d) should be easier than (c) but is not, because of a reluctance to manipulate inequalities to turn $Y/X < 1/2$ into $Y < X/2$.

The last two questions appear to present particular difficulties, probably because of lack of familiarity with the ideas of maximum and minimum as a binary operation. They are, however, particularly fruitful questions to ask as they link directly with the basic axiom of addition of probabilities. The probability $P(L < 1/2)$ is clearly (?) equal to $P(X < 1/2 \text{ OR } Y < 1/2)$, whilst $P(U < 1/2) = P(X < 1/2 \text{ AND } Y < 1/2)$. Hence (f) can be answered easily using the independence of X and Y. Of course, (e) can also be answered similarly if we concentrate first on the complementary event, $P(L > 1/2) = P(X > 1/2 \text{ AND } Y > 1/2)$. The two possible calculations of (e) are written below.

$$P(L > 1/2) = P(X > 1/2 \text{ AND } Y > 1/2) = P(X > 1/2)\,P(Y > 1/2) = 1/4, \quad \text{so } P(L < 1/2) = 3/4$$

$$P(L < 1/2) = P(X < 1/2 \text{ OR } Y < 1/2) = P(X < 1/2) + P(Y < 1/2) - P(X < 1/2 \text{ AND } Y < 1/2) = 1/2 + 1/2 - 1/2 \times 1/2 = 3/4$$

[Diagrams: the regions of the unit square corresponding to the two calculations.]

Distributions and Densities

When the sample space is explicit, the route to information concerning a random variable W is as follows:

(i) Calculate $F_W(s) = P(W \le s)$ for all relevant s
(ii) Calculate $f_W(s) = dF_W(s)/ds$
(iii) Calculate $E(W)$ and $\mathrm{Var}(W)$ by integration involving $f_W(s)$

Parts (ii) and (iii) are mechanical and do not usually present problems. However, (i) does!
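The three-step route can be followed numerically as well as by hand. The sketch below does so for $U = \max(X, Y)$, assuming the cdf $F_U(s) = s^2$ on $0 \le s \le 1$, which is the independence argument for $P(U < 1/2)$ applied to a general s. The function names, the central-difference step and the number of strips are illustrative choices only.

```python
def cdf_u(s):
    """Step (i): F_U(s) = P(X <= s AND Y <= s) = s**2 on [0, 1], by independence."""
    s = min(max(s, 0.0), 1.0)  # the cdf is 0 below 0 and 1 above 1
    return s * s


def pdf_from_cdf(cdf, s, h=1e-5):
    """Step (ii): approximate f(s) = dF/ds by a central difference."""
    return (cdf(s + h) - cdf(s - h)) / (2 * h)


def moment(cdf, power, lower=0.0, upper=1.0, n_strips=2_000):
    """Step (iii): midpoint-rule approximation to the integral of s**power * f(s)."""
    width = (upper - lower) / n_strips
    total = 0.0
    for i in range(n_strips):
        s = lower + (i + 0.5) * width  # midpoint of the i-th strip
        total += s**power * pdf_from_cdf(cdf, s) * width
    return total


mean_u = moment(cdf_u, 1)
var_u = moment(cdf_u, 2) - mean_u**2
# Integrating by hand with f_U(s) = 2s gives E(U) = 2/3 and Var(U) = 1/18.
print(f"E(U) ~ {mean_u:.4f}, Var(U) ~ {var_u:.4f}")
```

Only `cdf_u` is specific to U; supplying a different cdf, with the limits `lower` and `upper` set to the range of the variable, repeats steps (ii) and (iii) unchanged, which is perhaps a fair reflection of where the real work lies.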
The use of a dummy variable (here I have used s) seems to cause conceptual problems. Students are happy to calculate $P(X < s)$ for particular numerical values of s, such as $P(X < 1/2)$, but become mildly disconcerted when asked to calculate $P(X < s)$ for $0 < s < 1$. A related difficulty is that they very often fail to note the phrase 'for all relevant s'. Take for example the distribution of $S = X + Y$. Here is a typical solution.

$$P(S \le s) = \tfrac{1}{2}s^2$$

But $X + Y$ can take values in the range 0 to 2, and moreover the nature of the calculation changes, dependent on s being less than or greater than 1. Here is the full calculation.

$$P(S \le s) = \begin{cases} \tfrac{1}{2}s^2, & 0 \le s \le 1 \\ 1 - \tfrac{1}{2}(2 - s)^2, & 1 \le s \le 2 \end{cases}$$

[Diagrams: the region $X + Y \le s$ in the unit square for $s < 1$ and for $s > 1$.]

A typical reaction to this is "How can you have two answers to one question?"! Moreover, the disquiet continues when you suggest they should differentiate this cdf to find the pdf and hence calculate the mean and variance of $S = X + Y$. The integral has to be split into two integrals:

$$E(X + Y) = \int_0^1 s \cdot s \, ds + \int_1^2 s(2 - s) \, ds = \tfrac{1}{3} + \tfrac{2}{3} = 1$$

(Of course this has to be the answer, as $E(X + Y) = E(X) + E(Y)$ and to show that $E(X) = E(Y) = 1/2$ is trivial!) It is worth continuing the example to show that $\mathrm{Var}(X + Y) = 1/6 = \mathrm{Var}(X) + \mathrm{Var}(Y)$. (The equality holds because X and Y are independent.)

Working through similar calculations for the other random variables mentioned is also instructive. For example:

$$P(L \le s) = 1 - (1 - s)^2, \quad 0 \le s \le 1$$
$$f_L(s) = 2(1 - s), \quad 0 \le s \le 1$$

($f_L(s)$ is the probability density function of L.) It is worth pointing out that the graph of $f_L$ shows that small values of L are more likely than large values, which is not surprising since L is the minimum of two random quantities. Furthermore, $E(L) = 1/3$ and so $E(U)$ must equal $2/3$. Why? Because $L + U = X + Y$,

$$E(L) + E(U) = E(L + U) = E(X + Y) = 1$$

If time permits, make the student calculate Var(L) and Var(U).
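Since both densities are polynomials on $0 \le s \le 1$, these variances can be checked exactly with rational arithmetic. The sketch below is an illustration only: it takes $f_L(s) = 2(1 - s)$ from the text and assumes $f_U(s) = 2s$, obtained by differentiating $P(U \le s) = s^2$; the helper names are hypothetical.

```python
from fractions import Fraction


def moment(density_coeffs, power):
    """Exact integral over [0, 1] of s**power * f(s), where f(s) = sum of c_k * s**k."""
    return sum(Fraction(c) / (k + power + 1) for k, c in enumerate(density_coeffs))


def mean_and_variance(density_coeffs):
    """Return (E(W), Var(W)) for a polynomial density on [0, 1]."""
    mean = moment(density_coeffs, 1)
    return mean, moment(density_coeffs, 2) - mean**2


# Densities written as polynomial coefficients [c_0, c_1, ...]:
f_L = [2, -2]  # f_L(s) = 2(1 - s), from P(L <= s) = 1 - (1 - s)^2
f_U = [0, 2]   # f_U(s) = 2s, from P(U <= s) = s^2 (an assumption: independence)

mean_L, var_L = mean_and_variance(f_L)
mean_U, var_U = mean_and_variance(f_U)
print("E(L) =", mean_L, " Var(L) =", var_L)
print("E(U) =", mean_U, " Var(U) =", var_U)
print("Var(L) + Var(U) =", var_L + var_U)
```

Working in fractions rather than decimals keeps the results in the form a student would obtain by hand.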
Then it is easy to check that

$$\mathrm{Var}(L) + \mathrm{Var}(U) \ne \mathrm{Var}(L + U) = \mathrm{Var}(X + Y) = \tfrac{1}{6}$$

Thus we have demonstrated that the law

$$\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2)$$

does not work for the correlated variables L, U but does work for the independent variables X, Y.

What about $M = XY$? Calculation of the density of M is straightforward.

$$P(M < s) = s + \int_s^1 \frac{s}{x} \, dx = s - s\log_e s$$
$$f_M(s) = \frac{d}{ds}\left(s - s\log_e s\right) = -\log_e s \quad (0 < s < 1)$$

and eventually $E(M) = 1/4$. This, of course, must be the answer as X and Y are independent, hence $E(XY) = E(X)E(Y)$.

This leaves the ratio $R = Y/X$. Does $E(R) = 1$? (Argument: $E(R)$ "=" $E(Y)/E(X) = 1$.) A most emphatic no! In fact, the expected value of R is infinite, which is not difficult to prove providing the correct cdf is calculated. (See the remarks for $S = X + Y$!) For $0 \le r \le 1$ the cdf is $P(R \le r) = r/2$, whereas for $r > 1$ it is $1 - 1/(2r)$; the density $1/(2r^2)$ on $r > 1$ then makes the integral for $E(R)$ diverge.

This is a quick tour of the sort of calculations that are possible using fairly obvious random variables generated in the unit square. The examples are worthy exercises for student and teacher alike. As I hope I have demonstrated, many aspects of the theory of distributions and expected values are illustrated, and particular emphasis is placed on the path

Probability → cdf → expected values,

which is possible because we start with a simple but rich sample space: the unit square.

A final example

For the final example, return to the 'meeting of friends' problem. Let A and B be the two friends, and suppose that A arrives at time X (measured in hours, $0 < X < 1$) and B arrives at time Y. The usual problem is to calculate the probability that they actually meet.
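Before turning to areas, the meeting probability can be estimated by simulating many lunch hours. The following Python sketch assumes, as in the statement of the problem, that both arrival times are chosen at random within the hour and that each friend waits 15 minutes (a quarter of an hour); the function name and its parameters are illustrative.

```python
import random


def estimate_meeting_probability(wait=0.25, n_trials=200_000, seed=2):
    """Proportion of simulated lunch hours in which the two friends meet."""
    rng = random.Random(seed)
    meetings = 0
    for _ in range(n_trials):
        x = rng.random()  # arrival time of A, in hours
        y = rng.random()  # arrival time of B, in hours
        if abs(x - y) < wait:  # whoever arrives first is still waiting
            meetings += 1
    return meetings / n_trials


p_meet = estimate_meeting_probability()
print(f"estimated P(meet) = {p_meet:.4f}")
```

Changing `wait` shows how quickly the chance of meeting grows with the friends' patience, which may be a useful prompt for a sketch of the corresponding band around the diagonal of the square.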
Assuming the 15 minute waiting time, this probability is simply given by

$$P(|X - Y| < 1/4) = 1 - (3/4)^2 = 7/16$$

[Diagram: the band $|X - Y| < 1/4$ around the diagonal of the unit square.]

A more interesting question is to calculate the distribution of the waiting time of A (call this W) and hence evaluate $E(W)$. A problem arises because W is neither a discrete nor a continuous random variable: it takes the values 0 and 1/4 with positive probability, whereas individual values between 0 and 1/4 are possible but have zero probability. So such mixed-type random variables are not merely mathematical artefacts; they do occur, in an uncontrived manner, in the simplest of probability spaces, the unit square. The calculation of $P(W \le s)$ is straightforward if care is taken! Here are the steps.

$$P(W = 0) = P(X - 1/4 < Y < X) = 7/32$$
$$P(W \le s) = P(X - 1/4 < Y < X + s) = 1 - \tfrac{1}{2}(3/4)^2 - \tfrac{1}{2}(1 - s)^2, \quad 0 < s < 1/4$$
$$P(W \le 1/4) = 1$$

[Diagrams: the regions of the unit square corresponding to each step.]

[Figure: sketch of the curve $P(W \le s)$.]

A simple way to calculate $E(W)$ is to calculate the area of the region above the curve $P(W \le s)$, bounded by $y = 1$ (see [1]), and the answer is $1/6$.

Anyone wishing to pursue these examples in the classroom may like to take advantage of a BBC disk demonstrating the relationship between the cdf and the pdf for the random variables S, M, D, L and U. These are available from the author [2] at cost price.

University College of Swansea, Wales

References

[1] Sykes, A. M. (1981). An Alternative Approach to the Mean. Teaching Statistics, 3, 3, 82-87.
[2] Department of Management Science and Statistics, University College of Swansea, Singleton Park, Swansea.

Probability and Statistics Study Group: Looking back and Looking forward

JOAN GARFIELD and DAVID GREEN

The Past

For many people interested in and involved in Statistical Education, the world's First International Conference on Teaching Statistics, held in Sheffield in 1982, was a watershed. It was clear proof that statistical education had come of age and was assured a lively future.
It brought together people from all round the globe who had been working in isolation but who now found common interest with colleagues in other places. This was particularly true for one of us (David Green). It was an exciting time to meet with experts involved in research into probability concepts
