Statistics II
The Author
April 8, 2006
Topics
- Estimates of mean and error; method of maximum likelihood; error of the mean
- Weighted mean, error of weighted mean
- Confidence intervals
- Chi-square (χ²) distribution and degrees of freedom (ν)
- Histogram χ²
- Format for lab writeup
- Next: least squares fit to a straight line and errors of parameters

2
Brief overview
Suppose we have x = f(u,v), where u and v are uncorrelated random variables with Gaussian distributions. Expand f in a Taylor series around x₀ = f(u₀,v₀), where u₀ and v₀ are the mean values of u and v, keeping the lowest-order terms:

x − x₀ ≈ (∂f/∂u) (u − u₀) + (∂f/∂v) (v − v₀)

The distribution of x is a bivariate distribution in u and v. Under suitable conditions (see Bevington Ch. 3) we can approximate the standard deviation σ_x of x by

σ_x² ≈ (∂f/∂u)² σ_u² + (∂f/∂v)² σ_v²
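The propagation formula is easy to sanity-check numerically. The following Python sketch is not part of the original notes; the function f(u,v) = u·v and all the numbers in it are chosen only for illustration. It compares the formula against a Monte Carlo estimate of the spread of x:

```python
import math
import random

# Check sigma_x^2 ~ (df/du)^2 su^2 + (df/dv)^2 sv^2 for the illustrative
# choice f(u, v) = u * v (this example is not from the slides).

random.seed(1)

u0, su = 10.0, 0.3   # mean and standard deviation of u
v0, sv = 4.0, 0.2    # mean and standard deviation of v

# Analytic propagation: df/du = v0 and df/dv = u0 at the expansion point.
sigma_x_formula = math.sqrt((v0 * su) ** 2 + (u0 * sv) ** 2)

# Monte Carlo: draw uncorrelated Gaussian u, v and measure the spread of x = u*v.
xs = [random.gauss(u0, su) * random.gauss(v0, sv) for _ in range(200_000)]
mean = sum(xs) / len(xs)
sigma_x_mc = math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))

print(f"formula: {sigma_x_formula:.3f}, Monte Carlo: {sigma_x_mc:.3f}")
```

The two agree to within the (small) higher-order term σ_u²σ_v² dropped in the Taylor expansion.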
2.8) Cars at a fork in the road: two discrete choices, right or left, with a fixed probability for each choice, so this is an example of a binomial distribution. For a total of 1035 cars, we know that on a typical day 75% of people go right, 25% left. On this particular day 752 went right, so let's call a car going to the right a success. Let's assume the 75% probability is very well established. Thus,

P_B(x; 1035, 0.75) = [1035! / (x!(1035 − x)!)] (3/4)^x (1/4)^(1035−x)

where x is the number of cars going to the right. Mean = np = 1035 × 0.75 = 776.25.
Binomial distribution, 1035 cars

- We are supposed to find the probability of getting a result as far or farther from the mean as x = 752 cars turning right (the deviation can be less than or more than the mean).
- This would require summing this ugly function from 0 to 752 and from 800 to 1035 (or taking one minus the sum from 753 to 799), e.g. in LabVIEW. You could also use the Gaussian cumulative distribution function.
- Can approximate with a Gaussian, since n and x are large and x is still many σ less than n (19 σ, to be exact).

4
2.8 Solution (continued)

- Use the Gaussian approximation with the same μ and σ: μ = np = 776.25, σ = √(npq) = 13.9.
- Transform the interval to a unit normal distribution in the variable z = (x − μ)/σ, which has μ = 0 and σ = 1.
- Since we allow deviations in either direction from the mean, the limits of the interval for z are |z| = |752 − 776.25|/13.9 = 1.745.
- The probability for z within this interval is given in Table C.2 of Bevington (probability of being within 1.745 σ of the mean): P(−1.745 < z < 1.745) = 0.9190 (using interpolation).
- Probability that this is a typical day = 1 − P = 0.081 (or 8.1%).
- Confidence interval: the above interval is a 91.9% confidence interval for z; z should lie within this interval 91.9% of the time. A larger probability interval would be required to exclude the hypothesis.
2.8 (concluded)

You could also use the Gaussian cumulative distribution function to find P. This is available in LabVIEW (called Normal Dist). The one-tailed probability is

∫_{1.745}^{∞} (1/√(2π)) exp(−z²/2) dz = 0.0405

and doubling this for deviations on either side of the mean gives 2 × 0.0405 = 0.081, as before.
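As a cross-check of the 8.1% result, the following Python sketch (my addition, not the original LabVIEW VI) evaluates both the exact binomial tail sum and the Gaussian approximation:

```python
import math

# Problem 2.8 two ways: exact binomial tail sum vs. the Gaussian
# (unit-normal) approximation.

n, p = 1035, 0.75
mu = n * p                          # 776.25
sigma = math.sqrt(n * p * (1 - p))  # ~13.9

def log_binom_pmf(x):
    # log of [n!/(x!(n-x)!)] p^x (1-p)^(n-x), via lgamma to avoid overflow
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(p) + (n - x) * math.log(1 - p))

# One minus the sum from 753 to 799 (the deviations inside |x - mu| < 24.25)
p_inside = sum(math.exp(log_binom_pmf(x)) for x in range(753, 800))
p_exact = 1.0 - p_inside

# Gaussian approximation: two-tailed probability beyond |z|
z = abs(752 - mu) / sigma
p_gauss = math.erfc(z / math.sqrt(2))   # = 2 * (1 - Phi(z))

# The two differ at the ~0.01 level because of the discreteness of the
# binomial (continuity effects), but both are near 0.08.
print(f"exact binomial: {p_exact:.4f}, Gaussian approx: {p_gauss:.4f}")
```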
χ² Test of a Distribution

Define

χ² = Σ_{i=1}^{N} (x_i − x_{i0})² / σ_i²

where x_{i0} is the expected (mean) value of x_i. For Gaussian random variables x_i, it can be shown that this follows the chi-square distribution with ν degrees of freedom. The expectation value (i.e., mean value) for χ² is ⟨χ²⟩ = ν.

If this is based on a fit where some parameters are determined from the fit, the number of degrees of freedom is reduced by the number of fitted parameters.

The probability of exceeding a given value of χ² can be tabulated, as was done for Gaussian probability. (See Table C.4 in Bevington.)
Comments on χ² of Histogram

χ² = Σ_{i=1}^{N} (N_i − n_i)² / n_i

where N is the number of bins, N_i is the observed contents of the i-th bin and n_i is the expected contents.
- Can model this with N individual, mutually independent binomials, so long as a fixed total is not required. Then normalize to the actual N_total, using 1 degree of freedom in the fit (N = number of bins). (See discussion in Bevington, Ch. 3 and the Prob. 4.13 solution on the next page.)
- For small n_i, large N_total, and a large number of bins, n_i is approximately Poisson.
- χ² follows the chi-square distribution if all n_i ≫ 1 (n_i > 5 is OK) or N ≫ 1.
- For fixed N_total, model with a multinomial distribution (see p. 12). But the n_i are not mutually independent with a multinomial.
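A minimal sketch of the histogram χ² in the formula above, in Python (the bin contents are made-up numbers, only to show the mechanics):

```python
# Toy histogram chi-square in the notation of the slide: observed bin
# contents N_i compared with expected contents n_i. The numbers are
# invented for illustration.

observed = [4, 11, 19, 27, 18, 12, 6]                 # N_i
expected = [5.0, 12.0, 21.0, 24.0, 20.0, 11.0, 5.5]   # n_i

chi2 = sum((N_i - n_i) ** 2 / n_i for N_i, n_i in zip(observed, expected))

# With no parameters fitted, degrees of freedom = number of bins;
# each fitted or constrained parameter would reduce nu by one.
nu = len(observed)
print(f"chi2 = {chi2:.2f} for nu = {nu} bins")
```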
4.13) I made a LabVIEW VI (see figures on the next page) to plot the histogram, calculate the Gaussian comparison values and find χ² according to Eq. 4.33 of Bevington:

χ² = Σ_{j=1}^{n} [h(x_j) − N P(x_j)]² / (N P(x_j))

where n is the number of bins, N is the total number of trials, h(x_j) is the contents of the j-th bin and N P(x_j) is the expected contents from the Gaussian distribution (see the text for details).

Assume the bins are small enough so we can approximate the integral of the p.d.f. over the bin with the central value of the p.d.f. times the bin width. Then

N P(x_j) = N ∫_bin p(x) dx ≈ N p(x_j) Δx

where

p(x_j) = (1/(σ√(2π))) exp(−(x_j − μ)²/(2σ²))

is the Gaussian p.d.f. evaluated at x_j, the center of the j-th bin, and Δx = 2 is the bin width.
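The bin-center approximation N P(x_j) ≈ N p(x_j) Δx is easy to sketch in Python. The μ, σ, and binning below are illustrative stand-ins; they are not the actual Problem 4.13 data:

```python
import math

# Expected Gaussian bin contents N P(x_j) ~ N p(x_j) * dx.
# mu, sigma, N and the bin layout are invented for illustration.

mu, sigma = 5.0, 2.0
N = 200          # total number of trials
dx = 2.0         # bin width, as on the slide
centers = [mu + (j - 3) * dx for j in range(7)]   # 7 bins centered on mu

def gauss_pdf(x):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

expected = [N * gauss_pdf(x) * dx for x in centers]
print([round(e, 1) for e in expected])
```

Because the bins here span most of the distribution, the expected contents sum back to nearly N, which is a quick consistency check on the approximation.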
This analysis assumes the contents of each histogram bin h_j is an independent measurement. μ and σ are given, but the total number of trials is taken to agree with the experiment (200 trials). This represents one constraint and reduces the number of degrees of freedom by 1. We have 13 bins to compare with the Gaussian, and so ν = 12 degrees of freedom.

The VI front panel with the results (Histogram Chi-Square Result) and the diagram which produced them are shown in the following figures.
10
The expectation value for χ² equals the number of degrees of freedom, ⟨χ²⟩ = ν = 12. The resulting χ² = 8.28 (this disagrees with the answer in the book, but was checked independently). The probability for exceeding this value of χ² is 0.76 (calculated by LabVIEW, but it agrees with the interpolated value from Table C.4). The reduced chi-square is χ²/ν = 0.69 (expectation value 1). This is not a bad fit. Of course, the χ² distribution is only valid for underlying Gaussians, and this is not a good assumption for the bins with low occupancy.
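The quoted probability of 0.76 can be hand-checked without LabVIEW or tables: for an even number of degrees of freedom ν = 2k, the chi-square survival probability has a closed form. A Python sketch:

```python
import math

# Probability of exceeding chi2 = 8.28 for nu = 12 degrees of freedom.
# For even nu = 2k there is a closed form:
#   P(chi2 > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!

def chi2_sf_even(x, nu):
    assert nu % 2 == 0
    k = nu // 2
    return math.exp(-x / 2) * sum((x / 2) ** j / math.factorial(j) for j in range(k))

prob = chi2_sf_even(8.28, 12)
print(f"P(chi2 > 8.28; nu=12) = {prob:.3f}")   # ~0.76, matching the slide
```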
11
Multinomial Distribution
Histogram with n bins and N total counts (partition N events into n bins); x_i counts in the i-th bin, with probability p_i to get a count in the i-th bin.
P(x₁, x₂, …, x_n) = [N! / (x₁! x₂! ⋯ x_n!)] p₁^{x₁} p₂^{x₂} ⋯ p_n^{x_n}

with

Σ_{i=1}^{n} x_i = N,   Σ_{i=1}^{n} p_i = 1.

Then

μ_i = N p_i,   σ_i² = N p_i (1 − p_i),   σ_ij² = −N p_i p_j (i ≠ j, the covariance)
Reference:
http://mathworld.wolfram.com/MultinomialDistribution.html
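The moments quoted above can be checked by simulation. A Python sketch with illustrative parameters (the specific N, probabilities, and trial count are my choices, not from the notes); note the negative covariance between bins, which is why the n_i of a fixed-total histogram are not mutually independent:

```python
import random

# Simulation check of the multinomial moments: mu_i = N p_i,
# sigma_i^2 = N p_i (1 - p_i), and covariance -N p_i p_j for i != j.
# Parameters are invented for illustration.

random.seed(0)
N = 100                    # counts per histogram
probs = [0.2, 0.3, 0.5]    # bin probabilities, summing to 1
trials = 20_000

counts = []
for _ in range(trials):
    c = [0, 0, 0]
    for bin_index in random.choices(range(3), weights=probs, k=N):
        c[bin_index] += 1
    counts.append(c)

def mean(vals):
    return sum(vals) / len(vals)

m0 = mean([c[0] for c in counts])                          # expect N*p0 = 20
v0 = mean([(c[0] - m0) ** 2 for c in counts])              # expect 100*0.2*0.8 = 16
m1 = mean([c[1] for c in counts])
cov01 = mean([(c[0] - m0) * (c[1] - m1) for c in counts])  # expect -100*0.2*0.3 = -6

print(f"mean {m0:.2f} (20), var {v0:.2f} (16), cov {cov01:.2f} (-6)")
```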
12
Format for lab writeup

- Abstract (very brief overview stating results)
- Introduction (overview and theory related to the experiment)
- Experimental setup and procedure
- Analysis of data and results, with errors
- Graphs should have axes labeled with units; usually points should have error bars, and the graph should have a caption explaining briefly what is plotted
- Comparison with accepted values; discussion of results and errors; conclusions, if any
- References
13