Project Report: Probability
Course title:
ENGINEERING STATISTICS AND PROBABILITY
Submitted by:
GROUP NO. 03
ALI RAZA
UW-13-CE-Bsc-011
NAEEM ZAFAR
UW-11-ME-Bsc-015
MASOOD CHANDIO
UW-13-CE-Bsc-027
RAJA ZULQERNAIN
UW-13-CE-Bsc-044
Submitted to:
SIR TARIQ
Acknowledgment:
Countless gratitude to Almighty ALLAH, Who is omnipotent and omnipresent, and Who blessed us with the chance and choice, the health and courage, and the knowledge that enabled us to complete this project.
All respect for the HOLY PROPHET MUHAMMAD (S.A.W.W), who is forever a torch of knowledge and guidance for humanity, who enables us to shape our lives according to the teachings of ISLAM, and who endowed us with exemplary guidance in every sphere of life.
I acknowledge the services of Mr. Tariq Hussain in helping and guiding me in compiling and presenting this report. In fact, it would not have been possible for me to accomplish this task without his help.
I dedicate this work to my parents, to whom I am very thankful, as they encouraged me and provided me with all the necessary resources that made it possible for me to accomplish this task.
Regards
Ali RAZA
Contents
Abstract
Introduction
Brief History of Correlation
Types of Correlation
Correlation coefficient
Covariance
For a population
For a sample
Why Use Correlation?
Regression
History
Uses of Correlation and Regression
Assumptions
Why Use Regression
Application of correlation and regression
Correlation and Regression Conclusion
References
Abstract
The present review introduces methods of analyzing the relationship between two quantitative
variables. The calculation and interpretation of the sample product moment correlation
coefficient and the linear regression equation are discussed and illustrated. Common misuses of
the techniques are considered. Tests and confidence intervals for the population parameters are
described, and failures of the underlying assumptions are highlighted.
Introduction:
The most commonly used techniques for investigating the relationship between two quantitative
variables are correlation and linear regression. Correlation quantifies the strength of the linear
relationship between a pair of variables, whereas regression expresses the relationship in the
form of an equation. For example, in patients attending an accident and emergency unit (A&E),
we could use correlation and regression to determine whether there is a relationship between age
and urea level, and whether the level of urea can be predicted for a given age.
Brief History of Correlation
"... adult nephew, and so on; but the index of co-relation, which is what I there called regression, is different in the different cases" [Ref. 14].
By 1889, Galton was writing co-relation as correlation (42), and he had become fascinated by
fingerprints (16, 19). Galton's 1890 account of his development of correlation (18) would be his
last substantive paper on the subject (43).
Karl Pearson, Galton's colleague and friend, and father of Egon Pearson, pursued the refinement
of correlation (33, 34, 37) with such vigor that the statistic r, a statistic Galton called the index of
co-relation (15) and Pearson called the Galton coefficient of reversion (36), is known today as
Pearson's r.
Correlation
Correlation and regression analysis are related in the sense that both deal with relationships
among variables. The correlation coefficient is a measure of linear association between two
variables. Values of the correlation coefficient are always between -1 and +1. A correlation
coefficient of +1 indicates that two variables are perfectly related in a positive linear sense, a
correlation coefficient of -1 indicates that two variables are perfectly related in a negative linear
sense, and a correlation coefficient of 0 indicates that there is no linear relationship between the
two variables. For simple linear regression, the sample correlation coefficient is the square root
of the coefficient of determination, with the sign of the correlation coefficient being the same as
the sign of b1, the coefficient of x1 in the estimated regression equation.
Neither regression nor correlation analyses can be interpreted as establishing cause-and-effect
relationships. They can indicate only how or to what extent variables are associated with each
other. The correlation coefficient measures only the degree of linear association between two
variables. Any conclusions about a cause-and-effect relationship must be based on the judgment
of the analyst.
Types of Correlation
Positive Correlation
Positive correlation occurs when an increase in one variable increases the value of another.
The line corresponding to the scatter plot is an increasing line.
Negative Correlation
Negative correlation occurs when an increase in one variable decreases the value of another.
The line corresponding to the scatter plot is a decreasing line.
No Correlation
No correlation occurs when there is no linear dependency between the variables.
Perfect Correlation
Perfect correlation occurs when there is a functional dependency between the variables.
In this case all the points are in a straight line.
Strong Correlation
A correlation is stronger the more closely the points cluster around the line.
Weak Correlation
A correlation is weaker the more widely the points are scattered about the line.
Correlation coefficient
Pearson's correlation coefficient is the covariance of the two variables divided by the product of
their standard deviations. The form of the definition involves a "product moment", that is, the
mean (the first moment about the origin) of the product of the mean-adjusted random variables;
hence the modifier product-moment in the name.
Covariance
Covariance indicates how two variables are related. A positive covariance means the variables
are positively related, while a negative covariance means the variables are inversely related. The
formula for calculating covariance of sample data is shown below.
\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}

where:
x = the independent variable
y = the dependent variable
n = the number of data points in the sample
\bar{x} = the mean of the independent variable x
\bar{y} = the mean of the dependent variable y
Using the covariance formula, you can determine whether economic growth and S&P 500
returns have a positive or inverse relationship. Before you compute the covariance, calculate the
mean of x and y. (The Summary Measures topic of the Discrete Probability Distributions section
explains the mean formula in detail.)
Now you can identify the variables for the covariance formula as follows.
x = 2.1, 2.5, 4.0, and 3.6 (economic growth)
y = 8, 12, 14, and 10 (S&P 500 returns)
\bar{x} = 3.1
\bar{y} = 11
Substitute these values into the covariance formula to determine the relationship between
economic growth and S&P 500 returns.
The covariance between the returns of the S&P 500 and economic growth is 1.53. Since the covariance is positive, the variables are positively related: they move together in the same direction.
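As a quick check of this arithmetic, the calculation can be scripted. The following is a minimal Python sketch using the illustrative figures above (these are example numbers, not real market data):

def sample_covariance(x, y):
    # Sample covariance: sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1.
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

growth = [2.1, 2.5, 4.0, 3.6]   # economic growth (x)
returns = [8, 12, 14, 10]       # S&P 500 returns (y)
print(round(sample_covariance(growth, returns), 2))   # prints 1.53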
For a population
Pearson's correlation coefficient, when applied to a population, is commonly represented by the Greek letter ρ (rho) and may be referred to as the population correlation coefficient or the population Pearson correlation coefficient. The formula for ρ is [7]:

\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}

where:
\mathrm{cov}(X, Y) is the covariance, and
\sigma_X and \sigma_Y are the standard deviations of X and Y.

The formula for ρ can be expressed in terms of mean and expectation. Since

\mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)],

the formula for ρ can also be written as [7]

\rho_{X,Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}

where:
\mu_X and \mu_Y are the means of X and Y, and
E is the expectation.
For a sample
Pearson's correlation coefficient when applied to a sample is commonly represented by the
letter r and may be referred to as the sample correlation coefficient or the sample Pearson
correlation coefficient. We can obtain a formula for r by substituting estimates of the covariances and variances based on a sample into the formula above. So if we have one dataset {x_1, ..., x_n} containing n values and another dataset {y_1, ..., y_n} containing n values, then the formula for r is:

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

where \bar{x} and \bar{y} are the sample means of the two datasets.
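A minimal Python sketch of this sample formula, reusing the illustrative economic-growth and S&P 500 figures from the covariance example, might look like this:

import math

def pearson_r(x, y):
    # r = sum((x_i - x_bar)(y_i - y_bar)) / sqrt(sum((x_i - x_bar)^2) * sum((y_i - y_bar)^2))
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

print(round(pearson_r([2.1, 2.5, 4.0, 3.6], [8, 12, 14, 10]), 2))   # about 0.66, a positive correlation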
Regression
In statistics, regression is a statistical process for estimating the relationships among variables. It
includes many techniques for modeling and analysing several variables, when the focus is on the
relationship between a dependent variable and one or more independent variables. More
specifically, regression analysis helps one understand how the typical value of the dependent
variable (or 'criterion variable') changes when any one of the independent variables is varied,
while the other independent variables are held fixed. Most commonly, regression analysis
estimates the conditional expectation of the dependent variable given the independent variables; that is, the average value of the dependent variable when the independent variables are fixed.
Less commonly, the focus is on a quantile, or other location parameter of the conditional
distribution of the dependent variable given the independent variables. In all cases, the
estimation target is a function of the independent variables called the regression function. In
regression analysis, it is also of interest to characterize the variation of the dependent variable
around the regression function which can be described by a probability distribution.
Regression analysis is widely used for prediction and forecasting, where its use has substantial
overlap with the field of machine learning. Regression analysis is also used to understand which
among the independent variables are related to the dependent variable, and to explore the forms
of these relationships. In restricted circumstances, regression analysis can be used to infer causal
relationships between the independent and dependent variables. However this can lead to
illusions or false relationships, so caution is advisable; for example, correlation does not imply
causation.
History
The earliest form of regression was the method of least squares, which was published by Legendre in 1805 and by Gauss in 1809. Legendre and Gauss both applied the method to the problem of determining, from astronomical observations, the orbits of bodies about the Sun (mostly comets, but also later the then newly discovered minor planets). Gauss published a further development of the theory of least squares in 1821, including a version of the Gauss-Markov theorem.
The term "regression" was coined by Francis Galton in the nineteenth century to describe a
biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors
tend to regress down towards a normal average (a phenomenon also known as regression toward
the mean). For Galton, regression had only this biological meaning, but his work was later
extended by Udny Yule and Karl Pearson to a more general statistical context. In the work of
Yule and Pearson, the joint distribution of the response and explanatory variables is assumed to
be Gaussian. This assumption was weakened by R.A. Fisher in his works of 1922 and
1925. Fisher assumed that the conditional distribution of the response variable is Gaussian, but
the joint distribution need not be. In this respect, Fisher's assumption is closer to Gauss's
formulation of 1821.
In the 1950s and 1960s, economists used electromechanical desk calculators to calculate
regressions. Before 1970, it sometimes took up to 24 hours to receive the result from one
regression.
Regression methods continue to be an area of active research. In recent decades, new methods
have been developed for robust regression, regression involving correlated responses such
as time series and growth curves, regression in which the predictor or response variables are
curves, images, graphs, or other complex data objects, regression methods accommodating
various types of missing data, nonparametric regression, Bayesian methods for regression,
regression in which the predictor variables are measured with error, regression with more
predictor variables than observations, and causal inference with regression.
Put simply, regression provides an estimate of the values of the dependent variable from values of the independent variable. It can be extended to two or more variables, which is known as multiple regression.
Algebraic Method
1. Least Squares Method
The regression equation of X on Y is:
X = a + bY
where X is the dependent variable and Y is the independent variable.
The regression equation of Y on X is:
Y = a + bX
where Y is the dependent variable and X is the independent variable.
The values of a and b in these equations are found by the method of least squares, with the help of the normal equations given below (written here for the regression of X on Y; for Y on X, interchange X and Y):

\sum X = na + b \sum Y \qquad \text{(I)}
\sum XY = a \sum Y + b \sum Y^{2} \qquad \text{(II)}

Solution:
X = 0.49 + 0.74Y
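The worked data behind the solution above are not reproduced here, but the procedure itself can be sketched in Python. The data in this sketch are hypothetical; the function solves the normal equations (I) and (II) for a and b in the regression of X on Y:

def fit_x_on_y(x, y):
    # Solve the normal equations for X = a + bY:
    #   sum(X)  = n*a + b*sum(Y)          (I)
    #   sum(XY) = a*sum(Y) + b*sum(Y^2)   (II)
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi ** 2 for yi in y)
    b = (n * sxy - sx * sy) / (n * syy - sy ** 2)
    a = (sx - b * sy) / n
    return a, b

y_data = [1, 2, 3, 4, 5]             # independent variable Y (hypothetical)
x_data = [1.2, 1.9, 3.2, 3.8, 5.1]   # dependent variable X (hypothetical)
a, b = fit_x_on_y(x_data, y_data)
print(f"X = {a:.2f} + {b:.2f}Y")     # for these numbers: X = 0.13 + 0.97Y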
Uses of Correlation and Regression
The second main use for correlation and regression is to see whether two variables are
associated, without necessarily inferring a cause-and-effect relationship. In this case, neither
variable is determined by the experimenter; both are naturally variable. If an association is
found, the inference is that variation in X may cause variation in Y, or variation in Y may
cause variation in X, or variation in some other factor may affect both X and Y.
The third common use of linear regression is estimating the value of one variable
corresponding to a particular value of the other variable.
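For instance, once a regression line has been fitted (as in the hypothetical least-squares sketch above), a predicted value is obtained by substituting into the fitted equation:

# Coefficients from a previously fitted line X = a + bY (hypothetical values for illustration).
a, b = 0.13, 0.97
y_new = 2.5                 # a particular value of the independent variable Y
x_pred = a + b * y_new      # predicted value of the dependent variable X
print(x_pred)               # about 2.55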
Assumptions
Some underlying assumptions governing the uses of correlation and regression are as follows.
The observations are assumed to be independent. For correlation, both variables should be
random variables, but for regression only the dependent variable Y must be random. In carrying
out hypothesis tests, the response variable should follow a Normal distribution, and the variability
of Y should be the same for each value of the predictor variable. A scatter diagram of the data
provides an initial check of the assumptions for regression.
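As a minimal illustration of this initial check, assuming matplotlib is available (the data below are hypothetical):

import matplotlib.pyplot as plt

# Hypothetical (x, y) observations; in practice these are the study data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

# A scatter diagram gives a quick visual check of linearity and of whether
# the spread of Y looks roughly constant across values of x.
plt.scatter(x, y)
plt.xlabel("Predictor variable (x)")
plt.ylabel("Response variable (Y)")
plt.title("Scatter diagram as an initial check of the regression assumptions")
plt.show()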
Application of correlation and regression
Correlation and regression are applied across many branches of engineering, including:
1. Bridge engineering
2. Construction engineering
3. Environmental engineering
4. Fire protection engineering
5. Geotechnical engineering
6. Hydraulic engineering
7. Materials science
8. Structural engineering
9. Surveying
10. Timber Engineering
11. Transportation engineering
12. Water resources engineering
13. Agricultural Engineering
14. Civil Engineering
15. Chemical Engineering
16. Electrical Engineering
17. Environmental Engineering
18. Industrial Engineering
19. Marine Engineering
20. Material Science
References
1. Whitley E, Ball J. Statistics review 1: Presenting and summarising data. Crit Care. 2002;6:66-71. doi: 10.1186/cc1455.
2. Kirkwood BR, Sterne JAC. Essential Medical Statistics. 2nd ed. Oxford: Blackwell Science; 2003.
3. Whitley E, Ball J. Statistics review 2: Samples and populations. Crit Care. 2002;6:143-148. doi: 10.1186/cc1473.
4. Bland M. An Introduction to Medical Statistics. 3rd ed. Oxford: Oxford University Press; 2001.
5. Bland M, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;i:307-310.
6. Zar JH. Biostatistical Analysis. 4th ed. New Jersey, USA: Prentice Hall; 1999.
7. Altman DG. Practical Statistics for Medical Research. London: Chapman & Hall; 1991.