Cement Process Engineering Vade Mecum: 2. Statistics
VADE MECUM
2. STATISTICS
Rev. 2002
SECTION 2 STATISTICS
Table of Contents
1. Descriptive Statistics
1.1 Definitions
1.2 Basic
1.3 Normal Probability Distribution
1.4 Interval Estimation and Tables
2. Statistical Estimation Tests
2.1 Generalities
2.2 Test for the Equality of Two Variances ($\sigma_1^2$, $\sigma_2^2$) of Two Normal Populations (Sample Sizes $n_1$, $n_2$)
2.3 Fisher Distribution Table
3. Correlation Between Data: Regression
3.1 Generalities
3.2 Least Squares Lines
4. Temporal/Regionalized Series (Variables)
4.1 Stationarity
4.2 Variogram
4.3 Raw Mix Control Tuning
5. Sampling
5.1 Golden Rules
5.2 Fundamental Error (FE)
5.3 Minimum Representative Weight (MRW)
5.4 Estimation of the Maximum Particle Size
5.5 Minimum Number of Observations
5.6 Mechanical Sampling
5.7 Manual Sampling on Conveyor Belt
1. Descriptive Statistics
1.1 Definitions
Statistics is the science of drawing conclusions about a population based on an analysis of sample data from
that population.
Population: the set of all values that a variable can take.
Sample: a set of $n$ values of the variable drawn from the population.
Random variable: $X = (x_i)$.
Probability distribution: $P(x_i)$. It gives the probability of occurrence of each value of the random variable and is characterized by its parameters (for example, the Normal distribution is characterized by $\mu$ and $\sigma$, see below).
Statistic: any function of the sample data.
Estimator: an estimator of a parameter is a statistic that corresponds to that parameter. For instance:
- the sample mean $\bar{x}$ is the estimator of the population mean $\mu$;
- the sample variance $S^2$ is the estimator of the population variance $\sigma^2$.
Interval estimation: an interval estimate of a parameter is the interval between two statistics that contains the true value of the parameter with a given probability $(1-\alpha)$.
1.2 Basic
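Arithmetic mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Variance: $S_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Standard deviation: $S_X = \sqrt{S_X^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
For independent variables $X$ and $Y$: $S_{X+Y}^2 = S_X^2 + S_Y^2$; for any constant $a$: $S_{aX}^2 = a^2 S_X^2$.
Covariance: $S_{XY} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$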
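A minimal Python sketch of the formulas above (standard library only, hypothetical data). Besides the mean, variance, standard deviation and covariance, it checks numerically that $S_{aX}^2 = a^2 S_X^2$ and that, for sample statistics, $S_{X+Y}^2 = S_X^2 + S_Y^2 + 2S_{XY}$: the simple additive rule holds only when the covariance term is zero, as expected for independent variables. The function names are illustrative.

```python
import math

def mean(xs):
    """Arithmetic mean: sum of the values divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance S^2, with the n - 1 denominator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """Sample standard deviation S = sqrt(S^2)."""
    return math.sqrt(sample_variance(xs))

def sample_covariance(xs, ys):
    """Sample covariance S_XY, with the n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical data, for illustration only.
x = [2, 4, 4, 4, 5, 5, 7, 9]
y = [1, 3, 2, 5, 4, 6, 5, 8]

print(mean(x), sample_variance(x), sample_std(x))

# S^2(aX) = a^2 S^2(X)
a = 3.0
print(sample_variance([a * v for v in x]), a ** 2 * sample_variance(x))

# S^2(X+Y) = S^2(X) + S^2(Y) + 2 S_XY; the covariance term vanishes only for uncorrelated data.
s = [u + v for u, v in zip(x, y)]
print(sample_variance(s), sample_variance(x) + sample_variance(y) + 2 * sample_covariance(x, y))
```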
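1.3 Normal Probability Distribution
The most often used probability distribution is the Normal probability distribution, whose density is:
$$\frac{dZ}{dx} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
Central Limit Theorem
For a group of $n$ independent sampling units drawn from a population of mean $\mu$ and variance $\sigma^2$, the sampling distribution of $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ tends, as $n$ increases, to a Normal distribution with mean $\mu$ and variance $\sigma^2/n$. That is: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.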
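The sketch below, an illustration rather than part of the original text, evaluates the Normal density by the formula above, compares it with Python's `statistics.NormalDist`, and simulates the Central Limit Theorem: means of samples of size $n = 25$ drawn from a uniform population (an arbitrary choice of population) have a standard deviation close to $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics

def normal_pdf(x, mu, sigma):
    """Normal density dZ/dx = exp(-0.5 ((x - mu) / sigma)^2) / (sigma sqrt(2 pi))."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# The formula agrees with the standard library implementation.
print(normal_pdf(1.0, 0.0, 1.0), statistics.NormalDist(0.0, 1.0).pdf(1.0))

def sample_means(n, replicates, rng):
    """Means of `replicates` samples of size n drawn from a uniform(0, 1) population."""
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(replicates)]

rng = random.Random(42)        # fixed seed so the run is reproducible
n = 25
means = sample_means(n, 20_000, rng)
sigma = math.sqrt(1 / 12)      # standard deviation of the uniform(0, 1) population

# Mean of the means is close to 0.5; their spread is close to sigma / sqrt(n).
print(statistics.fmean(means), statistics.stdev(means), sigma / math.sqrt(n))
```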
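1.4 Interval Estimation and Tables
For a population of unknown mean $\mu$ and known variance $\sigma^2$, the confidence interval with a probability of $(1-\alpha)$ derived from the average $\bar{x}$ of a sample of size $n$ is:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the upper-tail point of the standard Normal distribution:

$\alpha/2$        0.25     0.159    0.10    0.05     0.025
$z_{\alpha/2}$    0.6745   1        1.28    1.64     1.96

$\alpha/2$        0.0232   0.01     0.005   0.00135  0.001
$z_{\alpha/2}$    2        2.32     2.57    3        3.09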
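Example:
Estimation of the true LHV mean ($\mu$) of liquid waste fuel.
A sample of size $n = 100$ (different waste fuel shipments) gave a mean $\bar{x} = 5.5$ Mcal/kg. The standard deviation is known, $\sigma = 1$ Mcal/kg, so $\sigma^2/n = 1/100 = 0.01$ and $\sigma/\sqrt{n} = 0.1$.
Thus, we are sure at $1-\alpha = 90\%$ (then $\alpha/2 = 0.05$ and $z_{\alpha/2} = 1.64$) that $\mu$ lies in $[5.5 - 1.64 \times 0.1,\ 5.5 + 1.64 \times 0.1] = [5.336,\ 5.664]$ Mcal/kg.
Remark:
If the population from which the sample is taken is not infinite (say population size $N = 800$), then $\sigma/\sqrt{n}$ has to be multiplied by the corrective factor $\sqrt{1 - n/N} = \sqrt{1 - 100/800} = 0.935$.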
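A sketch of the known-variance interval and of the finite-population correction, applied to the LHV example. It uses `statistics.NormalDist().inv_cdf` for $z_{\alpha/2}$ (1.645 for 90%) instead of the rounded table value 1.64, so the bounds differ from the hand calculation in the third decimal. The function name `z_interval` is illustrative.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence, population_size=None):
    """Two-sided interval for the mean mu when the variance sigma^2 is known.

    If population_size (N) is given, sigma / sqrt(n) is multiplied by the
    corrective factor sqrt(1 - n / N) of the Remark above.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.645 for 90 %
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt(1 - n / population_size)
    return xbar - z * se, xbar + z * se

# LHV example: n = 100 shipments, mean 5.5 Mcal/kg, sigma = 1 Mcal/kg, 90 % confidence.
low, high = z_interval(5.5, 1.0, 100, 0.90)
print(f"infinite population: [{low:.3f}, {high:.3f}] Mcal/kg")

# Same data, finite population of N = 800 shipments.
low_f, high_f = z_interval(5.5, 1.0, 100, 0.90, population_size=800)
print(f"N = 800:             [{low_f:.3f}, {high_f:.3f}] Mcal/kg")
```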
c) If the Variance $\sigma^2$ Is Unknown and Sample Size $n < 30$
$\sigma^2$ has to be approximated by the variance $S^2$ of the sample, and the Normal distribution is replaced by a $t$ distribution (Student distribution). The estimated interval, with a confidence of $(1-\alpha)$ and $n-1$ degrees of freedom, is given by:
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$
Example
With the data above, assuming $S = 1$ Mcal/kg, the same confidence (90%) and 20 degrees of freedom (21 samples), $t_{0.05,\,20} = 1.72$ and $\mu$ lies between: $\bar{x} \pm 1.72 \times 1/\sqrt{21} = [5.5 - 0.375,\ 5.5 + 0.375] = [5.125,\ 5.875]$ Mcal/kg.
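The Python standard library has no Student $t$ quantile function, so this sketch takes $t_{\alpha/2,\,n-1}$ from the $t$ table of this section (only the $t_{0.05,\nu}$ column is copied); with SciPy installed, `scipy.stats.t.ppf(1 - alpha/2, n - 1)` would give it directly. It reproduces the 21-sample example; the names are illustrative.

```python
import math

# t_{0.05, nu} (upper 5 % point) copied from the Student t table of this section.
T_UPPER_005 = {1: 6.31, 2: 2.92, 3: 2.35, 4: 2.13, 5: 2.02,
               10: 1.81, 15: 1.75, 20: 1.72, 40: 1.68, 120: 1.66}

def t_interval(xbar, s, n, t_crit):
    """Two-sided interval xbar +/- t * S / sqrt(n) when sigma is unknown (n < 30)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Example: S = 1 Mcal/kg, 21 samples (20 degrees of freedom), 90 % confidence.
n = 21
low, high = t_interval(5.5, 1.0, n, T_UPPER_005[n - 1])
print(f"[{low:.3f}, {high:.3f}] Mcal/kg")
```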
Student $t$ distribution, upper-tail points $t_{\alpha,\nu}$ ($\nu$ = degrees of freedom):

$\nu$   $t_{0.005}$  $t_{0.01}$  $t_{0.025}$  $t_{0.05}$  $t_{0.10}$  $t_{0.25}$  $t_{0.45}$
1       63.66        31.82       12.71        6.31        3.08        1           0.158
2       9.92         6.96        4.3          2.92        1.89        0.817       0.142
3       5.84         4.54        3.18         2.35        1.64        0.765       0.137
4       4.6          3.75        2.78         2.13        1.53        0.741       0.134
5       4.03         3.36        2.57         2.02        1.48        0.727       0.132
10      3.17         2.76        2.23         1.81        1.37        0.700       0.129
15      2.95         2.60        2.13         1.75        1.34        0.691       0.128
20      2.84         2.53        2.09         1.72        1.32        0.687       0.127
40      2.70         2.42        2.02         1.68        1.30        0.681       0.126
120     2.62         2.36        1.98         1.66        1.29        0.677       0.126
2. Statistical Estimation Tests
2.1 Generalities
A statistical hypothesis is a statement about the values of the parameters of a probability distribution.
Null hypothesis ($H_0$): $A = B$; alternative hypothesis ($H_1$): $A \ne B$.
To test a hypothesis, we take a random sample from the population under study, compute an appropriate test statistic and then either reject or fail to reject $H_0$, with a risk $\alpha$ of rejecting $H_0$ although $H_0$ is true.
2.2 Test for the Equality of Two Variances ($\sigma_1^2$, $\sigma_2^2$) of Two Normal Populations (Sample Sizes $n_1$, $n_2$)
(Excel function FTEST)
Test Description
$H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$
Test statistic: $F_0 = \frac{S_1^2}{S_2^2}$
We reject $H_0$ if $F_0 > F_{\alpha/2,\,n_1-1,\,n_2-1}$ or if $F_0 < F_{1-\alpha/2,\,n_1-1,\,n_2-1}$.
As the F table gives only the upper-tail points of the F distribution, to find $F_{1-\alpha/2,\,n_1-1,\,n_2-1}$ we must use:
$$F_{1-\alpha/2,\,n_1-1,\,n_2-1} = \frac{1}{F_{\alpha/2,\,n_2-1,\,n_1-1}}$$
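Example (two sampling methods, #1 and #2, with $n_1 = n_2 = 8$ measurements each):
$$S_1^2 = \frac{\sum_{i=1}^{8} (x_i - 4586)^2}{8-1} = 34796, \qquad S_2^2 = 26598$$
$$F_0 = \frac{34796}{26598} = 1.308 < F_{0.05/2,\,8-1,\,8-1} = 4.99$$
$F_0$ is also above the lower critical value $F_{0.975,\,7,\,7} = 1/4.99 = 0.20$.
The test does not reject $H_0$: the measurements do not allow us to conclude that sampling method #1 is significantly different from method #2 at the 5% significance level (even though $S_1 > S_2$). The Excel function giving the critical value is FINV(0.025,7,7).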
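A sketch of the two-sided variance-ratio test applied to the example; the critical values are passed in from the Fisher table rather than computed. The function name and argument order are illustrative; in Excel, FINV(0.025,7,7) returns the same upper critical value.

```python
def f_test_equal_variances(s1_sq, s2_sq, f_upper, f_upper_swapped):
    """Two-sided F test of H0: sigma1^2 = sigma2^2 with table critical values.

    f_upper         = F_{alpha/2, n1-1, n2-1}  (upper-tail table value)
    f_upper_swapped = F_{alpha/2, n2-1, n1-1}  (gives the lower critical value)
    Returns F0, the lower and upper critical values, and whether H0 is rejected.
    """
    f0 = s1_sq / s2_sq
    f_lower = 1 / f_upper_swapped        # F_{1-alpha/2, n1-1, n2-1}
    reject = f0 > f_upper or f0 < f_lower
    return f0, f_lower, f_upper, reject

# Example above: n1 = n2 = 8, alpha = 5 %, F_{0.025, 7, 7} = 4.99 from the Fisher table.
f0, f_lo, f_hi, reject = f_test_equal_variances(34796, 26598, 4.99, 4.99)
print(f"F0 = {f0:.3f}, acceptance region [{f_lo:.2f}, {f_hi:.2f}], reject H0: {reject}")
```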
2.3 Fisher Distribution Table
Upper 2.5% points of the F distribution, $F(0.025, n_1, n_2)$, as used in the two-sided test above when $\alpha = 5\%$. Rows: $n_1$ (numerator degrees of freedom); columns: $n_2$ (denominator degrees of freedom).
Ex: $F(0.025, 5, 10) = 4.24$

n1 \ n2  1        2      3      4      5      6     7     8     9     10
1        647.79   38.51  17.44  12.22  10.01  8.81  8.07  7.57  7.21  6.94
2        799.48   39.00  16.04  10.65  8.43   7.26  6.54  6.06  5.71  5.46
3        864.15   39.17  15.44  9.98   7.76   6.60  5.89  5.42  5.08  4.83
4        899.60   39.25  15.10  9.60   7.39   6.23  5.52  5.05  4.72  4.47
5        921.83   39.30  14.88  9.36   7.15   5.99  5.29  4.82  4.48  4.24
6        937.11   39.33  14.73  9.20   6.98   5.82  5.12  4.65  4.32  4.07
7        948.20   39.36  14.62  9.07   6.85   5.70  4.99  4.53  4.20  3.95
8        956.64   39.37  14.54  8.98   6.76   5.60  4.90  4.43  4.10  3.85
9        963.28   39.39  14.47  8.90   6.68   5.52  4.82  4.36  4.03  3.78
10       968.63   39.40  14.42  8.84   6.62   5.46  4.76  4.30  3.96  3.72
15       984.87   39.43  14.25  8.66   6.43   5.27  4.57  4.10  3.77  3.52
20       993.08   39.45  14.17  8.56   6.33   5.17  4.47  4.00  3.67  3.42
25       998.09   39.46  14.12  8.50   6.27   5.11  4.40  3.94  3.60  3.35
30       1001.40  39.46  14.08  8.46   6.23   5.07  4.36  3.89  3.56  3.31
40       1005.60  39.47  14.04  8.41   6.18   5.01  4.31  3.84  3.51  3.26
50       1008.10  39.48  14.01  8.38   6.14   4.98  4.28  3.81  3.47  3.22
60       1009.79  39.48  13.99  8.36   6.12   4.96  4.25  3.78  3.45  3.20
70       1011.01  39.48  13.98  8.35   6.11   4.94  4.24  3.77  3.43  3.18
80       1011.91  39.49  13.97  8.33   6.10   4.93  4.23  3.76  3.42  3.17
90       1012.61  39.49  13.96  8.33   6.09   4.92  4.22  3.75  3.41  3.16
100      1013.16  39.49  13.96  8.32   6.08   4.92  4.21  3.74  3.40  3.15
200      1015.72  39.49  13.93  8.29   6.05   4.88  4.18  3.70  3.37  3.12

n1 \ n2  15    20    25    30    40    50    60    70    80    90    100   200
1        6.20  5.87  5.69  5.57  5.42  5.34  5.29  5.25  5.22  5.20  5.18  5.10
2        4.77  4.46  4.29  4.18  4.05  3.97  3.93  3.89  3.86  3.84  3.83  3.76
3        4.15  3.86  3.69  3.59  3.46  3.39  3.34  3.31  3.28  3.26  3.25  3.18
4        3.80  3.51  3.35  3.25  3.13  3.05  3.01  2.97  2.95  2.93  2.92  2.85
5        3.58  3.29  3.13  3.03  2.90  2.83  2.79  2.75  2.73  2.71  2.70  2.63
6        3.41  3.13  2.97  2.87  2.74  2.67  2.63  2.59  2.57  2.55  2.54  2.47
7        3.29  3.01  2.85  2.75  2.62  2.55  2.51  2.47  2.45  2.43  2.42  2.35
8        3.20  2.91  2.75  2.65  2.53  2.46  2.41  2.38  2.35  2.34  2.32  2.26
9        3.12  2.84  2.68  2.57  2.45  2.38  2.33  2.30  2.28  2.26  2.24  2.18
10       3.06  2.77  2.61  2.51  2.39  2.32  2.27  2.24  2.21  2.19  2.18  2.11
15       2.86  2.57  2.41  2.31  2.18  2.11  2.06  2.03  2.00  1.98  1.97  1.90
20       2.76  2.46  2.30  2.20  2.07  1.99  1.94  1.91  1.88  1.86  1.85  1.78
25       2.69  2.40  2.23  2.12  1.99  1.92  1.87  1.83  1.81  1.79  1.77  1.70
30       2.64  2.35  2.18  2.07  1.94  1.87  1.82  1.78  1.75  1.73  1.71  1.64
40       2.59  2.29  2.12  2.01  1.88  1.80  1.74  1.71  1.68  1.66  1.64  1.56
50       2.55  2.25  2.08  1.97  1.83  1.75  1.70  1.66  1.63  1.61  1.59  1.51
60       2.52  2.22  2.05  1.94  1.80  1.72  1.67  1.63  1.60  1.58  1.56  1.47
70       2.51  2.20  2.03  1.92  1.78  1.70  1.64  1.60  1.57  1.55  1.53  1.45
80       2.49  2.19  2.02  1.90  1.76  1.68  1.63  1.59  1.55  1.53  1.51  1.42
90       2.48  2.18  2.01  1.89  1.75  1.67  1.61  1.57  1.54  1.52  1.50  1.41
100      2.47  2.17  2.00  1.88  1.74  1.66  1.60  1.56  1.53  1.50  1.48  1.39
200      2.44  2.13  1.95  1.84  1.69  1.60  1.54  1.50  1.47  1.44  1.42  1.32
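3. Correlation Between Data: Regression
3.1 Generalities
Goal: express a dependent variable $Y = (y_i)_{i=1 \ldots n}$ as a function of one or a series of $p$ independent variables $X_j = (x_{j,i})_{j=1 \ldots p,\ i=1 \ldots n}$:
$$Y_E = b_0 + b_1 X_1 + \ldots + b_p X_p$$
$Y_E$ = estimated dependent variable, $X_j$ = independent variables.
We have $n$ observations for each variable.
3.2 Least Squares Lines
The method minimizes the sum of the squared deviations $E_i = y_i - y_{E,i}$ between the points and the line. With $\bar{y}$ the mean of the $y_i$:
$$SSE = \sum_{i=1}^{n} (y_i - y_{E,i})^2 \quad \text{(residual sum of squares)}$$
$$SSR = \sum_{i=1}^{n} (y_{E,i} - \bar{y})^2 \quad \text{(sum of squares explained by the regression)}$$
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = SSR + SSE \quad \text{(total sum of squares)}$$
[Figure: least squares line $Y_{Est}$ with intercept $B_0$ and slope $B_1 = \Delta y/\Delta x$; $E = Y - Y_{Est}$ is the vertical deviation of a point from the line.]
A good regression explains a large part of the variation, i.e. SSR is large compared with SSE. We therefore test the hypothesis that the slope $B_1$ equals 0:
$H_0: B_1 = 0$, $H_1: B_1 \ne 0$.
Under $H_0$, the ratio $F = \frac{SSR/p}{SSE/(n-p-1)}$ follows a Fisher distribution with $p$ and $n-p-1$ degrees of freedom (the Excel function FINV($\alpha$, p, n-p-1) gives the critical value).
If $F$ is high (above the critical value $F_{\alpha,\,p,\,n-p-1}$), $H_0$ is rejected and, at significance level $\alpha$, the regression is considered significant.
Coefficient of Determination $R^2$
The coefficient of determination $R^2 = SSR/SST$ gives the proportion of the variation in the dependent variable $Y$ explained by the regression line.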
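Example
$H_0$: there is no correlation between SO3 and CaO.
$n = 5$, $p = 1$, $SST = SSR + SSE = 0.051 + 0.019 = 0.070$, $MSR = SSR/p = 0.051/1 = 0.051$, $MSE = SSE/(n-p-1) = 0.019/3 = 0.00633$, $F = MSR/MSE = 0.051/0.00633 = 8.05$, to be compared with the critical value $F_{\alpha,\,1,\,3}$; $R^2 = 0.051/0.070 = 0.73$.
[Figure: scatter plot of SO3 (0.35 to 0.75) versus CaO (42 to 52) for the example.]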
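A minimal least-squares sketch for one independent variable ($p = 1$), computing $b_0$, $b_1$, SST, SSR, SSE, the $F$ ratio and $R^2$ as defined above. The first call reproduces the example's $F$ and $R^2$ from its sums of squares; the raw SO3/CaO data are not given, so the second call uses small hypothetical data. The function names are illustrative.

```python
def least_squares_line(xs, ys):
    """Fit Y_E = b0 + b1 * X and return b0, b1, SST, SSR, SSE."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - my) ** 2 for y in ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ssr = sum((f - my) ** 2 for f in fitted)
    return b0, b1, sst, ssr, sse

def regression_f(ssr, sse, n, p):
    """F = (SSR / p) / (SSE / (n - p - 1)), to compare with F_{alpha, p, n-p-1}."""
    return (ssr / p) / (sse / (n - p - 1))

# Sums of squares of the SO3/CaO example: n = 5, p = 1.
print(round(regression_f(0.051, 0.019, n=5, p=1), 2), round(0.051 / (0.051 + 0.019), 2))

# Hypothetical data (not from the plant), for illustration only.
b0, b1, sst, ssr, sse = least_squares_line([1, 2, 3, 4], [1, 3, 2, 5])
print(b0, b1, ssr / sst, regression_f(ssr, sse, n=4, p=1))
```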
4. Temporal/Regionalized Series (Variables)
4.1 Stationarity
The series $X(t)$ is stationary if its mean $\bar{X}(t)$ and its variance $S^2(t)$ are constant (over time or over the region of study) and if the covariance $COV(X(t), X(t'))$ does not depend on $t$ and $t'$ themselves but only on their difference (distance) $t' - t = \Delta t\ (= h)$.
4.2 Variogram
a) Variogram Construction
A variogram is a plot of half the average squared difference of a selected variable (C3S, for example) between pairs of units, as a function of the time (or distance) separating the two units of a pair; the separations are whole-number multiples of a base interval (e.g. 1 minute, 2 minutes, ... or 1 meter, 2 meters, ...).
$$\gamma_X(h) = \frac{\sum_{j=1}^{N} (x_j - x_{j+h})^2}{2(N-1)}$$
with:
- $j$: numbering of the sample values;
- $N$: number of pairs of samples with a specific time or spatial distance ($= h$) between the two values of a pair.
Example:
The C3S values of kiln feed samples are:

Sample #   Time    C3S (%)
1          1:00    54.2
2          2:00    57.8
3          3:00    59.8
4          4:00    61.2
5          5:00    60.0
6          6:00    56.0
7          7:00    52.0
8          8:00    52.0
9          9:00    52.4
10         10:00   57.0

For $h$ = 1 hour ($N = 9$ pairs):

Pair $j$   $x_{j+1} - x_j$   $(x_{j+1} - x_j)^2$
1          3.6               12.96
2          2.0               4.00
3          1.4               1.96
4          -1.2              1.44
5          -4.0              16.00
6          -4.0              16.00
7          0.0               0.00
8          0.4               0.16
9          4.6               21.16
Sum                          73.7

Then $\gamma_{C3S}(1\ \text{hour}) = \frac{73.7}{2(9-1)} = 4.6$

Two rules for variogram construction
- Collect enough units to get a statistical population (at least 30 samples for a short-term experiment and 60 samples for a long-term one); the short-term experiment is intended to define very precisely the random heterogeneity term (nugget effect, see below).
- The number of pairs $N$ should reach half the total number of samples collected ($N > n/2$).
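The same calculation for every admissible shift, as a short Python sketch. It keeps this section's $2(N-1)$ denominator; many geostatistics texts divide by $2N$ instead, which gives slightly smaller values for small $N$. The function name is illustrative.

```python
def variogram(values, h):
    """gamma(h) = sum_{j=1..N} (x_j - x_{j+h})^2 / (2 (N - 1)), N = number of pairs at shift h."""
    pairs = [(values[j], values[j + h]) for j in range(len(values) - h)]
    n_pairs = len(pairs)
    if n_pairs < 2:
        raise ValueError("need at least two pairs for this shift")
    return sum((a - b) ** 2 for a, b in pairs) / (2 * (n_pairs - 1))

# Hourly C3S (%) of the kiln feed samples of the example.
c3s = [54.2, 57.8, 59.8, 61.2, 60.0, 56.0, 52.0, 52.0, 52.4, 57.0]

# Shifts larger than half the number of samples are not used.
for h in range(1, len(c3s) // 2 + 1):
    print(f"h = {h} h: gamma = {variogram(c3s, h):.2f}")
```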
b) Variogram Interpretation
Interpretation of the limit of the variogram $\gamma(h)$ when $h$ increases
Whatever the variable, beyond a certain value of $h$ the variable ceases to be correlated with itself, because the process no longer has any memory of a past long gone (see cases #2 and #3, where the variogram levels off at a sill generally equal to the variance $\sigma_x^2$ of the variable).
This is true for all raw mix analyses, whose values are bounded.
However, over a short period of time (a few hours), the signal may well drift. In such a case, the variogram will tend to increase instead of leveling off.
The value $\sigma_{xn}^2$ of the variogram extrapolated to $h = 0$ is called the "nugget effect" (random heterogeneity variance).
[Figures: signal $x(t)$ and variogram $\gamma_x(h)$ for a drifting signal; case #1, nugget effect only ($\sigma_x^2 = \sigma_{xn}^2$); case #2, nugget effect $\sigma_{xn}^2$ followed by a rise to the sill $\sigma_x^2$; case #3, no nugget effect, rise from 0 to the sill $\sigma_x^2$.]
Limitations in h value
If $n$ values of $X$ are available, shifts $h$ greater than $n/2$ should not be considered.
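Regionalization and prediction
A very frequent pattern of variogram is shown below:
[Figure: variogram $\gamma_X(h)$ rising from the nugget effect $\sigma_{xn}^2$ to the sill $\sigma_x^2$; the area of regionalization extends up to $h_0$, beyond which $\gamma_X(h)$ stays close to $\sigma_X^2$.]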
Pseudo-periodicity
The periodic variations can be self-sustained (control cycle, oscillator, etc.) or induced by a periodic phenomenon (unevenly distributed elevator buckets, raw meal correction interval).
Even if the periodicity is blurred on the graph of the signal by random noise or by variations of the period, the variogram will tend to highlight it.
The variogram will reach a maximum, above the total variance $\sigma_x^2$ (up to $2\sigma_x^2$), for a shift $h$ of half a period, and a minimum for a shift of exactly 1 period. Maxima and minima will repeat themselves and fade away as $h$ increases. The fading will be quick if the pseudo-period varies a lot, but slow if the signal is truly periodic.
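[Figures: pseudo-periodic signal $x(t)$ and its variogram $\gamma_x(h)$, which oscillates around $\sigma_x^2$ (up to $2\sigma_x^2$) with a period equal to the pseudo-period; variogram of a periodic signal.]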
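To illustrate the behaviour described above, this sketch computes the variogram (same $2(N-1)$ convention as in this section) of a pure sine wave with a hypothetical period of 8 sampling intervals: the maximum, about $2\sigma_x^2$, falls at half a period, and the variogram returns to about zero at one full period.

```python
import math

def variogram(values, h):
    """Variogram with the 2(N - 1) denominator used in this section (N = number of pairs)."""
    diffs = [(values[j] - values[j + h]) ** 2 for j in range(len(values) - h)]
    return sum(diffs) / (2 * (len(diffs) - 1))

period = 8                                    # hypothetical period, in sampling intervals
signal = [math.sin(2 * math.pi * j / period) for j in range(64)]
mean = sum(signal) / len(signal)
variance = sum((x - mean) ** 2 for x in signal) / (len(signal) - 1)

# Variogram for shifts of 1 to 8 sampling intervals, and the shift of its maximum.
gammas = {h: variogram(signal, h) for h in range(1, period + 1)}
peak = max(gammas, key=gammas.get)
print(peak, round(gammas[peak] / variance, 2), round(gammas[period], 6))
```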
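4.3 Raw Mix Control Tuning
Each deviation of the C3S from its setpoint (SP) is paired with the previous deviation, giving points $(x, y)$ = (previous deviation, current deviation); a straight line is then fitted through these points and its slope diagnoses the control tuning.

Time    C3S    SP    C3S - SP   (x, y)
2:00    64.1   60     4.1
4:00    58.5   60    -1.5       (4.1, -1.5)
6:00    58.9   60    -1.1       (-1.5, -1.1)
8:00    61.7   60     1.7       (-1.1, 1.7)
10:00   56.7   58    -1.3       (1.7, -1.3)
12:00   59.2   58     1.2       (-1.3, 1.2)
14:00   54.5   58    -3.5       (1.2, -3.5)
16:00   60.8   58     2.8       (-3.5, 2.8)
18:00   55.1   58    -2.9       (2.8, -2.9)
20:00   58.3   58     0.3       (-2.9, 0.3)
22:00   59     58     1.0       (0.3, 1.0)

[Figure: scatter plot of the (x, y) pairs, both axes from -5 to 4, with the fitted regression line whose slope is interpreted below.]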
Slope of the fitted line:
- slope = 0: perfectly tuned control. All off-target values of the control parameter are due to random variations (materials, feeder accuracy, etc.).
- 1 > slope > 0: undercontrolling. Multiply the gain by (1 + slope).
- slope = 1: no control taking place.
- slope > 1: divergent control: the gain value has the wrong sign.
- 0 > slope > -1: overcontrolling. Divide the gain by (1 - slope).
- slope = -1: overcontrolling is inducing a cycle whose period is twice the sampling interval. Divide the gain by 2.
- slope < -1: divergent cycling due to severe overcontrolling. Divide the gain by (1 - slope).
The method is applicable to control response analysis in general.
It can be incorporated as an internal tuning device in a control algorithm.
Non-linear control responses can be analyzed by using a polynomial fit rather than a linear regression.
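A sketch of the slope method applied to the C3S deviations of the example. The lag-1 least-squares slope comes out at about -0.60, i.e. overcontrolling, so the table suggests dividing the gain by about 1.6. The tolerance `tol` used to decide when the slope counts as exactly 0, 1 or -1 is an assumption, not part of the original method, and the function names are illustrative.

```python
def lag1_slope(deviations):
    """Least-squares slope of each deviation (y) against the previous one (x)."""
    xs, ys = deviations[:-1], deviations[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def tuning_advice(slope, tol=0.05):
    """Map the slope to the diagnosis table above; returns (diagnosis, gain multiplier).

    tol is a hypothetical tolerance for treating the slope as exactly 0, 1 or -1.
    """
    if abs(slope) <= tol:
        return "perfectly tuned", 1.0
    if abs(slope - 1) <= tol:
        return "no control taking place", None
    if abs(slope + 1) <= tol:
        return "overcontrolling, cycle of period 2 x sampling interval", 0.5
    if slope > 1:
        return "divergent control: gain has wrong sign", None
    if slope > 0:
        return "undercontrolling", 1 + slope
    # Negative slope: divide the gain by (1 - slope), i.e. multiply by 1 / (1 - slope).
    label = "overcontrolling" if slope > -1 else "divergent cycling (severe overcontrolling)"
    return label, 1 / (1 - slope)

# C3S minus setpoint, 2:00 to 22:00, from the table above.
dev = [4.1, -1.5, -1.1, 1.7, -1.3, 1.2, -3.5, 2.8, -2.9, 0.3, 1.0]
s = lag1_slope(dev)
print(round(s, 2), tuning_advice(s))
```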
5. Sampling
5.1 Golden Rules
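Each sample must weigh at least the MRW (Minimum Representative Weight).
The sampling method must allow every particle the same chance of being collected.
5.2 Fundamental Error (FE)
The relative variance of the fundamental error of a sample of mass $M$ (g) made of particles of size $d$ (cm) is $\sigma^2(FE) = C\,d^3/M$, with the sampling constant:
$$C = f \cdot c \cdot l \cdot g$$
with:
- $f$ = particle shape factor (usually 0.5, ranges between 0 and 1): 1 when cubic, 0.2 when flat, 0.5 when spheroidal;
- $l$ = liberation factor (0 to 1): 0 if homogeneous, 1 if the particles are completely distinct; 0.001 for a homogeneous raw mix, 0.2 medium, 0.3 to 0.8 heterogeneous;
- $g$ = factor describing the particle size distribution. If we call size range the ratio $d_M/d_m$ of the upper size limit $d_M$ (about 5% oversize) to the lower size limit $d_m$ (about 5% undersize): large size range ($d_M/d_m > 4$): $g = 0.25$; medium size range (2 to 4): $g = 0.50$; small size range (< 2): $g = 0.75$; uniform size ($d_M/d_m = 1$): $g = 1.00$;
- $c$ = mineralogical (composition) factor, in g/cm³:
$$c = \sum_i p_i \,\frac{1 - a_i}{a_i}\,\left[a_i\,\rho_i + (1 - a_i)\,\rho_{ic}\right]$$
With:
- $p_i$ = proportion of material $i$ in the mix;
- $a_i$ = mass concentration of the critical component in material $i$ (e.g. g of CaO / g of solid);
- $\rho_i$, $\rho_{ic}$ = densities (g/cm³) of material $i$ and of its critical component.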
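Example:
A mix of 75% lime and 25% clay is crushed to 12.5 mm; CaO is the critical component.
Sample weight = 50 kg.
$f = 0.5$, $l = 0.3$, $g = 0.25$
CaO content of the lime = 52%, CaO content of the clay = 24%, densities = 2.7 g/cm³
$$c = 0.75 \times \frac{1 - 0.52}{0.52} \times 2.7 + 0.25 \times \frac{1 - 0.24}{0.24} \times 2.7 = 4.0$$
$$C = f \cdot c \cdot l \cdot g = 0.5 \times 4.0 \times 0.3 \times 0.25 = 0.15$$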
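And:
$$\sigma(FE) = \sqrt{\frac{C\,d^3}{M}} = \sqrt{\frac{(1.25)^3 \times 0.15}{50\,000}} = 0.0024$$
5.3 Minimum Representative Weight (MRW)
$$MRW = 18 \cdot f \cdot \ldots \cdot \frac{d^3}{\sigma^2(FE)}$$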
If the size passing 95% is $d_M = 4.0$ cm and $\sigma(FE) = 0.04$ (be careful: it is a relative standard deviation), then:
$$MRW = \frac{C\,d_M^3}{\sigma^2(FE)} = \frac{0.15 \times 4^3}{(0.04)^2} = 6000 \text{ g} = 6 \text{ kg}$$
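5.4 Estimation of the Maximum Particle Size
Assuming we want to take a sample of at most $M = 5$ kg with a tolerated relative standard deviation $\sigma(FE) = 0.04$, then:
$$d_M = \sqrt[3]{\frac{M\,\sigma^2(FE)}{C}} = \sqrt[3]{\frac{5000 \times 0.04^2}{0.15}} = 3.8 \text{ cm}$$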
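The fundamental-error relations of this section as a Python sketch, reproducing the numbers of the examples (mass in g, particle size in cm), assuming $g = 0.25$, the value that makes $C = 0.15$. The density weighting inside `mineralogical_factor` follows Gy's mineralogical factor and is an assumption reconstructed from the formula above; with both densities equal to 2.7 g/cm³, as in the example, it reduces to a single factor of 2.7. The function names are illustrative.

```python
import math

def mineralogical_factor(components):
    """c = sum p_i (1 - a_i)/a_i [a_i rho_i + (1 - a_i) rho_ic], in g/cm3 (assumed form).

    components: list of (p_i, a_i, rho_i, rho_ic) with p_i and a_i as fractions.
    """
    return sum(p * (1 - a) / a * (a * rho + (1 - a) * rho_c) for p, a, rho, rho_c in components)

def fundamental_error(C, d_cm, mass_g):
    """Relative standard deviation of the fundamental error: sqrt(C d^3 / M)."""
    return math.sqrt(C * d_cm ** 3 / mass_g)

def mrw(C, d_cm, fe):
    """Minimum representative weight, in g: C d^3 / FE^2."""
    return C * d_cm ** 3 / fe ** 2

def max_particle_size(C, mass_g, fe):
    """Largest particle size (cm) compatible with a sample mass M: (M FE^2 / C)^(1/3)."""
    return (mass_g * fe ** 2 / C) ** (1 / 3)

# Example: 75 % lime (52 % CaO) + 25 % clay (24 % CaO), densities 2.7 g/cm3.
c = mineralogical_factor([(0.75, 0.52, 2.7, 2.7), (0.25, 0.24, 2.7, 2.7)])
C = 0.5 * c * 0.3 * 0.25                                   # f * c * l * g
print(round(c, 2), round(C, 3))
print(round(fundamental_error(0.15, 1.25, 50_000), 4))     # 50 kg crushed to 12.5 mm
print(mrw(0.15, 4.0, 0.04) / 1000)                         # MRW in kg for d_M = 4 cm
print(round(max_particle_size(0.15, 5_000, 0.04), 1))      # d_M in cm for a 5 kg sample
```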
a) Rule of Thumb:

Maximum particle size (mm)   Min. sample, coal (ISO 1988), kg   Min. sample, aggregate (ASTM D75), kg
10                           0.6                                 10
20                           0.8                                 25
30                                                               60
40                                                               80
50                           3                                   100
60                                                               120
75                                                               150
90                                                               175
5.5 Minimum Number of Observations
Once the right size (MRW) of the sample is calculated, we want to determine how many samples ($n$) have to be collected to reach an acceptable knowledge (precision $P$) of the parameter $X$ we are interested in, with an accepted risk $\alpha$.
The larger the sample size, the closer we can expect the sample mean $\bar{X}$ to be to the population mean $\mu_X$ (refer to the Central Limit Theorem above). The reliability of $\bar{X}$ as an estimate of $\mu_X$ is measured by the standard error of the mean, which is simply the standard deviation of the sample mean, $\sigma_{\bar{X}} = \sigma_X/\sqrt{n}$.
Rule of thumb:
$$n = \frac{\sigma_X^2}{\sigma_{\bar{X}}^2}$$
where:
- $\sigma_X^2$ is the variance of the material stream and $\sigma_{\bar{X}}^2$ is the variance of the mean (the variability desired in the result).
Remark
Each sample must have at least the MRW in order to give a correct observation of the parameter to be estimated.
Example
The small-scale random heterogeneity of the raw mix at the mill outlet, expressed as C3S variance, is 10, thus $\sigma_X^2 = 10$.
We would like to decrease this random heterogeneity to 2, thus $\sigma_{\bar{X}}^2 = 2$.
Then, to achieve this goal, we have to sample $10/2 = 5$ increments. Normally they have to be collected close to one another (e.g. at 30-second intervals).
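The rule of thumb as a tiny helper. Rounding up to a whole number of increments is an assumption for cases where the ratio is not an integer, and the function name is illustrative.

```python
import math

def increments_needed(var_stream, var_target):
    """Rule of thumb n = sigma_X^2 / sigma_Xbar^2, rounded up to a whole number of increments."""
    return math.ceil(var_stream / var_target)

# Example: C3S variance 10 at the mill outlet, target variance of the mean 2.
n = increments_needed(10, 2)
print(n, math.sqrt(10 / n))   # increments, and the resulting standard error of the mean
```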