Statistics Oceanography
George Casella
Cornell University
ABSTRACT
Both frequentist and Bayesian methodologies provide a means of obtaining a
statistical solution to a problem.
Using a number
We argue that
Sherlock Holmes
The Adventure of Wisteria Lodge
1. INTRODUCTION
An alternate title for this paper might well be "Conditional and
In contrast, a
paper was presented at the 'Aha Huliko'a Winter Workshop on "Probability Concepts
in Physical Oceanography," January 12-15, 1993, Honolulu, Hawaii, and is technical report
BU-1187-M, in the Biometrics Unit, Cornell University. This research was supported by
National Science Foundation Grant No. DMS9100839 and National Security Agency Grant
No. 90F-073.
statistical view has a lot to offer, and, depending on the problem, one
methodology is probably more appropriate.
examples.
A second goal of this paper is to try to explain to the oceanographic
community how a statistician approaches a problem.
task of dealing with ever-increasing databases can be made a little
easier.
The remainder of the paper is arranged as follows.
In Section 2 we
Implement the Solution
Will
By answering
By
some differences in the approaches, with the major difference being in the
modeling and inference stages.
(a)
10
36
83
(b)
130 68
18
13
Nov Dec
25
13
386
51
For our example, we will look at the question of whether the yearly
distribution of icebergs is the same in each location.
A glance at
Figure 1 shows that such a hypothesis is very likely, but for
illustration we will step through both a Bayesian and a frequentist
approach to the problem.
$H_0$:
To test this as a
The
probability of every data table with the given marginal totals, using a
hypergeometric distribution.
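To make the hypergeometric calculation concrete, here is a minimal sketch of Fisher's exact test for the simplest case, a 2×2 table. With both margins fixed, the probability of a table under the null hypothesis is hypergeometric, and the p-value sums the probabilities of all tables with the same margins that are no more probable than the one observed. The iceberg tables have more cells than this, so the 2×2 restriction is a simplification; the function names and the example counts are hypothetical, chosen only for illustration.

```python
from math import comb


def hypergeom_table_prob(a, b, c, d):
    """Probability of the 2x2 table [[a, b], [c, d]] given its margins.

    Under the null hypothesis, with row and column totals fixed, the
    upper-left count follows a hypergeometric distribution.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    return comb(row1, a) * comb(row2, col1 - a) / comb(n, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value: total probability of every table with the same
    margins that is no more probable than the observed table."""
    row1, row2 = a + b, c + d
    col1 = a + c
    observed = hypergeom_table_prob(a, b, c, d)
    p = 0.0
    # Enumerate every feasible upper-left count x; the margins fix the rest.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        prob = hypergeom_table_prob(x, row1 - x, col1 - x, row2 - col1 + x)
        if prob <= observed * (1 + 1e-9):  # small tolerance for float ties
            p += prob
    return p


if __name__ == "__main__":
    # Hypothetical counts: rows are two locations, columns two seasons.
    print(fisher_exact_two_sided(12, 5, 7, 10))
```

For 2×2 tables, SciPy's `scipy.stats.fisher_exact` performs the same test; the hand-rolled version just exposes the enumeration over tables with fixed margins.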
inference.)
We can now clearly see the distinction between Bayesian and
frequentist inferences.
interpretation.
We now look at
To
We know
Thus, no
with the information in the data, are combined into the posterior
distribution.
Of
inferences.
More precisely, suppose there are data, $X$, which vary according to a
probability distribution $f(x \mid \theta)$, a distribution indexed by an unknown
parameter $\theta$.
a prior distribution
$\pi(\theta)$.
(In
Thus the
equation "X= x" means that we have observed the value x of the random
variable X.)
$$g(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int f(x \mid \theta)\,\pi(\theta)\,d\theta}$$
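As a rough numerical illustration of how the prior $\pi(\theta)$ and the likelihood $f(x \mid \theta)$ combine into the posterior $g(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta)$, the sketch below evaluates Bayes' rule on a grid of $\theta$ values and replaces the normalizing integral by a sum. The binomial likelihood, the uniform prior and the name `grid_posterior` are assumptions made only for this example. With 7 successes in 10 trials and a uniform prior, the exact posterior is Beta(8, 4) with mean 2/3, which gives a check on the grid answer.

```python
import math


def grid_posterior(thetas, likelihood, prior):
    """Approximate the posterior g(theta | x) on a grid of theta values.

    Bayes' rule: g(theta | x) is proportional to f(x | theta) * pi(theta);
    the normalizing integral is approximated by a sum over the grid.
    """
    unnormalized = [likelihood(t) * prior(t) for t in thetas]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical data: x = 7 successes in n = 10 binomial trials.
n, x = 10, 7


def binomial_likelihood(t):
    return math.comb(n, x) * t**x * (1 - t) ** (n - x)


def uniform_prior(t):
    return 1.0  # flat prior on [0, 1]


thetas = [i / 1000 for i in range(1001)]
weights = grid_posterior(thetas, binomial_likelihood, uniform_prior)
post_mean = sum(t * w for t, w in zip(thetas, weights))
print(f"grid posterior mean = {post_mean:.4f}, exact = {8 / 12:.4f}")
```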
95%."
Rather,
Returning to
As long
This concern
Thus, a
The conclusions
we again use a standard linear regression model with Gaussian errors, but
centering the prior at the hypothesized value gives equal prior weight
above and below the value, and may be considered an impartial prior
specification.)
Combining our prior specification with the observed data, we
calculate $\Pr(b \le 4 \mid \text{data}) = .999$ and
$\Pr(b \le 3 \mid \text{data}) = .623$.
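The posterior probabilities quoted above come from combining a normal prior on the slope $b$ with a regression model that has Gaussian errors. A minimal sketch of this kind of calculation is the conjugate normal update below. It assumes the likelihood for $b$ is summarized by a least squares estimate `b_hat` with standard error `se`, and that the prior is $N(b_0, \tau^2)$, centered at the hypothesized value `b0`. All numbers are hypothetical, not the paper's data, and the loop over `tau` imitates varying the prior standard deviation.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def slope_posterior(b_hat, se, b0, tau):
    """Conjugate update: N(b0, tau^2) prior with N(b_hat, se^2) likelihood.

    Returns the posterior mean and standard deviation of b; precisions
    (inverse variances) add, and the mean is precision-weighted.
    """
    precision = 1.0 / se**2 + 1.0 / tau**2
    mean = (b_hat / se**2 + b0 / tau**2) / precision
    return mean, math.sqrt(1.0 / precision)


def prob_slope_below(c, b_hat, se, b0, tau):
    """Posterior probability Pr(b <= c | data)."""
    mean, sd = slope_posterior(b_hat, se, b0, tau)
    return normal_cdf((c - mean) / sd)


if __name__ == "__main__":
    b_hat, se, b0 = 2.9, 0.4, 3.5  # hypothetical estimate, s.e., prior center
    for tau in (0.25, 0.5, 1.0):   # hypothetical prior standard deviations
        p4 = prob_slope_below(4.0, b_hat, se, b0, tau)
        p3 = prob_slope_below(3.0, b_hat, se, b0, tau)
        print(f"tau={tau}: Pr(b<=4)={p4:.3f}, Pr(b<=3)={p3:.3f}")
```

Printing the probabilities for several values of `tau` is a simple way to see how sensitive the conclusion is to the prior spread.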
The
(The standard
deviation of the data is .082, and the graph shows the prior standard
deviation up to twice this value.)
where
Velocity (m/s)   intercept   slope     std. dev.   empirical Bayes slope
10               .666        -.084     .011        -.076
11               .924        -.040     .013        -.042
12               1.594       -.080     .008        -.073
13               1.669       -.050     .017        -.050
14               1.698       -.031     .029        -.035
15               1.635       -.0009    .027        -.011
allows the data to assess the tenability of the submodel that the $b_v$'s
come from a common population.
convex combination of the common overall slope (-.048) and the individual
least squares slopes, given by

$$\text{empirical Bayes slope} = (.221)(-.048) + (.779)(\text{least squares slope}).$$
The weighting factors .221 (and .779 = 1 - .221) are data-based estimates.
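The convex combination can be checked directly against the table of slopes. The sketch below applies weight .221 to the common slope (-.048) and .779 to each least squares slope. Rounded to three decimals, it reproduces every entry in the empirical Bayes column. The slopes are transcribed from the table. It does not show how the weight .221 itself was estimated from the data, and the names `eb_slope`, `LS_SLOPE` and `REPORTED_EB` are introduced here only for illustration.

```python
# Least squares slopes and reported empirical Bayes slopes, transcribed
# from the table, keyed by wind velocity (m/s).
LS_SLOPE = {10: -0.084, 11: -0.040, 12: -0.080, 13: -0.050, 14: -0.031, 15: -0.0009}
REPORTED_EB = {10: -0.076, 11: -0.042, 12: -0.073, 13: -0.050, 14: -0.035, 15: -0.011}

COMMON_SLOPE = -0.048  # common overall slope
WEIGHT = 0.221         # data-based weight placed on the common slope


def eb_slope(ls_slope, common=COMMON_SLOPE, w=WEIGHT):
    """Convex combination: w * common slope + (1 - w) * individual slope."""
    return w * common + (1.0 - w) * ls_slope


for velocity, slope in LS_SLOPE.items():
    print(velocity, round(eb_slope(slope), 3), REPORTED_EB[velocity])
```

Each estimate moves toward -.048 by 22.1% of its distance from it, so slopes far from the common value (such as -.0009 at 15 m/s) move the most.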
The empirical Bayes slope estimates are valid under the model of
frequentist repeatability.
Thus, on the
average, the empirical Bayes estimates will be closer to the true values
than the standard frequentist estimates.
Although
they are not very different from the standard frequentist lines, they do
display a movement toward the common slope value.
analysis has uncovered a small amount of common structure, and has used
this in improving each of the estimates.
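The claim that, on average, the empirical Bayes estimates are closer to the true values can be checked by simulation. The sketch below uses the positive-part James-Stein estimator, which shrinks toward the grand mean and assumes a known common standard deviation `sigma`. This is a standard empirical Bayes estimator, but it is not necessarily the exact one used for the table. The true slopes, `sigma` and the function names are all hypothetical. The simulation compares the average total squared error of the raw estimates with that of the shrunken ones.

```python
import random


def james_stein_toward_mean(estimates, sigma):
    """Positive-part James-Stein shrinkage of k estimates toward their mean.

    Each estimate is assumed ~ N(true value, sigma^2), independently, with
    sigma known; the shrinkage factor is estimated from the data's spread.
    """
    k = len(estimates)
    mean = sum(estimates) / k
    spread = sum((e - mean) ** 2 for e in estimates)
    keep = max(0.0, 1.0 - (k - 3) * sigma**2 / spread) if spread > 0 else 0.0
    return [mean + keep * (e - mean) for e in estimates]


def compare_risk(true_slopes, sigma, reps=5000, seed=1):
    """Average total squared error of raw vs. shrunken estimates."""
    rng = random.Random(seed)
    raw_err = eb_err = 0.0
    for _ in range(reps):
        raw = [b + rng.gauss(0.0, sigma) for b in true_slopes]
        eb = james_stein_toward_mean(raw, sigma)
        raw_err += sum((e - b) ** 2 for e, b in zip(raw, true_slopes))
        eb_err += sum((e - b) ** 2 for e, b in zip(eb, true_slopes))
    return raw_err / reps, eb_err / reps


TRUE_SLOPES = [-0.060, -0.045, -0.055, -0.050, -0.040, -0.035]  # hypothetical
SIGMA = 0.02                                                     # hypothetical

raw_risk, eb_risk = compare_risk(TRUE_SLOPES, SIGMA)
print(f"raw: {raw_risk:.6f}  empirical Bayes: {eb_risk:.6f}")
```

The gain is largest when the true slopes are close together relative to `sigma`, which is when there is common structure to borrow. With well-separated slopes, the shrinkage factor approaches 1 and the two sets of estimates nearly coincide.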
7. CONCLUSIONS
The statistical methodology to be used, whether Bayesian or
frequentist, should be selected according to the type of inference that
is desired (and is appropriate).
The
methodologies are not at odds with one another; they are complementary.
best.
"opportunism"
is
These
(See Berger
Such
New York:
Casella, G. (1985):
Springer-Verlag.
Casella, G. (1992):
Defant, A. (1961):
New York:
Pergamon
Press.
Fisher, R.A. (1970):
New York:
(1990):
28.
Figure 1: [plot; horizontal axis: Month]
Figure 2: … as a function of RMS [panel title: Breaking Waves]
Figure 3: [plot; axis ticks from 0.02 to 0.16]
Figure 4: … The six groups are each at a different wind velocity, from 10 to 15
m/s in steps of 1. … viewing. [panel title: Bubble Populations; horizontal axis: Depth (cm)]
Figure 5: The … m/s … m/s, and … The lines … m/s. [panel title: Bubble Populations; horizontal axis: Depth (cm)]