The Challenge of Developing Statistical Literacy, Reasoning and Thinking
Edited by
Dani Ben-Zvi
University of Haifa,
Haifa, Israel
and
Joan Garfield
University of Minnesota,
Minneapolis, U.S.A.
KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 1-4020-2278-6
Print ISBN: 1-4020-2277-8
©2005 Springer Science + Business Media, Inc.
Print ©2004 Kluwer Academic Publishers, Dordrecht
All rights reserved
No part of this eBook may be reproduced or transmitted in any form or by any means, electronic,
mechanical, recording, or otherwise, without written consent from the Publisher
Created in the United States of America
Visit Springer's eBookstore at: http://ebooks.springerlink.com
and the Springer Global Website Online at: http://www.springeronline.com
Contents

List of Authors
Foreword (David S. Moore)
Preface (Dani Ben-Zvi and Joan Garfield)

PART I: INTRODUCTION TO STATISTICAL LITERACY, REASONING, AND THINKING

1. Statistical Literacy, Reasoning, and Thinking: Goals, Definitions, and Challenges
   Dani Ben-Zvi and Joan Garfield
2. Towards an Understanding of Statistical Thinking
   Maxine Pfannkuch and Chris Wild
3. Statistical Literacy: Meanings, Components, Responsibilities
   Iddo Gal
4. A Comparison of Mathematical and Statistical Reasoning
   Robert C. delMas
5. Models of Development in Statistical Reasoning
   Graham A. Jones, Cynthia W. Langrall, Edward S. Mooney, and Carol A. Thornton

PART II: STUDIES OF STATISTICAL REASONING

6. Reasoning about Data Analysis
   Dani Ben-Zvi
7. Learning to Reason about Distribution
   Arthur Bakker and Koeno P. E. Gravemeijer
8. Conceptualizing an Average as a Stable Feature of a Noisy Process
   Clifford Konold and Alexander Pollatsek
9. Reasoning about Variation
   Chris Reading and J. Michael Shaughnessy
10. Reasoning about Covariation
    Jonathan Moritz
11. Students’ Reasoning about the Normal Distribution
    Carmen Batanero, Liliana Mabel Tauber, and Victoria Sánchez
12. Developing Reasoning about Samples
    Jane M. Watson
13. Reasoning about Sampling Distributions
    Beth Chance, Robert delMas, and Joan Garfield

PART III: INSTRUCTIONAL, CURRICULAR AND RESEARCH ISSUES

14. Primary Teachers’ Statistical Reasoning about Data
    William T. Mickelson and Ruth M. Heaton
15. Secondary Teachers’ Statistical Reasoning in Comparing Two Groups
    Katie Makar and Jere Confrey
16. Principles of Instructional Design for Supporting the Development of Students’ Statistical Reasoning
    Paul Cobb and Kay McClain
17. Research on Statistical Literacy, Reasoning, and Thinking: Issues, Challenges, and Implications
    Joan Garfield and Dani Ben-Zvi

Author Index
Subject Index
List of Authors

Bakker, Arthur. Freudenthal Institute, Utrecht University, the Netherlands. A.Bakker@fi.uu.nl
Batanero, Carmen. Universidad de Granada, Spain. batanero@ugr.es
Ben-Zvi, Dani. University of Haifa, Israel. dbenzvi@univ.haifa.ac.il
Chance, Beth. California Polytechnic State University, USA. bchance@calpoly.edu
Cobb, Paul. Vanderbilt University, USA. paul.cobb@vanderbilt.edu
Confrey, Jere. Washington University at St. Louis, USA. jconfrey@wustl.edu
delMas, Robert C. University of Minnesota, USA. delma001@umn.edu
Gal, Iddo. University of Haifa, Israel. iddo@research.haifa.ac.il
Garfield, Joan. University of Minnesota, USA. jbg@umn.edu
Gravemeijer, Koeno P. E. Freudenthal Institute, Utrecht University, the Netherlands. K.Gravemeijer@fi.uu.nl
Heaton, Ruth M. University of Nebraska-Lincoln, USA. Rheaton1@unl.edu
Jones, Graham A. Griffith University, Gold Coast Campus, Australia. g.jones@griffith.edu.au
Konold, Clifford. University of Massachusetts, Amherst, USA. konold@srri.umass.edu
Langrall, Cynthia W. Illinois State University, USA. langrall@ilstu.edu
Makar, Katie. University of Texas at Austin, USA. kmakar@mail.utexas.edu
McClain, Kay. Vanderbilt University, USA. kay.mcclain@vanderbilt.edu
Mickelson, William T. University of Nebraska-Lincoln, USA. wmickelson2@unl.edu
Mooney, Edward S. Illinois State University, USA. mooney@ilstu.edu
Moore, David S. Purdue University, USA. dsmoore@stat.purdue.edu
Moritz, Jonathan. University of Tasmania, Australia. jonathan.moritz@utas.edu.au
Pfannkuch, Maxine. The University of Auckland, New Zealand. m.pfannkuch@auckland.ac.nz
Pollatsek, Alexander. University of Massachusetts, Amherst, USA. pollatsek@psych.umass.edu
Reading, Chris. University of New England, Australia. creading@metz.une.edu.au
Sánchez, Victoria. Universidad de Sevilla, Spain. mvsanchez@cica.es
Shaughnessy, J. Michael. Portland State University, USA. mike@mth.pdx.edu
Tauber, Liliana Mabel. Universidad Nacional del Litoral, Santa Fe, Argentina. lilianatauber@gigared.com
Thornton, Carol A. Illinois State University, USA. thornton@ilstu.edu
Watson, Jane M. University of Tasmania, Australia. Jane.Watson@utas.edu.au
Wild, Chris. The University of Auckland, New Zealand. c.wild@auckland.ac.nz
Foreword
David S. Moore
Purdue University
This unique book is very welcome, for at least three reasons. The first is that
teachers of statistics have much to learn from those whose primary expertise is the
study of learning. We statisticians tend to insist that we teach first of all from the
base of our knowledge of statistics, and this is true. Teachers at all levels must
understand their subject matter, and at a depth at least somewhat greater than that of
the content they actually teach. But teachers must also understand how students
learn, be aware of specific difficulties, and consider means to guide students toward
understanding. Unaided, we gain skill intuitively, by observing our own teachers
and by experience. Teachers below the university level receive specific instruction
in teaching—this is, after all, their profession—and this book will improve that
instruction where statistics is concerned. Teachers at the university level consider
themselves first of all mathematicians or psychologists or statisticians and are
sometimes slow to welcome pedagogical wisdom from other sources. This is folly,
though a folly typical of professionals everywhere. I have learned a great deal from
some of the editors and authors of this book in the past, and yet more from reading
this volume.
Second, this book is timely because data-oriented statistics has at last moved into
the mainstream of mathematics instruction. In the United States, working with data
is now an accepted strand in school mathematics curricula, a popular upper-secondary Advanced Placement syllabus adds a full treatment of inference, and
enrollment in university statistics courses continues to increase. (Indeed, statistics is
almost the only subject taught by university mathematics departments that is
growing.) Similar trends, particularly in school mathematics, are evident in other
nations. The title of this volume, with its emphasis on “statistical literacy, reasoning,
and thinking” reflects the acceptance of statistics as a mainstream subject rather than
a technical specialty. If some degree of statistical literacy is now part of the
equipment of all educated people, then more teachers, and teachers of more varied
backgrounds, must be prepared to help students learn to think statistically. Here at
last is a single source that can inform our preparation.
Finally, statisticians in particular should welcome this book because it is based
on the recognition that statistics, while it is a mathematical science, is not a subfield
of mathematics. Statistics applies mathematical tools to illuminate its own subject
matter. There are core statistical ideas—think of strategies for exploratory data
analysis and the distinction between observational and experimental studies with the
related issue of establishing causation—that are not mathematical in nature.
Speaking broadly, as long as “statistics education” as a professional field was
considered a subfield of “mathematics education,” it was in fact primarily the study
of learning probability ideas. Understanding that statistics is not just mathematics is
giving rise to a new field of study, closely related to mathematics education but not
identical to it. The editors and authors of this volume are leaders in this new field. It
is striking that the chapters in this book concern reasoning about data more than
about chance. Data analysis, variation in data and its description by distributions,
sampling, and the difficult notion of a sampling distribution are among the topics
receiving detailed study.
It is not often that a book serves to synthesize an emerging field of study while
at the same time meeting clear practical needs. I am confident that The Challenge of
Developing Statistical Literacy, Reasoning, and Thinking will be seen as a classic.
Preface
Over the past decade there has been an increasingly strong call for statistics
education to focus on statistical literacy, statistical reasoning, and statistical
thinking. Our goal in creating this book is to provide a useful resource for educators
and researchers interested in helping students at all educational levels to develop
these cognitive processes and learning outcomes. This book includes cutting-edge
research on teaching and learning statistics, along with specific pedagogical
implications. We designed the book for academic audiences interested in statistics
education as well as for teachers, curriculum writers, and technology developers.
The events leading to the writing of this book began at the Fifth International Conference on Teaching Statistics (ICOTS-5), held in 1998 in Singapore. We
realized then that there are no consistent definitions for the often stated learning
goals of statistical reasoning, thinking, and literacy. In light of the rapid growth of
statistics education at all levels, and the increasing use of these terms, we realized
that it was important to clearly define and distinguish between them in order to
facilitate communication as well as the development of instructional materials and
educational research.
A small, focused conference bringing together an international group of
researchers interested in these topics appeared to be an important next step in
clarifying the terms, connecting researchers working in this area, and identifying
ways to move the field forward together. The first International Research Forum on
Statistical Reasoning, Thinking, and Literacy (SRTL-1) was held in Israel in 1999 to
address these needs. Due to the success of SRTL-1 and the strong feeling that this
type of forum should be repeated, SRTL-2 was held in 2001 in Australia, this time
with a focus on different types of statistical reasoning. Many of the papers from
these first two forums have led to chapters in this book. The forums continue to be
offered every two years, with SRTL-3 held in the USA in 2003, as interest and
research in this area steadily increase.
To get the most out of this book, readers may find the following points useful:
• Chapter 1 may be a good starting point. It offers preliminary definitions and distinctions for statistical literacy, reasoning, and thinking. It also describes some of the unique issues addressed by each chapter to help readers in their journey within the book.
• The first part of this book (Chapters 2 through 5) is a comprehensive overview of statistical literacy, reasoning, and thinking from historical, psychological, and educational perspectives. In addition, cognitive models of development in statistical reasoning are examined. Readers who wish to examine the theoretical foundations upon which the individual studies in subsequent parts are based are referred to these chapters.
• Many chapters that focus on a particular type of statistical reasoning follow a unified structure, starting with a description of the type of reasoning studied and ending with key practical implications related to instruction, assessment, and research. Readers can examine these sections to quickly determine the chapter’s contents.
• The closing chapter (Chapter 17) describes the current state of statistics education research and its uniqueness as a discipline. It offers a summary of issues and challenges raised by chapters in this book and presents implications for teaching and assessing students.
The seventeen chapters in this volume by no means exhaust all issues related to
the development of statistical literacy, reasoning, and thinking. Yet, taken as a
whole, the chapters constitute a rich resource summarizing current research, theory,
and practical suggestions related to these topics. We hope that this volume will
contribute to and stimulate the scholarly discourse within the statistics education
community, and that in coming years additional publications will more closely
examine the many issues and challenges raised.
A project of this magnitude would have been impossible without the help of
numerous individuals and organizations. First and most importantly, we would like
to thank our many contributors, who remained focused on the goal of sharing their
experiences and insights with the educational community while enduring multiple
review cycles and editing demands. Their enthusiasm, support, and friendship are
valuable to us and have made this long process easier to complete.
Many thanks go to Kibbutz Be’eri (Israel), the University of New England
(Australia), and the University of Nebraska–Lincoln (USA) for hosting and
supporting the three SRTL Research Forums in 1999, 2001, and 2003. These
meetings, which we co-chaired, informed our work as well as the writings of many of the contributors to this volume. In addition, numerous organizations and
institutions helped sponsor these forums: the University of Minnesota (USA), the
Weizmann Institute of Science (Israel), the International Association for Statistical Education (IASE), the American Statistical Association (ASA) Section on Statistics
Education, and Vanderbilt University. This funding has been pivotal in enabling us
to sustain our extended effort through the years it took to complete this project.
We are grateful to Kluwer Academic Publishers for providing a publishing
venue for this book, and to Michel Lokhorst, the humanities and social sciences
publishing editor, who skillfully managed the publication on their behalf. We
appreciate the support received from the University of Minnesota (USA) and the
University of Haifa (Israel) for copyediting and formatting this volume. We are
especially grateful for the contributions of our copy editor, Christianne Thillen, as
well as Ann Ooms and Michelle Everson, who under a tight production schedule
diligently and ably worked to prepare this book for publication.
Lastly, many thanks go to our spouses, Hava Ben-Zvi and Michael Luxenberg,
and to our children—Noa, Nir, Dagan, and Michal Ben-Zvi, and Harlan and
Rebecca Luxenberg. They have been our primary sources of energy and support.
Dani Ben-Zvi, University of Haifa, Israel
Joan Garfield, University of Minnesota, USA
PART I
INTRODUCTION TO STATISTICAL
LITERACY, REASONING, AND THINKING
Chapter 1
STATISTICAL LITERACY, REASONING, AND
THINKING: GOALS, DEFINITIONS, AND
CHALLENGES
Dani Ben-Zvi, University of Haifa, Israel; Joan Garfield, University of Minnesota, USA
INTRODUCTION
Over the past decade there has been an increasingly strong call for statistics
education to focus more on statistical literacy, reasoning, and thinking. One of the
main arguments presented is that traditional approaches to teaching statistics focus
on skills, procedures, and computations, which do not lead students to reason or
think statistically. This book explores the challenge posed to educators at all
levels—how to develop the desired learning goals for students by focusing on
current research studies that examine the nature and development of statistical
literacy, reasoning, and thinking. We begin this introductory chapter with an
overview of the reform movement in statistics education that has led to the focus on
these learning outcomes. Next, we offer some preliminary definitions and
distinctions for these often poorly defined and overlapping terms. We then describe
some of the unique issues addressed by each chapter and conclude with some
summary comments and implications.
THE GROWING IMPORTANCE OF STATISTICS IN TODAY’S WORLD
Quantitative information is everywhere, and statistics are increasingly presented
as a way to add credibility to advertisements, arguments, or advice. Being able to
properly evaluate evidence (data) and claims based on data is an important skill that
all students should learn as part of their educational programs. The study of statistics
provides tools that informed citizens need in order to react intelligently to
quantitative information in the world around them. Yet many research studies
indicate that adults in mainstream society cannot think statistically about important
issues that affect their lives.
D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 3–15.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
As former president of the American Statistical Association, David Moore
(1990) wrote, “Statistics has some claim to being a fundamental method of inquiry,
a general way of thinking that is more important than any of the specific techniques
that make up the discipline” (p. 134). It is not surprising, given the importance of
statistics, that there has been an increase in the amount of statistical content included
in the elementary and secondary mathematics curriculum (NCTM, 2000) and an
ever-increasing number of introductory statistics courses taught at the college level.
THE CHALLENGE OF TEACHING STATISTICS
Despite the increasing need for statistics instruction, historically statistics
education has been viewed by many students as difficult and unpleasant to learn,
and by many instructors as frustrating and unrewarding to teach. As more and more
students enroll in introductory statistics courses, instructors are faced with many
challenges in helping these students succeed in the course and learn statistics. Some
of these challenges include
• Many statistical ideas and rules are complex, difficult, and/or counterintuitive. It is difficult to motivate students to engage in the hard work of learning statistics.
• Many students have difficulty with the underlying mathematics (such as fractions, decimals, algebraic formulas), and that interferes with learning the related statistical content.
• The context in many statistical problems may mislead the students, causing them to rely on their experiences and often faulty intuitions to produce an answer, rather than select an appropriate statistical procedure.
• Students equate statistics with mathematics and expect the focus to be on numbers, computations, formulas, and one right answer. They are uncomfortable with the messiness of data, the different possible interpretations based on different assumptions, and the extensive use of writing and communication skills.
Amidst the challenges of dealing with students’ poor mathematics skills, low
motivation to learn a difficult subject, expectations about what the course should be,
and reliance on faulty intuitions and misconceptions, many instructors strive to
enable students to develop statistical literacy, reasoning, and thinking. There appears
to be a consensus that these are the most important goals for students enrolled in
statistics classes, and that these goals are not currently being achieved. The
dissatisfaction with students’ ability to think and reason statistically, even after
formally studying statistics at the college and graduate level, has led to a
reexamination of the field of statistics.
EFFORTS TO CHANGE THE TEACHING OF STATISTICS
Today’s leading statisticians see statistics as a distinct discipline, and one that is
separate from mathematics (see Chapter 4). Some suggest that statistics should in
fact be considered one of the liberal arts (e.g., Moore, 1998). The liberal arts image
emphasizes that statistics involves distinctive and powerful ways of thinking:
“Statistics is a general intellectual method that applies wherever data, variation, and
chance appear. It is a fundamental method because data, variation, and chance are
omnipresent in modern life. It is an independent discipline with its own core ideas
rather than, for example, a branch of mathematics” (Moore, 1998, p. 1254).
As the discipline has evolved and become more distinct, changes have been
called for in the teaching of statistics. Dissatisfaction with the introductory college
course has led to a reform movement that includes focusing statistics instruction
more on data and less on theory (Cobb, 1992). Moore (1997) describes the reform in
terms of changes in content (more data analysis, less probability), pedagogy (fewer
lectures, more active learning), and technology (for data analysis and simulations).
At the elementary and secondary level, there is an effort to help students develop
an understanding and familiarity with data analysis (see Chapter 6) rather than
teaching them a set of separate skills and procedures. New K–12 curricular
programs set ambitious goals for statistics education, including developing students’
statistical reasoning and understanding (e.g., Australia—Australian Education
Council, 1991, 1994; England—Department for Education and Employment, 1999;
New Zealand—Ministry of Education, 1992; USA—National Council of Teachers
of Mathematics, 2000; and Project 2061’s Benchmarks for Science Literacy,
American Association for the Advancement of Science, 1993).
Several factors have led to these current efforts to change the teaching of
statistics at all educational levels. These factors include
• Changes in the field of statistics, including new techniques of data exploration
• Changes and increases in the use of technology in the practice of statistics, and its growing availability in schools and at home
• Increased awareness of students’ inability to think or reason statistically, despite good performance in statistics courses
• Concerns about the preparation of teachers of statistics at the K–12 and college level, many of whom have never studied applied statistics nor engaged in data analysis activities.
Many recommendations have been given for how statistics courses should be
taught, as part of the general reform movement. Some of these recommendations are
as follows:
• Incorporate more data and concepts.
• Rely heavily on real (not merely realistic) data.
• Focus on developing statistical literacy, reasoning, and thinking.
• Wherever possible, automate computations and graphics by relying on technological tools.
• Foster active learning, through various alternatives to lecturing.
• Encourage a broader range of attitudes, including appreciation of the power of statistical processes, chance, randomness, and investigative rigor, and a propensity to become a critical evaluator of statistical claims.
• Use alternative assessment methods to better understand and document student learning.
There appears to have been some impact on teaching practices from these recommendations at the college level (Garfield, Hogg, Schau, & Whittinghill, 2002). However, despite reform efforts, many statistics courses still
teach the same progression of content and emphasize the development of skills and
procedures. Although students and instructors appear to be happier with reformed
courses, many students still leave the course perceiving statistics as a set of tools and
techniques that are soon forgotten. Pfannkuch and Wild (Chapter 2) discuss how
current methods of teaching have often focused on the development of skills and
have failed to instill the ability to think statistically.
STATISTICAL LITERACY, REASONING, AND THINKING:
DEFINITIONS AND DISTINCTIONS
It is apparent, when reading articles about recommendations to reform the
teaching of statistics, that there are no consistent definitions for the often stated
learning goals of literacy, reasoning, and thinking. Statistical literacy is used
interchangeably with quantitative literacy, while statistical thinking and reasoning
are used to define the same capabilities. This confusion of terms was especially
evident at the Fifth International Conference on Teaching Statistics, held in
Singapore in 1998. It became apparent that when statistics educators or researchers
talk about or assess statistical reasoning, thinking, or literacy, they may all be using
different definitions and understandings of these cognitive processes.
The similarities and differences among these processes are important to consider
when formulating learning goals for students, designing instructional activities, and
evaluating learning by using appropriate assessment instruments. A small, focused
conference consisting of researchers interested in these topics appeared to be an
important next step in clarifying the issues, connecting researchers and their studies,
and generating some common definitions, goals, and assessment procedures. The
first International Research Forum on Statistical Reasoning, Thinking, and Literacy
(SRTL-1) was held in Israel in 1999 to address these needs. At this first conference
some preliminary definitions were presented and discussed. A second forum (SRTL-2) was held in 2001 in Australia, with a focus on different types of statistical
reasoning. Many of the papers from these first two forums have led to chapters in
this book. The forums continue to be offered every two years (SRTL-3 in USA,
2003) as interest and research in this area steadily increase.
Although no formal agreement has been made regarding the definitions and
distinctions of statistical literacy, reasoning, and thinking, the following list
summarizes our current thoughts (Garfield, delMas, & Chance, 2003):
• Statistical literacy includes basic and important skills that may be used in understanding statistical information or research results. These skills include being able to organize data, construct and display tables, and work with different representations of data. Statistical literacy also includes an understanding of concepts, vocabulary, and symbols, and includes an understanding of probability as a measure of uncertainty.
• Statistical reasoning may be defined as the way people reason with statistical ideas and make sense of statistical information. This involves making interpretations based on sets of data, representations of data, or statistical summaries of data. Statistical reasoning may involve connecting one concept to another (e.g., center and spread), or it may combine ideas about data and chance. Reasoning means understanding and being able to explain statistical processes and being able to fully interpret statistical results.
• Statistical thinking involves an understanding of why and how statistical investigations are conducted and the “big ideas” that underlie statistical investigations. These ideas include the omnipresent nature of variation and when and how to use appropriate methods of data analysis such as numerical summaries and visual displays of data. Statistical thinking involves an understanding of the nature of sampling, how we make inferences from samples to populations, and why designed experiments are needed in order to establish causation. It includes an understanding of how models are used to simulate random phenomena, how data are produced to estimate probabilities, and how, when, and why existing inferential tools can be used to aid an investigative process. Statistical thinking also includes being able to understand and utilize the context of a problem in forming investigations and drawing conclusions, and recognizing and understanding the entire process (from question posing to data collection to choosing analyses to testing assumptions, etc.). Finally, statistical thinkers are able to critique and evaluate results of a problem solved or a statistical study.
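The description of statistical thinking above mentions using models to simulate random phenomena and producing data to estimate probabilities. As a concrete illustration (our sketch, not material from the book; the population and the question are invented for the example), the following Python snippet estimates by simulation how often the mean of a small random sample lands far from the population mean:

```python
import random

# Illustrative sketch (not from the book): estimate a probability by
# simulating a simple chance model, the kind of model-based reasoning
# statistical thinkers use when exact analysis is out of reach.
#
# Hypothetical model: a population of exam scores, uniform on 40..100.
# Question: how often does the mean of a random sample of n = 10 fall
# more than 5 points from the population mean?

random.seed(1)  # fixed seed so the run is reproducible
population = list(range(40, 101))
pop_mean = sum(population) / len(population)  # exactly 70.0

def sample_mean(n):
    """Mean of a simple random sample of size n (with replacement)."""
    return sum(random.choices(population, k=n)) / n

trials = 10_000
far = sum(abs(sample_mean(10) - pop_mean) > 5 for _ in range(trials))
print(f"Estimated P(|sample mean - {pop_mean}| > 5) = {far / trials:.3f}")
```

The pattern (define a chance model, repeat it many times, count the outcomes of interest) is the same one that underlies more realistic classroom simulations of sampling distributions; with 10,000 trials the estimate here is stable to within about a percentage point.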
For more discussion of these definitions and distinctions, see papers by Chance
(2002), delMas (2002), Garfield (2002), Rumsey (2002), and Chapters 2 through 4
in this book.
RATIONALE AND GOALS FOR THIS BOOK
With the increasing attention given to the need to develop students’ statistical
literacy, reasoning, and thinking at all levels, it has become apparent that these
educational outcomes were not being adequately addressed in the research literature
and, therefore, not used as the foundation for curricular programs. In fact, research
studies on statistical reasoning are still evolving, and are just beginning to suggest
ways to help students develop these outcomes.
Our goal in creating this book is to provide a useful resource for educators and
researchers interested in helping students at all educational levels to develop
statistical literacy, statistical reasoning, and statistical thinking. Given the increased
attention being paid worldwide to the need for statistically literate citizens, the broad
inclusion of statistics in the K–12 mathematics curriculum, the increasing numbers
of students taking statistics at the secondary level (e.g., Advanced Placement
Statistics courses in high school in the USA), and the increasing numbers of students
required to take introductory statistics courses in postsecondary programs, it is
crucial that the cutting-edge research being conducted on teaching and learning
statistics be collected and disseminated along with specific pedagogical
implications.
This book offers a synthesis of an emerging field of study, while at the same
time responding to clear practical needs in the following ways:
• It establishes a research base for statistics education by focusing on and distinguishing between different outcomes of statistics instruction.
• It raises awareness of unique issues related to teaching and learning statistics, and it distinguishes statistical literacy, reasoning, and thinking from both general and mathematical literacy, reasoning, and thinking.
• It provides a bridge between educational research and practice, by offering research-based guidelines and suggestions to educators and researchers.
Although the word statistics is often used to represent both probability and statistical analysis, the authors and editors of this book focus exclusively on reasoning and thinking about statistical analysis, rather than about probability. Although statistics as a discipline uses both mathematics and probability, probability itself is a field of mathematics. Because most of the early work in statistics education focused on the teaching and learning of probability, we wanted to move away from that emphasis and look at how students come to reason and think about data and data analysis.
However, because the two subjects are so interrelated, several chapters mention
issues related to learning probability as they relate to the focus of a particular
chapter.
AUDIENCE FOR THIS BOOK
This book was designed to appeal to a diverse group of readers. The primary
audience for this book is current or future researchers in statistics education (e.g.,
graduate students). However, we encourage others who do not identify themselves
as researchers to read the chapters in this book as a way to understand the current
issues and challenges in teaching and learning statistics. By asking authors to
specifically address implications for teaching and assessing students, we hope that
teachers of students at all levels will find the research results directly applicable to
working with students.
SUGGESTED WAYS TO USE THIS BOOK
Given the different audiences for this book, we suggest several ways it can be
used by researchers, teachers, curriculum writers, and technology developers.
• Researchers: Each chapter includes a detailed review of the literature related
to a particular topic (e.g., reasoning about variability, statistical literacy,
statistics teachers’ development), which will be helpful to researchers
studying one of these areas. The chapters also provide examples of current
research methodologies used in this area, and present implications for
teaching practice as well as suggestions for future research studies. By
providing cutting-edge research on statistical literacy, reasoning, and
thinking, the book as a whole outlines the state of the art for the statistics
education research community. In addition, the contributing authors may be
regarded as valuable resources for researchers who are interested in
pursuing studies in these areas.
• Curriculum writers: By reading this book, people designing statistics
instructional activities and curricula may learn about current research results
in statistics education. Curriculum development involves tightly integrated
cycles of reviewing related research, instructional design, and analysis of
students’ learning, all of which feed back to inform the revision of the design.
Many chapters in this book also give recommendations for appropriate ways
to assess learning outcomes.
• Technology developers: Many chapters in this book discuss the role of
technology in developing statistical reasoning. Types of technologies used
are presented and assessed in relation to their impact on students’ reasoning.
Given the different uses just listed, we believe that this book can be used in a
variety of graduate courses. Such courses include those preparing mathematics
teachers at the K–12 level; courses preparing teachers of statistics at the high
school, secondary, and tertiary levels; and research seminars in mathematics, statistics
education, or psychology.
We advise readers focused on students at one level (e.g., secondary) not to skip
over chapters describing students at other levels. We are convinced that students
who are introduced to statistical ideas and procedures learn much the same material
and concepts (e.g., creating graphical displays of data, describing the center and
dispersion of data, inference from data, etc.) regardless of their grade level.
Furthermore, reasoning processes develop along extended periods of time,
beginning at early encounters with data in elementary grades and continuing through
high school and postsecondary education. Therefore, we believe that discussions of
reasoning issues couched in the reality of one age group will be of interest to those
working with students of other ages and abilities.
OVERVIEW OF CHAPTERS
All of the chapters in this book discuss issues pertaining to statistical literacy,
reasoning, or thinking. Some chapters focus on general topics (e.g., statistical
literacy) while others focus on the context of a specific educational level or setting
(e.g., teaching middle school students to reason about distribution). Whenever
possible, the chapter authors outline challenges facing educators, statisticians, and
other stakeholders. The chapters present many examples (or references to resources)
of activities, data sets, and assessment tasks suitable for a range of instructional
levels. This emphasis on connection to practice is a result of our strong belief that
researchers are responsible for translating their findings to practical settings.
All the chapters that focus on a particular type of student or teacher statistical
reasoning (Chapters 6 through 15) follow a unified and familiar structure to
facilitate their effective use by the readers. These chapters typically start with a
section introducing the key area of reasoning explored in the chapter. This is
followed by clear and informative descriptions of the problem (a description of the
type of reasoning studied, why it is important, and how this type of reasoning fits
into the curriculum); literature and background (prior and related work and relevant
theoretical background); methods (the subjects, methods used, data gathered, and
activities or interventions used); analysis and results (description of how the data
were analyzed, and the results and findings of the study); and discussion (lessons
learned from the study, new questions raised, limitations found). Finally, in the
implications section, each chapter highlights key practical implications related to
teaching and assessing students as well as implications for research.
The chapters have been grouped into three parts, each of which is summarized
here.
Part I. Introduction to Statistical Literacy, Reasoning, and Thinking
(Chapters 2 through 5)
The first part of this book is a comprehensive overview of the three interrelated
but distinct cognitive processes (or learning outcomes) of statistical literacy,
reasoning, and thinking from historical, psychological, and educational perspectives.
This part is therefore the basis upon which the individual studies in subsequent parts
are built.
In the first chapter of this part (Chapter 2), Pfannkuch and Wild present their
paradigm on statistical thinking (part of their four-dimensional framework for
statistical thinking in empirical enquiry; Wild & Pfannkuch, 1999). The authors
identify five types of thinking considered to be fundamental to statistics. They
follow the origins of statistical thinking through to an explication of what is
currently understood to be statistical thinking. They begin their historical
exploration with the early developers of statistics; move on to more recent
contributions from epidemiology, psychology, and quality management; and
conclude with a discussion of recent writings of statistics education researchers and
statisticians influential in the movement of pedagogy from methods toward thinking.
Gal proposes in Chapter 3 a conceptualization of statistical literacy and its main
components. Statistical literacy is described as a key ability expected of citizens in
information-laden societies, an expected outcome of schooling, and a necessary
component of adults’ numeracy and literacy. Statistical literacy is portrayed as the
ability to interpret, critically evaluate, and communicate about statistical information
and messages. Gal suggests that statistically literate behavior is predicated on the
joint activation of both a knowledge component (comprising five cognitive
elements: literacy skills, statistical knowledge, mathematical knowledge, context
knowledge, and critical questions) and a dispositional component (comprising two
elements: critical stance, and beliefs and attitudes).
The focus of delMas’s chapter (Chapter 4) is on the nature of mathematical and
statistical reasoning. The author first outlines the general nature of human reasoning,
which he follows with an account of mathematical reasoning as described by
mathematicians along with recommendations by mathematics educators regarding
educational experiences to improve mathematical reasoning. He reviews the
literature on statistical reasoning and uses findings from the general literature on
reasoning to identify areas of statistical reasoning that students find most
challenging. Finally, he compares and contrasts statistical reasoning and
mathematical reasoning.
The last chapter in this part (Chapter 5) is a joint work by Jones, Langrall,
Mooney, and Thornton that examines cognitive models of development in statistical
reasoning and the role they can play in statistical education. The authors consider
models of development from a psychological perspective, and then describe how
models of statistical reasoning have evolved historically from models of
development in probability. The authors describe and analyze comprehensive
models of cognitive development that deal with multiple processes in statistical
reasoning as well as models of cognitive development that characterize students’
statistical reasoning as they deal with specific areas of statistics and data
exploration. The authors suggest that school students’ statistical reasoning passes
through a number of hierarchical levels and cycles.
Part II. Studies of Statistical Reasoning
(Chapters 6 through 13)
The chapters in this part focus on how students reason about specific areas of
statistics. The topics of these chapters include data analysis, distributions, measures
of center, variation, covariation, normal distribution, samples, and sampling
distributions. These studies represent the current efforts in the statistics education
community to focus statistical instruction and research on the big ideas of statistics
(Chapter 17) and on developing students’ statistical reasoning at all levels of
education.
In the first chapter of this part (Chapter 6), Ben-Zvi describes and analyzes the
ways in which middle school students begin to reason about data and come to
understand exploratory data analysis (EDA). He describes the process of developing
reasoning about data while learning skills, procedures, and concepts. In addition, the
author observes the students as they begin to adopt and exercise some of the habits
and points of view that are associated with statistical thinking. Ben-Zvi offers two
case studies focusing on the development of a global view of data and data
representations, and on the design of a meaningful EDA learning environment that
promotes statistical reasoning about data analysis. In light of the analysis, the author
proposes a description of what it may mean to learn to reason about data analysis.
Bakker and Gravemeijer explore (Chapter 7) how informal reasoning about
distribution can be developed in a technological learning environment. They
describe the development of reasoning about distribution in seventh-grade classes in
three stages as students reason about different representations. The authors show
how specially designed software tools, student-created graphs, and prediction tasks
supported the learning of different aspects of distribution. In this process, several
students came to reason about the shape of a distribution using the term bump along
with statistical notions such as outliers and sample size.
Chapter 8 presents an article by Konold and Pollatsek originally published in a
research journal; therefore, it does not follow the same format as the other chapters
in this part. Their chapter offers a conceptualization of averages as a stable feature
of a noisy process. To explore the challenges of learning to think about data as
signal and noise, the authors examine that metaphor in the context of three different
types of statistical processes. For each process, they evaluate the conceptual
difficulty of regarding data from that process as a combination of signal and noise.
The authors contrast this interpretation of averages with various other interpretations
of averages (e.g., summaries of groups of values) that are frequently encountered in
curriculum materials. They offer several recommendations about how to develop
and extend the idea of central tendency as well as possible directions for research on
student thinking and learning.
Understanding the nature of variability and its omnipresence is a fundamental
component of statistical reasoning. In Chapter 9, Reading and Shaughnessy bring
together findings from a number of different studies, conducted in three different
countries, designed to investigate students’ conceptions of variability. The focus of
the chapter is on details of one recent study that investigates reasoning about
variation in a sampling situation for students aged 9 to 18.
In Chapter 10, Moritz investigates three skills of reasoning about covariation: (a)
speculative data generation, demonstrated by drawing a graph to represent a verbal
statement of covariation; (b) verbal graph interpretation, demonstrated by describing
a scatterplot in a verbal statement and by judging a given statement; and (c)
numerical graph interpretation, demonstrated by reading a value and interpolating a
value. The author describes survey responses from students in grades 3, 5, 7, and 9,
classified into four levels of reasoning about covariation.
Batanero, Tauber, and Sánchez present (Chapter 11) the results of a study on
students’ learning of the normal distribution in a computer-assisted, university-level
introductory course. The authors suggest a classification of different aspects of
students’ correct and incorrect reasoning about the normal distribution, and give
examples of students’ reasoning in the different categories.
Chapter 12, written by Watson, extends previous research on students’ reasoning
about samples and sampling by considering longitudinal interviews with students 3
or 4 years after they first discussed their understanding of what a sample was, how
samples should be collected, and the representing power of a sample based on its
size. Of the six categories of response observed at the time of the initial interviews,
all were confirmed after 3 or 4 years, and one additional preliminary level was
observed.
Reasoning about sampling distributions is the focus of Chance, delMas, and
Garfield in the last chapter of this part (Chapter 13). In this chapter, the authors
present a series of research studies focused on the difficulties students experience
when learning about sampling distributions. In particular, the authors trace the 7-year history of an ongoing collaborative classroom-based research project
investigating the impact of students’ interaction with computer software tools to
improve their reasoning about sampling distributions. The authors describe the
complexities involved in building a deep understanding of sampling distributions,
and formulate models to explain the development of students’ reasoning.
Part III. Curricular, Instructional, and Research Issues
(Chapters 14 through 16)
The third and final part of this book deals with important educational issues
related to the development of students’ statistical reasoning: (a) teachers’ knowledge
and understanding of statistics, and (b) instructional design issues.
Mickelson and Heaton (Chapter 14) explore the complexity of teaching and
learning statistics, and offer insight into the role and interplay of teachers’ statistical
knowledge and context. Their study presents an analysis of one third-grade teacher’s
statistical reasoning about data and distribution in the context of classroom-based
statistical investigation. In this context, the teacher’s statistical reasoning plays a
central role in the planning and orchestration of the class investigation.
Makar and Confrey also discuss (Chapter 15) teachers’ statistical reasoning.
They focus on four secondary teachers’ statistical reasoning about comparing two
distributions, addressing the research question: “How do you decide whether
two groups are different?” The study was conducted at the end of a 6-month
professional development sequence designed to assist secondary teachers in making
sense of their students’ results on a state-mandated academic test. The authors
provide qualitative and quantitative analyses to examine the teachers’ reasoning.
In Chapter 16, Cobb and McClain propose design principles for developing
statistical reasoning about data in the contexts of EDA and data generation in
elementary school. They present a short overview of a classroom design experiment,
and then frame it as a paradigm case in which to tease out design principles
addressing five aspects of the classroom environment that proved critical in
supporting the students’ statistical learning: the focus on central statistical ideas, the
instructional activities, the classroom activity structure, the computer-based tools the
students used, and the classroom discourse.
Summary and Implications
(Chapter 17)
In the closing chapter (Chapter 17) the editors summarize issues, challenges, and
implications for teaching and assessing students emerging from the collection of
studies in this book. We begin with some comments on statistics education as an
emerging research area, and then concentrate on the need to focus research,
instruction, and assessment on the big ideas of statistics. We address the role of
technology in developing statistical reasoning as well as the diversity of various
statistics learners (e.g., students at different educational levels as well as their
teachers). Next we present a summary of research methodologies used to study
statistical reasoning, along with comments on the extensive use of qualitative
methods and the lack of traditional experimental designs. Finally, we consider some
implications for teaching and assessing students and suggest future research
directions.
We hope that the articulated, coherent body of knowledge on statistical literacy,
reasoning, and thinking presented in this book will contribute to the pedagogical
effectiveness of statistics teachers and educators at all levels; to the expansion of
research studies on statistical literacy, reasoning, and thinking; and to the growth of the
statistics education community.
REFERENCES
American Association for the Advancement of Science (Project 2061). (1993). Benchmarks for science
literacy. New York: Oxford University Press.
Australian Education Council (1991). A national statement on mathematics for Australian schools.
Carlton, Vic.: Author.
Australian Education Council (1994). Mathematics—a curriculum profile for Australian schools. Carlton,
Vic.: Curriculum Corporation.
Chance, B. L. (2002). Components of statistical thinking and implications for instruction and assessment.
Journal of Statistics Education [Online], 10(3). Retrieved June 24, 2003, from
http://www.amstat.org/publications/jse/
Cobb, G. W. (1992). Report of the joint ASA/MAA committee on undergraduate statistics. In American
Statistical Association 1992 Proceedings of the Section on Statistical Education, (pp. 281–283).
Alexandria, VA: ASA.
delMas, R. C. (2002). Statistical literacy, reasoning, and learning: A commentary. Journal of Statistics
Education [Online], 10(3). Retrieved June 24, 2003, from http://www.amstat.org/publications/jse/
Department for Education and Employment (1999). Mathematics: The national curriculum for England.
London: Author and Qualifications and Curriculum Authority.
Garfield, J. (2002). The challenge of developing statistical reasoning. Journal of Statistics Education
[Online], 10(3). Retrieved June 24, 2003, from http://www.amstat.org/publications/jse/
Garfield, J., delMas, R., & Chance, B. (2003). Web-based assessment resource tools for improving
statistical thinking. Paper presented at the annual meeting of the American Educational Research
Association, Chicago.
Garfield, J., Hogg, B., Schau, C., & Whittinghill, D. (2002). First courses in statistical science: The status
of educational reform efforts. Journal of Statistics Education [Online], 10(2). Retrieved June 24,
2003, from http://www.amstat.org/publications/jse/
Ministry of Education (1992). Mathematics in the New Zealand curriculum. Wellington, NZ: Author.
Moore, D. S. (1990). Uncertainty. In L. A. Steen (Ed.), On the shoulders of giants: New approaches to
numeracy (pp. 95–137). Washington, DC: National Academy Press.
Moore, D. (1997). New pedagogy and new content: The case of statistics. International Statistical
Review, 65, 123–137.
Moore, D. (1998). Statistics among the liberal arts. Journal of the American Statistical Association, 93,
1253–1259.
National Council of Teachers of Mathematics (2000). Principles and standards for school mathematics.
Reston, VA: Author.
Rumsey, D. J. (2002). Statistical literacy as a goal for introductory statistics courses. Journal of Statistics
Education [Online], 10(3). Retrieved June 24, 2003, from http://www.amstat.org/publications/jse/
Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical
Review, 67(3), 223–265.
Chapter 2
TOWARDS AN UNDERSTANDING OF
STATISTICAL THINKING
Maxine Pfannkuch and Chris Wild
The University of Auckland, New Zealand
INTRODUCTION
There has been an increasingly strong call from practicing statisticians for statistical
education to focus more on statistical thinking (e.g., Bailar, 1988; Snee, 1993;
Moore, 1998). They maintain that the traditional approach to teaching, which has
focused on the development of skills, has failed to produce an ability to think
statistically: “Typically people learn methods, but not how to apply them or how to
interpret the results” (Mallows, 1998, p. 2).
Solutions offered for changing this situation include employing a greater variety
of learning methods at undergraduate level and compelling students to experience
statistical thinking by dealing with real-world problems and issues. A major
obstacle, as Bailar (1988) points out, is teacher inexperience. We believe this is
greatly compounded by the lack of an articulated, coherent body of knowledge on
statistical thinking that limits the pedagogical effectiveness even of teachers who are
experienced statisticians. Mallows (1998) based his 1997 Fisher Memorial Lecture
on the need to develop a theory for understanding how to think about applied
statistics, since the enunciation of such principles would be useful for teaching.
This chapter focuses on thinking in statistics rather than probability. Although
statistics as a discipline uses mathematics and probability, as Moore (1992b) states,
probability is a field of mathematics, whereas statistics is not. Statistics did not
originate within mathematics. It is a unified logic of empirical science that has
largely developed as a new discipline since the beginning of the 20th century. We
will follow the origins of statistical thinking through to an explication of what we
currently understand to be statistical thinking from the writings of statisticians and
statistics educationists.
17
D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 17–46.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
Model for Interpretation of Literature
We will be interpreting the literature from our own paradigm (Figure 1) on
statistical thinking (Wild & Pfannkuch, 1999). The model was developed by
interviewing statisticians and tertiary students about statistical projects they had
been involved in; interviewing tertiary students as they performed statistical tasks;
and analyzing the literature below (see “Discussion and Summary” for more detail).
In our model we identified the types of thinking we consider to be fundamental to
statistics (Figure 1b). These five fundamental thinking types are now elaborated
upon.
Recognition of the Need for Data
The foundations of statistical enquiry rest on the assumption that many real
situations cannot be judged without the gathering and analysis of properly collected
data. Anecdotal evidence or one’s own experience may be unreliable and misleading
for judgments and decision making. Therefore, properly collected data are
considered a prime requirement for reliable judgments about real situations.
Transnumeration
For this type of thinking we coined the word transnumeration, which means
“changing representations to engender understanding.” Transnumeration occurs in
three specific instances. If one thinks of the real system and statistical system from a
modeling perspective, then transnumeration thinking occurs when (1) measures that
“capture” qualities or characteristics of the real situation are found; (2) the data that
have been collected are transformed from raw data into multiple graphical
representations, statistical summaries, and so forth, in a search to obtain meaning
from the data; and (3) the meaning from the data, the judgment, has to be
communicated in a form that can be understood in terms of the real situation by
others.
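The three instances of transnumeration can be made concrete with a small data sketch. This is our illustration rather than the authors’ example, and the data values and scenario are invented: a measure is captured from a real situation, re-represented as summaries and a crude frequency display in the search for meaning, and the resulting message is communicated back in terms of the situation.

```python
from collections import Counter
from statistics import mean, stdev

# (1) Capturing "measures" from the real system: daily commute times
# in minutes over two weeks (invented illustrative data).
commutes = [32, 28, 45, 30, 29, 31, 55, 33, 30, 27]

# (2) Changing data representations: numerical summaries and a text
# frequency display binned by tens, in a search for meaning in the data.
summary = {"mean": mean(commutes), "sd": round(stdev(commutes), 1)}
bins = Counter(10 * (t // 10) for t in commutes)
for lo in sorted(bins):
    print(f"{lo:2d}-{lo + 9:2d} | " + "*" * bins[lo])

# (3) Communicating the message in a form understood in terms of the
# real situation:
print(f"Typical commute is about {round(summary['mean'])} minutes, "
      f"but occasional trips run 20+ minutes longer.")
```

Each step changes the representation of the same situation (raw values, a distributional display, a contextual statement), which is the sense in which transnumeration "engenders understanding."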
Consideration of Variation
Adequate data collection and the making of sound judgments from data require
an understanding of how variation arises and is transmitted through data, and the
uncertainty caused by unexplained variation. It is a type of thinking that starts from
noticing variation in a real situation, and then influences the strategies we adopt in
the design and data management stages when, for example, we attempt to eliminate
or reduce known sources of variability. It further continues in the analysis and
conclusion stages through determining how we act in the presence of variation,
which may be to either ignore, plan for, or control variation. Applied statistics is
about making predictions, seeking explanations, finding causes, and learning in the
context sphere. Therefore we will be looking for and characterizing patterns in the
variation, and trying to understand these in terms of the context in an attempt to
solve the problem. Consideration of the effects of variation influences all thinking
through every stage of the investigative cycle.
(a) DIMENSION 1: THE INVESTIGATIVE CYCLE (PPDAC)
• Problem: grasping system dynamics; defining problem
• Plan: planning (measurement system; “sampling design”; data management; piloting & analysis)
• Data: data collection; data management; data cleaning
• Analysis: data exploration; planned analyses; unplanned analyses; hypothesis generation
• Conclusions: interpretation; conclusions; new ideas; communication

(b) DIMENSION 2: TYPES OF THINKING
General types:
• Strategic (planning, anticipating problems; awareness of practical constraints)
• Seeking explanations
• Modelling (construction followed by use)
• Applying techniques (following precedents; recognition and use of archetypes; use of problem-solving tools)
Types fundamental to statistical thinking (foundations):
• Recognition of need for data
• Transnumeration (changing representations to engender understanding): capturing “measures” from real system; changing data representations; communicating messages in data
• Consideration of variation: noticing and acknowledging; measuring and modelling for the purposes of prediction, explanation, or control; explaining and dealing with; investigative strategies
• Reasoning with statistical models: aggregate-based reasoning
• Integrating the statistical and contextual: information, knowledge, conceptions

(c) DIMENSION 3: THE INTERROGATIVE CYCLE
• Generate: imagine possibilities for plans of attack, explanations/models, information requirements
• Seek: information and ideas, internally and externally
• Interpret: read/hear/see; translate; internally summarise; compare; connect
• Criticise: check against reference points, internal and external
• Judge: decide what to believe, continue to entertain, or discard

(d) DIMENSION 4: DISPOSITIONS
• Scepticism
• Imagination
• Curiosity and awareness (observant, noticing)
• Openness (to ideas that challenge preconceptions)
• A propensity to seek deeper meaning
• Being logical
• Engagement
• Perseverance
Figure 1. A four-dimensional framework for statistical thinking in empirical enquiry. (From
“Statistical Thinking in Empirical Enquiry,” by C. J. Wild and M. Pfannkuch, 1999,
International Statistical Review, 67, p. 226. Copyright 1999 by International Statistical
Institute. Reprinted with permission.)
Reasoning with Statistical Models
The predominant statistical models are those developed for the analysis of data.
When we talk about “statistical models,” most people interpret the term as meaning,
for example, regression models or time-series models. Even much simpler tools such
as statistical graphs can be thought of as statistical models since they are statistical
ways of representing and thinking about reality. When we use statistical models to
reason with, the focus is more on aggregate-based rather than individual-based
reasoning, although both types are used. Individual-based reasoning
concentrates on single data points with little attempt to relate them to the wider data
set, whereas aggregate-based reasoning is concerned with patterns and relationships
in the data set as a whole. A dialogue is set up between the data and statistical
models. The models may allow us to find patterns in the data, find group
propensities, and see variation about these patterns via the idea of distribution. The
models enable us to summarize data in multiple ways depending on the nature of the
data. For example, graphs, centers, spreads, clusters, outliers, residuals, confidence
intervals, and p-values are read, interpreted, and reasoned with in an attempt to find
evidence on which to base a judgment. Different types of statistical models based on
the idea of “process” are starting to be used for reasoning with in the other stages of
the investigative cycle (e.g., see Joiner, 1994; Wild & Pfannkuch, 1999, Section 4).
Integrating the Statistical and Contextual
Although the above types of thinking are linked to contextual knowledge, the
integration of statistical knowledge and contextual knowledge is an identifiable
fundamental element of statistical thinking. The statistical model must capture
elements of the real situation; thus the resultant data will carry their own literature
base (Cobb & Moore, 1997), or more generally, their own body of context
knowledge. Because information about the real situation is contained in the
statistical summaries, a synthesis of statistical and contextual knowledge must
operate to draw out what can be learned from the data about the context sphere.
These ideas will be used to analyze and interpret the perspectives of different
fields on statistical thinking.
CONTRIBUTIONS FROM DIFFERENT FIELDS
Statistics has been like a tiny settlement taking root and steadily growing into a
large, rich country through continual two-way trade with the many neighbors on its
borders. Tracing all the contributions from all the fields that have fed and enriched
statistics would be an impossibly large undertaking; see, for example, the three
volumes of Kotz and Johnson (1992). We will just concentrate on some high points
in the development of statistical ways of thinking, and more recently of pedagogy
aimed at enhancing statistical thinking (see Scheaffer, 2001). Our account stresses
thinking that led to new ways of perceiving a world reality. We do not, for example,
discuss how different schools of inference use probability models to draw inferences
from data. The big developmental step, as we see it, was to begin to use probability
models to draw inferences from data.
We begin this section with the early developers of statistics; move on to much
more recent contributions from epidemiology, psychology, and quality management;
and conclude the section with a discussion of recent writings of statistics education
researchers and statisticians influential in the movement of pedagogy from methods
toward thinking.
Origins
Statistical thinking permeates the way we operate and function in everyday life.
Yet, it remains an enigma as to why even the most basic of the statistical
perspectives on the world—namely, reasoning from data—is less than 350 years old
(Davis & Hersh, 1986). Many have put forward explanations for the delay. The
current theory (Hacking, 1975) is that in the Renaissance two significant shifts in
thinking occurred about what was considered to be the nature of knowledge and the
nature of evidence. First, the concept of knowledge shifted from an absolute truth
toward a knowledge based on opinion, resulting in the thinking shifting toward a
probabilistic perspective. This required a skeptical attitude and inductive thinking.
Second, the nature of evidence shifted away from the pronouncements of those in
authority and toward making inferences from observations, resulting in the thinking
shifting toward reasoning from data. Both of these shifts initiated a new paradigm
for viewing and learning about the world.
Drawing Inferences from Data
The roots of such statistical thinking can be traced to John Graunt (David, 1962;
Kendall, 1970; Greenwood, 1970), who in 1662 published the book Natural and
Political Observations. Previously, official statistics had lain dormant as stored data.
Graunt’s new way of thinking is best illustrated with a centuries-old controversy
about whether the plague was carried by infection from person to person or carried
through infectious air. Most people believed both explanations were true. They believed
sick people caused the air to be infectious. They also knew that the plague could
start at the dock since a ship from overseas brought with it foul and infectious air.
The practice and advice were to flee such sources of infection. But when Graunt
looked at the number of plague cases, he reasoned (Hacking, 1975, p. 105):
The contagion of the plagues depends more on the disposition of the air than upon the
effluvia from the bodies of men. Which we also prove by the sudden jumpings which
the plague hath made, leaping in one week from 118 to 927, and back again from 993
to 258, and from thence again the very next week to 852.
If the plague was passed from one person to another, then these statistics could
not be explained; but they could be explained by the infectious air theory. In this
22
MAXINE PFANNKUCH AND CHRIS WILD
graphic example, we see Graunt taking mankind’s first steps in making inferences
from data. He uses some fundamental statistical thinking, such as noticing and
seeking to explain the differences in the numbers using his context knowledge.
Graunt also gave the “first reasoned estimate of the population of London”
(Hacking, 1975, p. 106) using arithmetical calculations. From knowing the number
of births, he inferred the number of women of childbearing age and hence estimated
the total number of families and the mean size of a family to produce an estimate of
the population. In his time Graunt was regarded by some of his peers as the
“Columbus” who discovered how to think and reason with data and hence opened
up a new world in which old and new demographic reports could be surveyed.
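Graunt's chain of arithmetic can be sketched in code. The figures below are invented purely for illustration; they are not Graunt's actual data.

```python
# A sketch of Graunt-style population arithmetic. All figures are
# hypothetical illustrations, not Graunt's actual numbers.
annual_births = 12000                        # assumed recorded births per year
births_per_woman_per_year = 0.25             # assumed fertility of childbearing women
childbearing_women_per_family = 1.0          # assume one such woman per family
mean_family_size = 8                         # assumed persons per family

# From the number of births, infer the number of women of childbearing age ...
women = annual_births / births_per_woman_per_year
# ... hence the total number of families ...
families = women / childbearing_women_per_family
# ... and hence an estimate of the population.
population = families * mean_family_size
print(int(population))  # 384000
```

Each step turns an observed count into an unobserved quantity via an assumed rate, which is precisely the inferential move that distinguished Graunt from mere compilers of tables.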
Similar ways of thinking arose independently in many parts of Western Europe
in the same decade. Other pioneers were Petty, King, Halley, Hudde, Huyghens, and
Davenant. According to Kendall (1970, p. 46), these political arithmeticians had an
inferential approach to data and “thought as we think today” since “they reasoned
about their data.” Their approach was to estimate and predict and then learn from the
data, not to describe or collect facts.
Recognition of the Need for Data
Graunt and these other political arithmeticians, besides calculating insurance
rates—which involved much discussion among them on producing realistic
mortality tables—were also promoting the notion that state policy should be
informed by the use of data rather than by the authority of church and nobility
(Porter, 1986). In these ideas we see fundamental statistical thinking operating—
there is a recognition that data are needed in order to make a judgment on a
situation. This notion was not a part of the mainstream consciousness until the late
1800s (Cline Cohen, 1982), when politicians were urged to argue for a policy based
on quantitative evidence since “without numbers legislation is ill-informed or
haphazard” (Porter, 1986, p. 37).
The Beginnings of Statistical Modeling
Even though the foundations of probability were laid down by Pascal (1623–1662), and later by Bernoulli (1654–1705), at the same time and in parallel with the foundations of statistics, probability ideas were not incorporated into empirical data
or statistical analyses. There appeared to be stumbling blocks in (1) relating urn-device problems to real-world problems; (2) a lack of equiprobability in the real-world problems; (3) the notion that prediction is impossible when there is a
multitude of causes; (4) thinking tools such as graphs not being available; and (5)
the inevitable time lags in drawing disparate and newly developed strands together
into a coherent whole. According to Stigler (1986), the chief conceptual step toward
the application of probability to quantitative inference involved the inversion of the
probability analyses of Bernoulli and de Moivre (1667–1754).
The ground-breaking inference work of Bayes in 1764 was encouraged by two key ideas. The first key idea was not to think in terms of games of chance.
That is, instead of thinking of drawing balls from an urn, Bayes thought in terms of
a square table upon which two balls were thrown. This new thinking tool allowed
for continuous random quantities to be described as areas and for the problem to
assume a symmetric character. The second key idea was from Simpson, who in 1755
had a conceptual breakthrough in an astronomical problem. Rather than calculating
the arithmetic mean of the observations, Simpson focused on the errors of the
observations (the difference between the recorded observation and the actual
position of the body being observed) and assumed a specific hypothesis for the
distribution of the measurement errors. These critical changes in thinking opened the
door to an applicable quantification of uncertainty. Lightner (1991, p. 628) describes
this as a transition phase as “many concepts from probability could not be separated
from statistics, for statisticians must consider probabilistic models to infer properties
from observed data.”
Thus in astronomy and geodesy (surveying) the use of probability to assess
uncertainty and make inferences from data employing the mathematical methods of
Laplace (1749–1827) and Gauss (1777–1855) such as the normal distribution for
measurement errors and the method of least squares became commonplace. At this
stage we see the beginning of some more fundamental statistical thinking; there is a
movement from reasoning with arithmetic to reasoning with statistical models and to
the measuring and modeling of error. It is important to note that there was still no
concept of variation in nature. This concept and some other major conceptual
barriers had to be overcome before this thinking could spread to the social sciences.
Social Data and Reasoning with the Aggregate
At the beginning of the 19th century a new sense of dynamism in society, after
the French Revolution, produced a subtle shift in thinking when statistics was seen
as a science of the state. The statists, as they were known, conducted surveys of
trade, industrial progress, labor, poverty, education, sanitation, and crime (Porter,
1986). “The idea of using statistics for such a purpose—to analyze social conditions
and the effectiveness of public policy—is commonplace today, but at that time it
was not” (Cohen, 1984, p. 102). Into this milieu a pioneer in social statistics,
Quetelet (1796–1874), arrived. Quetelet argued that the foundations of statistics had
been established by mathematicians and astronomers. He looked at suicide rates and
crime rates and was amazed to find large-scale regularity. Through realizing that
general effects in society are produced by general causes and that chance could not
influence events when considered collectively, he was able to recast Bernoulli’s law
of large numbers as a fundamental axiom of social physics. Porter (1986, p. 55)
suggested that Quetelet’s major contribution was in: “persuading some illustrious
successors of the advantage that could be gained in certain cases by turning attention
away from the concrete causes of individual phenomena and concentrating instead
on the statistical information presented by the larger whole.” The effect of
Quetelet’s findings reverberated. Debates raged about the “free will of man.”
Politicians and writers such as Buckle and Dickens were impressed; they wrote
about these constant statistical laws that seemed to govern the moral and physical
condition of people. For instance, the argument was promoted that if a particular
individual did not commit a crime, others would be impelled until the annual quota
of crime had been reached. Thus this new way of processing information was
catalyzing a new awareness of reality and a reevaluation of determinism.
Quetelet’s other major contribution occurred in 1844, when he announced that
the astronomer’s error law, or error curve, also applied to human traits such as
height. He viewed the “average man” (his findings about the average man became so
well known in his day that the phrase is still part of our language today) as the
perfect archetype. All men were designed according to this specification; but
because of nutrition, climate, and so forth failed to achieve the average man’s
measurements (Porter, 1986). He believed, therefore, that such human
measurements were indeed errors. Although too many of his data sets revealed
evidence of normality, he succeeded in creating a climate of awareness that
empirical social observations could be modeled by theoretical distributions. His
work provided “evidence” that there appeared to be an averaging of random causes
and “that nature could be counted on to obey the laws of probability” (Stigler, 1986,
p. 220). Quetelet started to shift the interest within probability from measurement
error to variation and began the process by which the “error curve” became a
distribution governing variation.
Variation as a Concept
Galton in the late 19th century provided the major conceptual breakthrough
(Stigler, 1986) for rationalizing variation in nature to the normal curve. To him the
curve stood as a denial of the possibility of inheritance. In other words, why did
population variability in height not increase from year to year, since tall parents
should have taller children and short parents should have shorter children? His
pondering on the size of pears (large, moderate, and small) in a garden and his
development of the quincunx as an analogy “demonstrated” that the resulting
mixture of approximately normal conditional distributions was itself approximately
normal. This empirical theory, coupled with his work on reversion in sweet pea
experiments and his study of hereditary stature, eventually led to the theory of
regression to the mean. For the first time, statistical thinking had incorporated the
notion of variation rather than error.
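Galton's puzzle, why population variability does not grow from generation to generation, can be illustrated with a small simulation. This is a hedged sketch: trait values are standardized, and the inheritance coefficient is chosen purely for illustration.

```python
import random
import statistics

random.seed(2)
r = 0.6                        # assumed strength of inheritance (illustrative)
noise_sd = (1 - r * r) ** 0.5  # fresh variation, sized so total spread is stable

# Parents' trait values: standard normal, for illustration.
parents = [random.gauss(0.0, 1.0) for _ in range(20000)]

# Each child inherits only a fraction r of the parent's deviation from the
# mean, plus fresh random variation: the "regression to the mean" mechanism.
children = [r * p + random.gauss(0.0, noise_sd) for p in parents]

# The mixture of the (normal) conditional distributions is again roughly
# normal, and population variability does not increase across generations.
print(statistics.stdev(parents), statistics.stdev(children))
```

The two printed spreads are nearly equal, mirroring Galton's quincunx demonstration that a mixture of approximately normal conditional distributions can reproduce the same normal curve generation after generation.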
Debates about the use of statistics in the social sciences continued. An argument
promoted was that statistical regularities proved nothing about the causes of things.
When Einstein declared in his famous quotation that “God did not play dice,” he
was stating the viewpoint of the late 19th century that scientific laws were based on
causal assumptions and reflected a causal reality. The defense of human freedom
inspired a wide-ranging reevaluation of statistical thought. Variation and chance
were recognized as fundamental aspects of the world in a way that they had not been
before. This acceptance of indeterminism constituted one of the noteworthy
intellectual developments of the time. According to Porter (1986, p. 319) the
evolvement of statistical thinking from 1662 to 1900 “has been not just to bring out
the chance character of certain individual phenomena, but to establish regularities
and causal relationships that can be shown to prevail nonetheless.”
New Tools and Transnumeration Thinking
The use of abstract, nonrepresentational pictures to show numbers, rather than
tables of data, was not thought of until 1750–1800. Statistical graphics such as time-series and scatter plots were invented long after the use of Cartesian coordinates in
mathematics. “William Playfair (1759–1823) developed or improved upon nearly all
the fundamental graphical designs, seeking to replace conventional tables of
numbers with the systematic visual representations of his ‘linear arithmetic’” (Tufte,
1983, p. 9). Another pioneer, Florence Nightingale (1820–1910), also developed
new graphical representations (Cohen, 1984). The representation of her tables of
data into new graph forms, for example, revealed the extent to which deaths in the
Crimea War had been preventable. This changing of data representation in order to
trigger new understandings from the data or to communicate the messages in the
data illustrates some fundamental statistical thinking.
Emergence of a New Discipline
Porter (1986, p. 315) states that “the intellectual character of statistics” had been
crystallized by 1900, and that modern statisticians perceived “the history of their field as beginning with Galton [1822–1911], if not Pearson [Karl Pearson, 1857–1936].” The emergence of statistical thinking appears to have been based on four
main factors. The first factor is a fundamental realization that the analysis of data
will give knowledge about a situation. The basis of this factor is the recognition that
knowledge acquisition can be based on investigation. The second factor is a
recognition that mathematical probability models can be used to model and predict
group (e.g., human group) behavior. Thus an interplay between the mathematical
probability model and the real situation resulted in a shift of thinking to include a
nondeterministic view of reality. The third factor is the application of mathematical
probability models to a variety of domains, resulting in new ways of thinking,
perceiving, and interpreting in the statistics discipline. For example, these new ways
of thinking occurred when mathematical error models were used by Quetelet in the
social science field, and by Galton in the biological sciences, and consequently
became reinterpreted in fundamentally different ways as variation or chance
statistical models. The fourth factor is the development of new tools for analysis,
arising from the new situations where statistics was being applied. These new tools
aided the development of statistical thinking. Statistical thinking appears to
have arisen from a context-knowledge base interacting with a statistical-knowledge
base, with the resultant synthesis producing new ways of modeling and perceiving
the world.
At the beginning of the 20th century people such as Karl Pearson, Ronald A.
Fisher (1890–1962), Jerzy Neyman (1894–1981) and Egon Pearson (1885–1980)
built the foundations of modern statistics (see Salsburg, 2001). Their particular
insights into principles such as randomization in experiments and surveys, coupled
with the development of theoretical statistics, promoted new ways of thinking in
many fields. In particular, Fisher’s work is regarded as providing the conceptual
underpinnings not only for the academic discipline of statistics but also for fields
such as plant and animal breeding, evolutionary biology, and epidemiology.
Krishnan (1997) believes that Fisher’s most important contribution to statistics and
science was his formulation of the basics of experimental design—randomization,
replication, and local control. Consideration of variation (e.g., variation in the
growing conditions for plants) is a core element in the thinking behind such
experimental design.
Fisher’s famous thought experiment on “the lady and the cup of tea,” on which he based his discussion of experimental design, was never actually undertaken. The idea
arose from an actual incident 12 years earlier, when a Dr. Muriel Bristol declined a
cup of tea on the grounds that the milk had not been poured in first. Fisher and her
fiancé immediately set out to test whether she could tell the difference. Her fiancé
declared she was able to prove her case. Box (1978, p. 134), however, thinks that
Fisher pondered on questions such as: “How many cups should be used in the test? . . . What should be done about chance variations in the temperature, sweetness and so on? What conclusions could be drawn from a perfect score or from one with one
or more errors?” Therefore, Fisher initiated his groundbreaking work by considering
questions relevant to designing an experiment for the following situation:
A lady declares that by tasting a cup of tea made with milk she can discriminate
whether the milk or the tea infusion was first added to the cup. We will consider the
problem of designing an experiment by means of which this can be asserted. (Fisher,
1935, cited in Box, 1978, p. 135)
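Fisher's published analysis of this problem used eight cups, four prepared each way, so the role of chance in a perfect score can be worked out by simple counting. A minimal sketch:

```python
from math import comb

# In Fisher's published design there are eight cups, four with milk first.
# A lady who merely guesses picks which four cups she thinks are milk-first;
# under pure guessing, every choice of 4 cups out of 8 is equally likely.
arrangements = comb(8, 4)
p_perfect_by_chance = 1 / arrangements
print(arrangements)  # 70
print(round(p_perfect_by_chance, 3))
```

A perfect score would thus occur by chance only about 1.4% of the time, which is why Fisher judged a perfect score, but not a near-perfect one, to be convincing evidence of real discrimination.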
Fisher’s two main innovations for the design of experiments were the
introduction of analysis of variance and randomization. According to Box (1997, p.
102), Fisher elucidated “the underlying theory and provided the statistical methods
that research workers urgently needed to deal with the ubiquitous variation
encountered in biological experimentation.” Fisher also played a pivotal role in the
actual use of randomization in controlled agricultural experiments (Fienberg & Tanur, 1996). Randomization was described by Fisher as a method that was
necessary for the validity of any test of significance, since it “affords the means, in
respect of any particular body of data, of examining the wider hypothesis in which
no normality of distribution is implied” (1935; cited in Box, 1978, p. 151). Without
randomization, confounding factors would give biased estimates. Fisher’s work
contributed to the recognition that uncertainty could be captured by quantifiable
measures that led to a deeper appreciation and understanding of its nature (Box,
1997). Porter (1986) also observed that Fisher’s integration of statistics with
experimental design essentially changed the character of statistics by moving it
beyond observing patterns in data to demonstrating the existence of causal
relationships.
Some Contributions from Epidemiology
Variation and Randomization
In accounts of statistical thinking in medicine, variation is never mentioned; yet
it is at the heart of the methodology and the thinking. Perhaps it is because medicine
has only recently accepted the quantification and objectification of its practice
(Porter, 1995). This awareness of the importance of statistical thinking and methods
in epidemiology can largely be attributed to the work of three statisticians—Austin
Bradford Hill, Jerome Cornfield, and Richard Doll (Gail, 1996)—during the mid-20th century. They were the main statisticians behind the general acceptance by the
medical profession of (1) the randomized comparative clinical trial, starting with
Hill’s pioneering work with the whooping-cough vaccine in the 1940s; and (2)
acceptance of a code of practice for observational studies, through their data
analyses on the association between smoking and lung cancer. Before the technique
of randomized comparative trials could be applied to humans, however, there were
ethical issues to be overcome, as well as a largely innumerate profession (Gail,
1996). Another reason for this recent acceptance of randomized comparative clinical
trials is that the statistical methods for comparison were only invented in the 1920s
by Fisher (in the context of agricultural experiments). It is noteworthy that what are
now common practices and ways of thinking about what constitutes evidence only
began to be accepted by the medical profession during the 1960s.
Causal Inference
Fisher was a significant protagonist in the prolonged debate on whether smoking
causes lung cancer (Box, 1978). However, his insistence on raising other possible
causes for lung cancer—together with Cornfield, Doll, and Hill’s careful, logical
analyses of data—markedly increased awareness of the importance of statistical
thinking in medicine (Gail, 1996). Alternative explanations for the association
between lung cancer and smoking suggested by Fisher and others were
systematically refuted by Cornfield, Doll, and Hill until there could be no other
plausible interpretation of the data. The debate on the association between smoking
and lung cancer, which began in 1928, culminated in the 1964 publication of the
U.S. Surgeon General’s report, a landmark in the setting of standards of evidence for
inference of a causal relationship from observational studies.
Thus in epidemiology it was recognized that purely statistical methods applied to
observational data cannot prove a causal relationship. Causal significance was
therefore based on “expert” judgment utilizing a number of causal criteria such as
consistency of association in study after study, strength of association, temporal
pattern, and coherence of the causal hypothesis with a large body of evidence (Gail,
1996). It should be noted that whether the study is experimental or observational, the
researcher always has the obligation to seek out and evaluate alternative
explanations and possible biases before drawing causal inference.
Causal inference is at the heart of epidemiology. Epidemiology laid down the
foundations for causal criteria as first enunciated by Hill (1965). According to Porter
(1995), these agreed-upon rules and conventions are paramount for trusted
communication globally. Thalidomide was originally considered safe according to
expert judgment. The resulting disaster led to more criteria being laid down for
scientific procedure and quantification of new knowledge. Gail (1996, p. 1) believes
that:
Statistical thinking, data collection and analysis were crucial to understanding the
strengths and weaknesses of the scientific evidence … [and] gave rise to new
methodological insights and constructive debate on criteria needed to infer a causal
relationship. These ideas form the foundation for much of current epidemiologic
practice.
The statistical thinking that would seem to permeate epidemiology is a
synthesizing of contextual knowledge with statistical knowledge and the
consideration of variation at all stages of the investigative cycle for experimental
and observational studies. Statistical thinking in this context is about seeking causes
with a knowledge and understanding of variation.
Some Contributions from Psychology
The centrality of variation in statistical thinking was being recognized in
experimental design and in observational studies. In psychology in the late 1960s,
however, a link was recognized between statistics and how people think in everyday
situations.
Recognizing Statistical Thinking as a Way of Perceiving the World
In the early 1970s Kahneman and Tversky began publishing important work on
decision making under uncertainty (see Tversky and Kahneman, 1982). They
discovered that statistical thinking is extraordinarily difficult for people. These
researchers’ particular insights transformed the idea of statistical thinking from
making inferences from purposefully collected data, to making inferences from
everyday data that are not collected for any purpose nor seen as data. To illustrate
this concept, we relate the story of how this field began. According to McKean
(1985), Kahneman mentioned, in a psychology course to flight instructors, that from
research with pigeons there was evidence that reward was a more effective teaching
strategy than punishment. The flight instructors disagreed vehemently that this
research was applicable to humans. They knew from their experience that if they
praised a person for a good maneuver then invariably the next maneuver would be
worse, and that if they yelled at the person for a badly executed maneuver then the
next one would more than likely be an improvement. At that instant, Kahneman
made an insightful connection with Galton’s statistical principle of regression to the
mean.
We can explain the idea as follows (Figure 2). If you look at a time-series plot of
data points independently sampled from a random distribution, say the normal as in
the figure, you will see that the observation that follows a fairly small value tends to
be larger, and the observation that follows a fairly large value tends to be smaller. It
tends to go back, or “regress,” toward the mean.
Thus if flight performance was a random process and praise for good
performance and censure for poor performance had absolutely no effect at all, flight
instructors would tend to have experienced students performing better after censure
and worse after praise. They would then come to exactly the same conclusion—that
censure was effective and praise was, if anything, counterproductive:
The student pilots, Kahneman explained, were improving their skills so slowly
that the difference in performance from one maneuver to the next was largely a
matter of luck. Regression dictated that a student who made a perfect three-point
landing today would make a bumpier one tomorrow—regardless of praise or blame.
But the flight instructors, failing to realize this, had underestimated the effect of
reward and overestimated the effect of punishment. (McKean, 1985, p. 25)
Figure 2. Time-series plot of data independently sampled from a normal distribution (µ=0, σ=1); vertical axis rnorm(80), horizontal axis Observation Number (0 to 80).
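The pattern described above can be checked directly by simulation. This sketch uses Python's standard library in place of the figure's R call rnorm(80), with many more draws so that the conditional averages settle down:

```python
import random

random.seed(1)
# Independent draws from a standard normal distribution, as in Figure 2
# (far more than 80 draws here, so the averages below are stable).
xs = [random.gauss(0.0, 1.0) for _ in range(10000)]

# Average value of the observation that follows a fairly small (< -1)
# value, and of the observation that follows a fairly large (> 1) value.
after_low = [xs[i + 1] for i in range(len(xs) - 1) if xs[i] < -1.0]
after_high = [xs[i + 1] for i in range(len(xs) - 1) if xs[i] > 1.0]
mean_after_low = sum(after_low) / len(after_low)
mean_after_high = sum(after_high) / len(after_high)

# Both averages sit near 0: the next observation "regresses" toward the
# mean even though nothing causal connects successive draws.
print(mean_after_low, mean_after_high)
```

Read against the flight-instructor story, the observation after an extreme draw is, on average, closer to the mean regardless of any praise or censure in between.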
This type of statistical thinking we will label as understanding the behavior of
variation, though Kahneman and Tversky do not explicitly write in these terms. It
requires admitting the possibility of indeterminism. People were mistakenly
attributing each change to a cause rather than perceiving the students’ performance
as a random process with an underlying mean. Kahneman and Tversky became
sensitized to seeing regression to the mean everywhere. They developed a long list
of phenomena that people have found surprising that can be explained in terms of
regression to the mean.
This insight led to the two men thinking of other statistical principles that were
counterintuitive. One of these was that people believe that a small sample is a
representative sample, or that a small sample should reflect the characteristics of the
population (Tversky & Kahneman, 1982). From a variation perspective there is
more variation in a small sample than in a large sample. Consequently, people often
put too much faith in the results of small samples. But there is an ambivalence here.
At other times people severely doubt results from small (i.e., small in proportion to
the size of the population) randomly selected samples (Bartholomew, 1995) and do
not believe that the sample will reflect the population.
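The greater variability of small samples is easy to demonstrate by simulation. A hedged sketch, with the two sample sizes and the number of repetitions chosen arbitrarily:

```python
import random
import statistics

random.seed(0)

def sample_mean(n):
    # Mean of n independent draws from a standard normal distribution.
    return sum(random.gauss(0.0, 1.0) for _ in range(n)) / n

# Repeat each sampling experiment many times.
small_means = [sample_mean(5) for _ in range(2000)]    # small samples
large_means = [sample_mean(100) for _ in range(2000)]  # large samples

# Means of small samples scatter far more widely, so a small sample is a
# much less dependable picture of the population.
print(statistics.stdev(small_means), statistics.stdev(large_means))
```

The printed spreads differ by roughly a factor of four or five here, which is exactly the intuition the "representativeness" belief described by Tversky and Kahneman gets wrong.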
Tversky and Kahneman’s work has revealed that statistical thinking is not
embedded in how people act and operate in the world, which is not surprising given
its youth. In fact the psychologists Falk and Konold (1992, p. 151) believe people
must undergo their own ‘probabilistic revolution’ and shift their perception of the
world from a deterministic view to one “in which probabilistic ideas have become
central and indispensable.” A complementary but very differently expressed view is
shared within the quality management field, where there is a belief that peoples’
conception of statistical thinking will alter their understanding of reality (Provost &
Norman, 1990).
Some Contributions from Quality Management
Statistical thinking is at the forefront of the quality management literature. Snee
(1999) believes that the development of statistical thinking will be the next step in
the evolution of the statistics discipline, while Provost and Norman (1990, p. 43)
state “the 21st century will place even greater demands on society for statistical
thinking throughout industry, government, and education.” Such strong beliefs about
the value of statistical thinking pervade the quality management field, which focuses
on systematic approaches to process improvement. At the heart of these approaches
is learning from and about processes so that changes can be made to improve them.
This has led to a literature and to a large number of courses in statistical thinking,
many of them concerned with the skill sets required of managers (e.g., Joiner, 1994).
What stands out immediately in their definitions of statistical thinking is the role of
variation. Process improvement, in large part, consists of controlling and minimizing
variation.
Controlling Variation
Hare, Hoerl, Hromi, and Snee (1995) state that statistical thinking has its roots in
the work of Shewhart, who in 1925 published a paper about maintaining the quality
of a manufactured product. This led to the development of the quality control field,
of which Deming was also at the forefront (Shewhart & Deming, 1939). The basis
of Shewhart and Deming’s work was that there are two sources of variation in a
process: special-cause variation and common-cause variation, or chance variation.
For quality control the prevailing wisdom for a long time had been to identify, fix,
and eliminate the special causes (thus bringing the process to ever-improved levels
of statistical stability) and to accept the inherent variability within a process (i.e., the
common cause or chance variation). So long as the observations fell within the
three-sigma limits, the rule was to leave the process alone. This attitude to variation
has been changing due to a climate of continually shifting standards and higher
expectations. It is no longer quality control but continuous quality improvement that
is the focus of management.
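Shewhart's distinction between special-cause and common-cause variation can be sketched as a minimal control-chart check. The baseline data, limits, and function names below are hypothetical illustrations, not a standard library API:

```python
import statistics

def three_sigma_limits(baseline):
    # Estimate control limits from an in-control baseline of measurements.
    centre = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return centre - 3 * sigma, centre + 3 * sigma

def special_cause_signals(observations, limits):
    # Flag points outside the three-sigma band as possible special causes;
    # points inside it are treated as common-cause (chance) variation.
    lower, upper = limits
    return [x for x in observations if x < lower or x > upper]

# Hypothetical process measurements (not from the chapter).
baseline = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9, 10.0, 10.2]
limits = three_sigma_limits(baseline)
print(special_cause_signals([10.0, 9.9, 12.5, 10.1], limits))
```

Under the older quality-control wisdom, only the flagged points would be investigated; the continuous-improvement view described next treats the in-limits variation itself as something to be explained and reduced.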
Minimizing Variation
Pyzdek’s (1990, p. 104) approach to thinking about variation is summarized as:
• All variation is caused.
• Unexplained variation in a process is a measure of the level of ignorance about the process.
• It is always possible to improve understanding (reduce ignorance) of the process.
• As the causes of process variation are understood and controlled, variation will be reduced.
This understanding of variation enables not only the reduction of process
variation but also the changing of the average level of the process (Snee, 1999).
Thus in quality improvement it is believed that to truly minimize variability, the
sources of variation must be identified and eliminated (or at least reduced). The first
task, however, is to distinguish common-cause and special-cause variation. It is
recognized that variation from special causes should be investigated at once, while
variation from common causes should be reduced via structural changes to the
system and long-term management programs. The method for dealing with common
causes is to investigate cause and effect relationships using such tools as cause and
effect diagrams, stratification analysis, pareto analysis, designed experiments,
pattern analysis, and modeling procedures. In-depth knowledge of the process is
essential. Patterns in the data must be looked for, and depending on the question
asked, data must be aggregated, re-aggregated, stratified, or re-stratified. There is a
need to look at the data in many ways in the search for knowledge about common
causes. The context must also be known in order to ask good questions of the data.
Pyzdek (1990) gives a graphic example of how viewing “chance” as being
explicable and reducible rather than unexplainable but controllable in a system can
lead to improvements. In a manufacturing process the average number of defects in
solder-wave boards declined from 40 to 20 per 1,000 leads, through running the
least dense circuit pattern across the wave first. Another two changes to the system
later on reduced the average number of defects to 5 per 1,000 leads. Therefore
Pyzdek (1990, p. 108) repudiates the “outdated belief that chance causes should be
left to chance and instead presents the viewpoint that all variation is caused and that
many, perhaps most processes can be improved economically.” His perspective is on
the marketplace with its increasing emphasis on continuous improvement. Although
this may be considered a deterministic outlook, there is still an acceptance of
indeterminism—it is more about reducing the level of indeterminism by acquiring
more knowledge.
In reality, variation is ever present. If patterns cannot be found in the data, then
the extent of the variability can be estimated and allowed for in the process. If
patterns are found, but the cause is not manipulable (e.g., gender), then the
identification of the cause enables better prediction for individuals and processes can
be designed to allow for the variation. If the cause is manipulable, then the process
can be changed to increase the “desirable” outcomes (Wild & Pfannkuch, 1999).
Therefore the thinking is to search for causes, for all possible explanations, but to
recognize that variation will be present. Coupled with this thinking is the cognition
that what may appear to be a pattern may in reality be random or unexplained
variation.
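This last point, that chance alone readily produces apparent patterns, is easy to demonstrate by simulation. In the sketch below the streak length of 5 is an arbitrary threshold for what an observer might read as a "pattern":

```python
import random

def longest_run(seq):
    """Length of the longest run of identical consecutive outcomes."""
    best = cur = 1
    for a, b in zip(seq, seq[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

random.seed(1)
# 1,000 sequences of 20 fair coin flips: how often does pure chance
# alone produce a streak of 5 or more identical outcomes, the kind of
# feature people readily interpret as meaningful?
trials = [[random.choice("HT") for _ in range(20)] for _ in range(1000)]
freq = sum(longest_run(t) >= 5 for t in trials) / len(trials)
print(f"share of random sequences with a run of 5+: {freq:.2f}")
```

Streaks of this length appear in a large fraction of purely random sequences, which is exactly the filter for "seeing patterns in randomness" that the text describes.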
Variation as a Way of Perceiving the World
In the quality management area, common consensus is being developed on the
characteristics of the statistical thinking required for improving systems. As people
in the quality field have moved from quality control to quality management, the
nature of the thinking required has developed from an emphasis on stable variability
in manufactured products toward an emphasis on the way managers (from any
environment) should operate and think.
Snee (1990, p. 116) believes there is a need to acquire a greater understanding of
statistical thinking and the key is to focus on statistical thinking at the conceptual
level or from a “systems” perspective rather than focusing on the statistical tools:
I define statistical thinking as thought processes, which recognize that variation is all
around us and present in everything we do, all work is a series of interconnected
processes, and identifying, characterizing, quantifying, controlling and reducing
variation provide opportunities for improvement. This definition integrates the ideas
of processes, variation, analysis, developing knowledge, taking action and quality
improvement.
According to Hare et al. (1995, p. 55), “Statistical thinking is a mind-set.
Understanding and using statistical thinking requires changing existing mind-sets.”
They state that the key components of statistical thinking for managers are “(1)
process thinking; (2) understanding variation; (3) using data whenever possible to
guide actions.” In particular, they reinforce ideas like these: improvement comes
from reducing variation; managers must focus on the system, not on individual
people; and data are the key to improving processes. Kettenring (1997, p. 153)
supports this view when he states that managers need to have an “appreciation for
what it means to manage by data.”
Snee (1999, p. 257), however, contends that while data should be used for
effective statistical thinking, data are not essential to the use of statistical thinking.
He observes variation is present in processes without data being available. For
example, it is generally known that “decreasing the variation of process inputs
decreases the variation of process outputs.” Hence, without data, statistical thinking
would suggest, for example, that companies should significantly reduce their
number of suppliers. Britz, Emerling, Hare, Hoerl, and Shade (1997, p. 68) sum up
this ability to use statistical thinking without data as follows: “the uniqueness of
statistical thinking is that it consists of thought processes rather than numerical
techniques. These thought processes affect how people take in, process, and react to
information.”
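Snee's observation that reducing input variation reduces output variation can be illustrated without any real data. The sketch below simulates a process fed by a pool of suppliers of differing consistency; all numbers are invented:

```python
import random
import statistics

random.seed(42)

def output_sd(supplier_sds, n=20000):
    """Simulate a process whose output is an input measurement (drawn
    from a randomly chosen supplier) plus the process's own noise."""
    outputs = []
    for _ in range(n):
        sd = random.choice(supplier_sds)        # which supplier's material
        x = random.gauss(100, sd)               # input characteristic
        outputs.append(x + random.gauss(0, 1))  # process adds fixed noise
    return statistics.stdev(outputs)

many = output_sd([1, 3, 5, 7])  # heterogeneous supplier pool
few = output_sd([1, 3])         # fewer, more consistent suppliers
print(many > few)               # True: tighter inputs give tighter outputs
```

The statistical-thinking step happens before any data are collected: knowing that output variation inherits input variation is enough to motivate consolidating suppliers.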
Tversky and Kahneman’s insights about how regression to the mean affects
people’s beliefs about the effects of reward and punishment are widely promulgated
in quality management as part of “understanding the theory of variation.” The
setting used to illustrate this is typically the reactions of sales managers to the highs
and lows in sales figures of their staff. According to Joiner and Gaudard (1990),
many managers fail to recognize, interpret, and react appropriately to variation over
time in employee performance data. These statisticians are attempting to get
managers to understand that looking at single time-interval changes and meting out
praise and censure is not conducive to improving performance. The way to improve
performance is to make some system change that will increase the average level of
performance. Managers need to recognize that there will always be variation, and
that unless there is a system change there will be regression to the mean. This
suggests that managers are being asked to take on a world view that allows for
indeterminism.
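The regression-to-the-mean effect described by Joiner and Gaudard can be simulated directly. In the sketch below (all numbers invented), every employee has identical underlying skill, yet the group censured for a poor first period appears to improve with no system change at all:

```python
import random
import statistics

random.seed(7)

# 500 employees with identical underlying skill; observed performance is
# skill plus common-cause variation.
SKILL, NOISE, N = 100, 10, 500
first = [SKILL + random.gauss(0, NOISE) for _ in range(N)]
second = [SKILL + random.gauss(0, NOISE) for _ in range(N)]

# Single out the bottom decile of period 1 (the "censured" group) ...
cutoff = sorted(first)[N // 10]
low_ids = [i for i in range(N) if first[i] <= cutoff]
before = statistics.mean(first[i] for i in low_ids)
after = statistics.mean(second[i] for i in low_ids)

# ... and their period-2 average climbs back toward the overall mean even
# though nothing about the system changed: the apparent effect of the
# censure is regression to the mean.
print(f"censured group average: {before:.1f} -> {after:.1f}")
```

A manager who attributes the rise to the censure has mistaken common-cause variation for an individual effect, which is precisely the misreading the quality-management literature warns against.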
Statistical thinking in quality management is now seen not only as necessary for
gleaning information from data but also as a way of perceiving the world.
From quality management we learn that statistical thinking is, first and foremost,
about thought processes that consider variation, about seeking explanations to
explain the variation, about recognizing the need for data to guide actions, and about
reasoning with data by thinking about the system or process as a whole. Implicit in
their concepts about variation is that system (not people or individual) causal
thinking is paramount. Once the type of variation has been categorized as special-cause or common-cause, then there are appropriate strategies for identifying the
causes of that variation. The quality management thinking approach is not to leave
variation to chance, but to reduce it in an attempt to improve processes and
performance.
Some Contributions from Statistics Education Researchers
The quality management approach to statistical thinking arose from the
confluence of a focus on empirical data and the need to improve processes. In
contrast, the statistics education field tended to have its origins in mathematics
education and in a deductive rather than inductive culture.
Statistics education research emerged in the late 1970s and focused mainly on
probability (e.g., Fischbein, 1975; Tversky & Kahneman, 1982). It has really only
been in the last decade that statistical thinking has begun to be addressed. We will
now discuss some of these developments.
Integrating the Statistical and the Contextual
It emerged from the research of Biehler and Steinbring (1991) that the interplay
between data and context was essential for the generation and interpretation of
graphical representations. They used the term statistical detective work to describe
this process of questioning the data through to a judgment or decision about the
original situation. Shaughnessy, Garfield, and Greer (1996, p. 206) also suggested
that students need to set up a dialogue with data with the mind-set of a detective and
to “look behind the data” since data arise from a specific context.
Data are often gathered and presented by someone who has a particular agenda. The
beliefs and attitudes lying behind the data are just as important to include in the
treatment of data handling as are the methods of organizing and analyzing the data …
it is mathematical detective work in a context … relevance, applicability, multiple
representations and interpretations of data are lauded in a data handling environment.
Discussion and decision-making under uncertainty are major goals … so too are
connections with other disciplines.
Transnumeration and Context Knowledge
From their research on students involved in statistical projects using technology,
Ben-Zvi and Friedlander (1997) emphasized, in their hierarchy of thinking modes,
the role of representation and implicitly the role of context. Students who were
handling multiple representations in a meaningful and creative way, and were using
graphs to search for patterns and to convey ideas—coupled with a critical attitude—
were considered to be thinking statistically. One of the main notions identified in
this hierarchy is the fundamental type of statistical thinking that we call
transnumeration.
Hancock, Kaput, and Goldsmith (1992, p. 339) view statistics from a modeling
perspective encapsulating the idea that data are a model of a real-world situation.
They identified data creation and data analysis as making up the domain of data
modeling. “Like any model it is a partial representation and its validity must be
judged in the context of the uses to which it will be put. The practical understanding
of this idea is the key to critical thinking about data-based arguments.” They state
that data creation has been neglected and includes:
Deciding what data to collect, designing a structure for organizing the data and
establishing systematic ways of measuring and categorizing … data creation informs
data analysis because any conclusion reached through analysis can only be as reliable
and relevant as the data on which it is based. The most interesting criticisms of a
data-based argument come not from scrutinizing graphs for misplotted points … but
from considering some important aspect of the situation that has been neglected,
obscured or biased in the data collection.
This is a good example of (1) transnumeration at the beginning of the problem
when relevant “measures” need to be captured from the real system and (2) bringing
to the problem context knowledge of the situation and integrating it with statistical
knowledge to challenge the interpretation of the data.
Reasoning with Statistical Models
Hancock et al. (1992) and Konold, Pollatsek, Well, and Gagnon (1997)
conclude, from their research on students, that reasoning about group propensities
rather than individual cases is fundamental in developing statistical thinking. But,
according to research by Konold et al. (1997), students dealing with data find it very
difficult to make the transition from thinking about and comparing individual cases
to aggregate-based reasoning. For example, in mathematics one counterexample
disproves a conjecture or claim, whereas in statistics a counterexample (an
individual case) does not disprove a theory concerning group propensities.
Furthermore, for students to reason with a statistical graph they must “see” patterns
in the data set as a whole, with the proviso that patterns can be seen in randomness
and that individual-based reasoning may be required in some situations.
Recognition of the Need for Data
Hancock et al. (1992), Konold et al. (1997), and Watson et al. (1995) have
observed in their research that it was not unusual to find students who expected that
the collection and analysis of data would confirm their personal knowledge of the
situation. In fact, the students often ignored the graphs they had constructed and
wrote their conclusions based on their own beliefs. This fundamental statistical
thinking element, which some students seem to lack, is the recognition that data are
needed to judge a situation. This facet includes the recognition that personal
experience and opinions may be inadequate or possibly biased, and furthermore that
opinions may need to be revised in light of the evidence gained.
Statistical Thinking and Interacting with Statistically Based Information
Many mathematics curricula (e.g., Ministry of Education, 1992) have
incorporated the interpretation and critical evaluation of media and other statistically
based reports as a desirable outcome in a statistics course. This is not surprising
given the high level of statistical information present in the media (Knight et al.,
1993) and that the general aim of education programs is to produce literate citizens.
The ability to question claims in the media and to critically evaluate such reports
requires high-level thinking skills (Watson, 1997). When students are confronted
with having to form a judgment on a report, they have to weigh up what they are
willing to believe, what else should be done, or what should be presented to them to
convince them further. Gal (1997) suggests that evaluation of a report requires
students to have a critical list of “worry” questions in their heads, coupled with a
critical disposition. This list of worry questions is based on critiquing the
investigative cycle stages. This underlying thinking requires the students to place
themselves in the position of being the investigators and thereby determining the
considerations that an investigator should give to such aspects as the measures,
design, alternative explanations, inference space, and so forth. In doing so, the
student checks for possible flaws in the design and reasoning. This evaluation
process requires the students to use not only their statistical knowledge, but their
contextual knowledge. Often when thinking of, for example, alternative explanations
for the meaning of findings, students must “consider other information about the
problem context or consult world knowledge they may have to help in ascribing
meaning to the data” (Gal, 1997, p. 50).
Gal and Watson, through their research, have alerted statistics educators to the
fact that involving students in statistical investigations does not appear to fully
develop statistical thinking. Gal et al. (1995, p. 25) believe the reason for this is
“both an issue of skill transfer, as well as the fact that a somewhat different set of
cognitive skills and dispositions is called for.” Therefore it would seem that specific
instruction in the evaluation of statistically based reports is required to fully develop
statistical thinking.
Probabilistic and Deterministic Thinking
Apart from Biehler’s (1994) work, educationists have not paid a great deal of
attention to explicating statistical thinking from a practitioner perspective. Biehler
(1994) believes there are two cultures of thinking in statistics, deterministic and
probabilistic. This deterministic thinking is demonstrated in the methods of
exploratory data analysis (EDA), which does not try to calibrate variability in data
against a formal probability model. Patterns are sought in an attempt to search for
causes; but there is the awareness that people often “see” patterns in randomness,
and a filter is needed for such a phenomenon. “EDA people seem to appreciate
subject matter knowledge and judgment as a background for interpreting data much
more than traditional statisticians seem to” (Biehler, 1994, p. 7).
Probabilistic thinking occurs when reasoning with theoretical probability
models, for example, in situations where the argument is based on the data being a
random sample from a particular model. Biehler (1999, p. 261) argues strongly that
the modeling of a system by a probability distribution can “reveal new types of
knowledge, new causes, explanations and types of factors that cannot be detected at
the individual level.” Systematic and random variation and their complementary
roles also need to be understood (Konold et al., 1991) in these situations. Therefore
Biehler suggests that statistical thinking requires both probabilistic and deterministic
thinking as well as both aggregate-based and individual-based reasoning. This shift
toward EDA in statistics, which was influenced by the 1962 landmark paper of
Tukey (Kotz & Johnson, 1992) and further developed by him (see Tukey, 1977), has
focused statistics educators’ attention on the fact that statistical thinking involves a
context knowledge base, a statistical knowledge base, variation as a core component,
a search for causes, and reasoning with statistical and probability models.
Variation as Fundamental in Statistical Thinking
The notion that variation is fundamental in statistical thinking was not
recognized by educationists until recently (Shaughnessy, 1997; Pfannkuch, 1997),
although the idea was being vigorously promoted by statisticians with an interest in
education (e.g., Moore, 1990). Shaughnessy (1997) attributes the lack of research on, and mention of, variation to the fact that research largely reflects the emphasis of curriculum materials. This situation is now being addressed by Shaughnessy, Watson, Moritz, & Reading (1999), who, in their research, have found a lack of clear growth in students’ conceptions of variability for a particular task.
From this brief overview of research into students’ thinking, we note that the
fundamental elements of statistical thinking have been identified in statistics
education research. The variation element has only recently been addressed. It is a
powerful underlying conception that allows us to relate behavior we can actually
observe to the abstract ideas of pattern, exceptions, and randomness. Statistics
education research has added important insights into statistical thinking by
identifying the way students think and by recognizing that statistical thinking is neither an innate nor a community way of thinking. It must be specifically learned and
developed in an educational environment and in the statistics discipline. Statistics
education researchers have highlighted the difficulties students have in making the
transition to a statistical way of thinking. They have also promoted awareness that
statistical thinking involves a different set of cognitive skills in the arena of
empirical enquiry and in the arena of the evaluation of statistically based reports.
Some Contributions from Statisticians
In the last decade in the statistics literature, David Moore has been vigorously
promoting the idea that the development of a statistical way of thinking must be
central in the education process and that the variation-type thinking should be at the
heart of statistics education. By 1996 the board of directors of the American
Statistical Association (ASA) had approved recommendations that the curriculum
should emphasize the elements of statistical thinking (Moore, 1997) and adopted a
definition very similar to that given by Moore (1990, below).
Variation Is the Core of Statistical Thinking
Moore (1990, p. 135) summarizes statistical thinking as:
• The omnipresence of variation in processes. Individuals are variable; repeated measurements on the same individual are variable. The domain of strict determinism in nature and in human affairs is quite circumscribed.
• The need for data about processes. Statistics is steadfastly empirical rather than speculative. Looking at the data has first priority.
• The design of data production with variation in mind. Aware of sources of uncontrolled variation, we avoid self-selected samples and insist on comparison in experimental studies. And we introduce planned variation into data production by use of randomization.
• The quantification of variation. Random variation is described mathematically by probability.
• The explanation of variation. Statistical analysis seeks the systematic effects behind the random variability of individuals and measurements.
Moore (1992a, p. 426) extends this notion of the centrality of variation by stating
that “pupils in the future will bring away from their schooling a structure of thought
that whispers ‘variation matters.’” What specifically that structure of thought is and
how it would be articulated or modeled in the teaching process is a matter of
conjecture. At the root of that structure appears to be ideas about determinism and
indeterminism.
There is a minefield of interrelated and overlapping concepts surrounding
variation, randomness, chance, and causation. Section 3 of Wild and Pfannkuch
(1999) attempts to explicate the distinctions.
Arguing with a Context Knowledge Base
Cobb and Moore (1997, p. 801) also believe that context plays an important role
in how to think with data: “statistics requires a different kind of thinking, because data are not just numbers, they are numbers with a context.” They emphasize that
the data “literature” must be known in order to make sense of data distributions.
When looking for patterns, data analysts must ultimately decide “whether the
patterns have meaning and whether they have any value”; this will depend on “how
the threads of those patterns interweave with the complementary threads of the story
line,” since the “context provides meaning” (Cobb and Moore, 1997, p. 803).
Hawkins (1996) concurs, stating that students are statistically illiterate if they think
that the statistical distribution is the final product.
Context knowledge is also essential for judging (1) the quality of the data arising
from a particular data collection design and (2) the relevance of the data to the
problem. Mallows (1998, p. 2) believes that statisticians have not paid enough
attention to thinking about what he calls the zeroth problem: “considering the
relevance of the observed data, and other data that might be observed, to the
substantive problem.” He is concerned that thinking about the relevance of the data
to the problem should not be neglected when statisticians attempt to capture
measures from the real situation, since “statistical thinking concerns the relation of
quantitative data to a real-world problem, often in the presence of variability and
uncertainty. It attempts to make precise and explicit what the data has to say about
the problem of interest” (Mallows, 1998, p. 3). Moore (1997) and Hoerl, Hahn, &
Doganaksoy (1997) emphasize that attention should be paid to the design of the data
production process since context knowledge about the design will enable the quality
of the data to be assessed. Hawkins (1996) extends this notion further by suggesting
students cannot acquire statistical reasoning without knowing why and how the data
were collected. Scheaffer (1997, p. 156) also emphasizes the importance of knowing
“how the data originated [and] what the numbers might mean.” Moore (1998, p.
1263) perhaps sums up these concerns: “effective use of statistical reasoning
requires considering the zeroth problem and interpretation of formal results in the
context of a specific setting.” The implication is that statistical thinking involves
going beyond and looking behind the data, and making connections to the context
from which they came.
Transnumeration
Data reduction and data representation are an essential requirement of dealing
with masses of data. Moore (1998, p. 1258) considers “statistical thinking offers
simple but non-intuitive tools for trimming the mass, ordering the disorder,
separating sense from nonsense, selecting the relevant few from the irrelevant
many.” Thus thought processes must be triggered for initiating the changing of the
data into a manageable form from which information can be gleaned. Hawkins
(1997, p. 144) coins the term informacy in an attempt to describe such reasoning and
thinking. To be informate means “one requires skills in summarizing and
representing information, be it qualitative or quantitative, for oneself and others.”
We believe this transnumeration type of thinking is fundamental for data-handling
processes.
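Transnumeration in this data-reduction sense can be illustrated with a small sketch: raw individual cases are re-represented as per-group summaries that carry the message in the data. The records here are hypothetical:

```python
import statistics
from collections import defaultdict

# Hypothetical raw records: (region, response time) for individual cases.
raw = [("north", 12.1), ("south", 15.3), ("north", 11.8),
       ("south", 14.9), ("north", 12.4), ("south", 16.0)]

# Re-represent the individual cases as per-group summaries (a measure of
# centre and a measure of spread) that make the mass of data manageable.
groups = defaultdict(list)
for region, value in raw:
    groups[region].append(value)

summary = {region: (round(statistics.mean(v), 2), round(max(v) - min(v), 2))
           for region, v in groups.items()}
print(summary)  # {'north': (12.1, 0.6), 'south': (15.4, 1.1)}
```

The change of representation, from a list of cases to a table of group summaries, is itself the thinking act: it orders the disorder and selects the relevant few numbers from the irrelevant many.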
The communication of messages in the data, transnumeration-type thinking, is
intimately linked with inferential thinking. Apart from considering the relevance of
the data to the problem, it is also important to consider the inferences that can be
made from the data. W. E. Deming first raised the important distinction between
enumerative and analytical studies in 1950 (for a detailed discussion, see Hahn &
Meeker, 1993). The aim of an enumerative study is to describe the current situation,
whereas the aim of an analytical study is to take actions on or make predictions
about a future population or process. The space for reliable statistical inference is
limited to the population or process actually sampled. For example, a public opinion
poll to assess the current view of U.S. voters on who they would vote for in the next
election is an enumerative study. Formal inference will provide reasonably reliable
answers. If the poll was used to predict the outcome of the next election (future
process), the study then becomes analytic. Many, if not most, important problems
require using data from current processes or populations to make predictions about
the likely behavior of future processes or populations. There are no statistically
reliable ways of doing this. Our measures of uncertainty reflect uncertainty about the
true characteristics of the current process, thus understating rational levels of
uncertainty about the future process. The validity of extrapolation to future
processes can be justified only by contextual knowledge of the situation.
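The enumerative/analytic distinction can be made concrete with the polling example. The counts below are invented, and the interval is a standard normal-approximation sketch:

```python
import math

# Enumerative study: estimate CURRENT voter support from a simple
# random sample.
n, successes = 1000, 540
p_hat = successes / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)
print(f"current support {p_hat:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")

# Analytic use: predicting the NEXT election from this poll. The formal
# interval above quantifies only sampling error in the current
# population; it says nothing about opinion shifting before election
# day, so it understates rational uncertainty about the future process.
```

The arithmetic is identical in both uses; what changes is the inference space, and only contextual knowledge can justify extending it to the future process.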
Statistical Thinking as a Way of Perceiving the World
Ullman (1995) perceives the framework in which statistical thinking operates as
being broadly based, to the extent that it could be used informally in everyday life.
“We utilize our quantitative intelligence all the time. … We are measuring,
estimating and experimenting all without formal statistics” (p. 6). Ullman believes
this quantitative intelligence is unique to statistics. Some principles he suggests as a
basis for quantitative intelligence follow: “to everything there is a purpose; most
things we do involve a process; measurements inform us; typical results occur;
variation is ever present; evaluation is on going; decisions are necessary” (p. 5).
Quantitative intelligence allows a statistical perception of reality.
Statistical Thinking Is an Independent Intellectual Method
Statistics is an epistemology in its own right; it is not a branch of mathematics
(Moore, 1992b). Hawkins (1996) suggests that a mathematically educated person
can be statistically illiterate. Statistical thinking, states Moore (1998, p. 1263), “is a
general, fundamental and independent mode of reasoning about data, variation and
chance.” Ullman (1995, p. 2) concurs that statistical thinking or quantitative
intelligence is an inherently different way of thinking because the reasoning
involves dealing with uncertain empirical data: “I claim that statistical thinking is a
fundamental intelligence.”
The statistical thinking promulgated by these statisticians is encapsulated as an
independent intellectual method. Its domain is the empirical enquiry cycle, but the
domain should also be extended to a way of thinking about and perceiving the
world. Statistical thinking goes beyond the domain of mathematics, which
statisticians use simply as a means to help them achieve their own ends. The nature
of statistical thinking is explained by these statisticians as noticing, understanding,
using, quantifying, explaining, and evaluating variation; thinking about the data
“literature”; capturing relevant data and measurements; summarizing and
representing the data; and taking account of uncertainty and data variability in
decision making.
DISCUSSION AND SUMMARY
Statistical Thinking and Empirical Enquiry
The Wild & Pfannkuch (1999) four-dimensional model (Figure 1) was an
attempt to characterize the way experienced statistical practitioners think when
conducting empirical enquiries. As such it represents a goal for education programs
to strive for. The model was developed as a result of interviewing statisticians and
tertiary students about statistical projects they had been involved in; interviewing
tertiary students as they performed statistical tasks; and analyzing the literature
described earlier. The research focused on statistical thinking at the broad level of
the statistical enquiry cycle, ranging from problem formulation to the
communication of conclusions. Our four-dimensional framework (Figure 1) for
statistical thinking in empirical enquiry describes a nonhierarchical, nonlinear,
dynamic way of thinking that encompasses an investigative cycle, an interrogative
cycle, types of thinking, and dispositions, all of which are brought to bear in the
solving of a statistically based problem. The thinker operates in all four dimensions
at once. For example, the thinker could be categorized as currently being in the
planning stage of the investigative cycle (Dimension 1), dealing with some aspect of
variation in Dimension 2 (types of thinking) by criticizing a tentative plan in
Dimension 3 (interrogative cycle) driven by skepticism in Dimension 4
(dispositions).
The investigative cycle (Figure 1a) describes the procedures a statistician works
through and what the statistician thinks about in order to learn more in the context
sphere. The dispositions (Figure 1d) affect or even initiate entry of the thinker into
the other dimensions. The interrogative cycle (Figure 1c) is a generic thinking
process that is in constant use by statisticians as they carry out a constant dialogue
with the problem, the data, and themselves. It is an interrogative and evaluative
process that requires effort to make sense of the problem and the data with the aim
of eventually coming to some resolutions about the problem and data during that
dialogue. The types of thinking (Figure 1b) are divided into generic types of
thinking, which are common to all problem solving, and fundamental statistical
types of thinking, which we believe are inherently statistical (see the section titled
“Model for Interpretation of Literature”). These types of thinking reflect that
thinking, when applied in a statistical context, will enable the statistician to abstract
a statistical question from the real situation; capture cogent elements of that reality
in measurements and statistical models; work within models using statistical
methods to draw out inferences from the data; and communicate what has been
learned from the data about the real situation.
This framework was an attempt to make explicit what has previously been
largely implicit—the thinking processes used by practitioners during data-based
enquiry. According to Resnick (1987, p. 35), “each discipline has [its own]
characteristic ways of reasoning,” and such thinking processes should be embedded
into the teaching and learning of that discipline. Statistical problem solving requires
holistic thinking informed by statistical elements. These peculiarly statistical
elements appear as the “Types Fundamental to Statistical Thinking” in Dimension 2
(Figure 1b).
From a survey of history, literature, and our own exploratory studies, we believe
our four-dimensional framework is one way of incorporating this knowledge into a
current explication of what we understand to be statistical thinking in the domain of
problem solving. This framework does not, however, address statistical thinking in
the arenas of evaluating enquiries and in everyday life, but it can shed light on them.
We want students to learn to interact with accounts of statistical investigations
performed by others—in “the information-using domain” (Barabba, 1991; Gal,
2000). Statistically based information will be used by students to obtain information
about societal issues; to make decisions about their own lives in areas such as
medicine, gambling and insurance; and to make decisions in their occupations such
as marketing, manufacturing, and law. Major sources include technical reports
written by investigators and media reports, which are typically at least third-hand
summaries. Two main processes need to be invoked. One addresses the question,
“To what extent do I trust this information?” and the other extracts meaning from
the information. Critical appraisal of information in a report largely consists of
appraising the way in which the investigators have proceeded through the steps of
PPDAC (Problem, Plan, Data, Analysis, Conclusions) in Dimension 1. We often
find fatal flaws through inappropriate choices of the measures used, the study
design, and the analysis used; and have learned to beware, at the conclusions stage,
of extrapolations beyond the sampled inference space. Extracting meaning tends to
be given less emphasis in teaching than more peripheral issues such as misleading
graphics. (With a little knowledge, we can often extract correct information from a
“misleading” graph.) We argue that, apart from the use of reading strategies, the
extracting of meaning that goes on in the interpretation of reports is a subset of the
extracting of meaning that is required during investigation. Since knowledge about
investigations precedes the ability to criticize, statistical thinking in empirical
enquiry is a foundational form of statistical thinking. Even though the evaluation of
enquiries builds on knowledge of the investigation process, it still requires specific
instruction to strengthen the links and connections.
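The PPDAC-based appraisal described above can be organized, purely as an illustrative aid, as a checklist keyed to the five stages. The stage names come from the chapter; the specific questions and the `appraise` helper are my own hypothetical sketch, not the authors' instrument.

```python
# Hypothetical checklist keyed to the PPDAC stages named in the text.
# The questions paraphrase the chapter's concerns and are illustrative only.
PPDAC_APPRAISAL = {
    "Problem": ["Is the question of interest clearly stated?"],
    "Plan": ["Were appropriate measures chosen?",
             "Was the study design adequate?"],
    "Data": ["Were the data collected without obvious bias?"],
    "Analysis": ["Does the analysis suit the design and the data?"],
    "Conclusions": ["Do the conclusions stay within the sampled inference space?"],
}

def appraise(answers):
    """Return the PPDAC stages at which a report raises unresolved concerns.

    `answers` maps each checklist question to True (satisfactory) or
    False; unanswered questions count as concerns.
    """
    return [stage
            for stage, questions in PPDAC_APPRAISAL.items()
            if not all(answers.get(q, False) for q in questions)]
```

A report answering every question satisfactorily yields an empty list; one that extrapolates beyond its sampled inference space would be flagged at the Conclusions stage.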
In addition, there is statistical thinking that affects our interpretation of the
phenomena and happenstance information we come across in daily life; such
thinking skills can be valuable even in the absence of data. In particular, many
everyday lessons flow from an appreciation of variation, as described by Tversky
and Kahneman (1982), Snee (1999), and Britz et al. (1997). We know that our
statistical learning can sensitize us to such issues as bias, small sample size, and
variation in the “data” that we gain through our own experience, and it can alter the
way we think about risk and making decisions. It seems to us that there is potentially
valuable work to be done in assembling these ideas and giving them some coherence
(see Gigerenzer, Todd, & ABC Research Group, 1999; Gigerenzer, 2002). It also
occurs to us that coherence might not even be possible, since people experience
reality in their own unique ways. We might be dealing with inherently fragmentary
side benefits of an appreciation of investigation. But someone needs to make the
attempt. Unless the link is directly made, in the teaching process, to the “data”
gained through people’s own experience, statistical education will not help develop
the way people think in everyday life.
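The everyday lesson about variation and small sample size can be made concrete with a short simulation; this is my own sketch, not material from the chapter. Proportions observed in small samples (the kind of "data" personal experience supplies) swing far more widely than those from large samples.

```python
import random

def proportion_heads(n_flips, rng):
    """Proportion of heads in n_flips simulated fair coin flips."""
    return sum(rng.random() < 0.5 for _ in range(n_flips)) / n_flips

def spread_of_proportions(n_flips, n_samples=1000, seed=0):
    """Max minus min observed proportion across repeated samples of one size."""
    rng = random.Random(seed)
    props = [proportion_heads(n_flips, rng) for _ in range(n_samples)]
    return max(props) - min(props)

# Small samples vary far more than large ones:
print(spread_of_proportions(n_flips=10))    # wide spread
print(spread_of_proportions(n_flips=1000))  # much narrower spread
```

This is the statistical core of the small-sample errors documented by Tversky and Kahneman (1982): judgments go wrong when small-sample evidence is treated as if it had large-sample stability.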
Statistical thinking comprises the thought processes triggered (1) during data-based
enquiry to solve a practical problem, (2) during interaction with a data-based
argument, and (3) during interaction with data-based phenomena within one’s
operational environment. This “art” of thinking is relatively new and is becoming
an integral part of many areas of human thought. Its importance should not be
underestimated. Educators should see the development of statistical thinking as
crucial for understanding and operating in today’s environment and for perceiving
the realities of the world. The challenge is to find ways to incorporate its explication into
pedagogical practice.
Implications for Teaching and Assessing Students
The development of students’ statistical thinking presents four major challenges
in teaching. The first challenge for educators is to raise awareness about the
characteristics of statistical thinking, to reach consensus on their
understanding of it, and to develop a common language to describe and
communicate it. The second challenge is to recognize statistical thinking in a variety
of contexts and situations and be able to explain and justify how and why that type
of reasoning constitutes statistical thinking (e.g., Chance, 2002). When
educators themselves are sufficiently attuned to recognizing statistical thinking,
then the third challenge is to develop teaching strategies that will promote and
enhance students’ statistical thinking. Meeting this challenge will also require mapping out a
developmental pathway for statistical thinking across the curriculum and learning
about and recognizing the intuitive statistical thinking that is already present in
students (e.g., Pfannkuch & Rubick, 2002). The final challenge is to implement
teaching and assessment strategies that focus on developing students’ statistical
thinking. This should include acculturating students to how statisticians reason and
work within the statistics discipline and developing new ways for them to view the
world.
REFERENCES
Bailar, B. (1988). Statistical practice and research: The essential interactions. Journal of the American
Statistical Association, 83(401), 1–8.
Barabba, V. (1991). Through a glass lens darkly. Journal of the American Statistical Association,
86(413), 1–8.
Bartholomew, D. (1995). What is statistics? Journal of the Royal Statistical Society A, 158 (Part 1), 1–20.
Ben-Zvi, D., & Friedlander, A. (1997). Statistical thinking in a technological environment. In J. Garfield
& G. Burrill (Eds.), Research on the role of technology in teaching and learning statistics (pp. 45–
55). Voorburg, The Netherlands: International Statistical Institute.
Biehler, R. (1994). Probabilistic thinking, statistical reasoning and the search for causes: Do we need a
probabilistic revolution after we have taught data analysis? In J. Garfield (Ed.), Research Papers from
The Fourth International Conference on Teaching Statistics, Marrakech, 1994. Minneapolis, MN:
University of Minnesota.
Biehler, R. (1999). Discussion: Learning to think statistically and to cope with variation. International
Statistical Review, 67(3), 259–262.
Biehler, R., & Steinbring, H. (1991). Entdeckende Statistik, Stengel-und-Blatter, Boxplots: Konzepte,
Begrundungen und Erfahrungen eines Unterrichtsversuches. Der Mathematikunterricht, 37(6), 5–32.
Box, J. F. (1978). R. A. Fisher, The life of a scientist. New York: Wiley.
Box, J. F. (1997). Fisher, Ronald Aylmer. In N. Johnson & S. Kotz (Eds.), Leading personalities in
statistical sciences: From the 17th century to the present. New York: Wiley.
Britz, G., Emerling, D., Hare, L., Hoerl, R., & Shade, J. (1997). How to teach others to apply statistical
thinking. Quality Progress, June 1997, 67–79.
Chance, B. (2002). Components of statistical thinking and implications for instruction and assessment.
Journal of Statistics Education, 10(3). Retrieved February 10, 2003 from
http://www.amstat.org/publications/jse/v10n3/chance.html
Cline Cohen, P. (1982). A calculating people: The spread of numeracy in early America. Chicago:
University of Chicago Press.
Cobb, G., & Moore, D. (1997). Mathematics, statistics and teaching. American Mathematical Monthly,
104(9), 801–823.
Cohen, I. B. (1984). Florence Nightingale. Scientific American, 250(3), 98–107.
David, F. (1962). Games, gods and gambling. London: Charles Griffin.
Davis, P., & Hersh, R. (1986). Descartes’ dream. Orlando, FL: Harcourt Brace Jovanovich.
Falk, R., & Konold, C. (1992). The psychology of learning probability. In F. & S. Gordon (Eds.),
Statistics for the twenty-first century. MAA Notes, no. 26 (pp. 151–164). Washington, DC:
Mathematical Association of America.
Fienberg, S., & Tanur, J. (1996). Reconsidering the fundamental contributions of Fisher and Neyman on
experimentation and sampling. International Statistical Review, 64(3), 237–253.
Fischbein, E. (1975). The intuitive sources of probabilistic thinking in children. Dordrecht, The
Netherlands: Reidel.
Gail, M. (1996). Statistics in action. Journal of the American Statistical Association, 91(433), 1–13.
Gal, I. (1997). Assessing students’ interpretation of data. In B. Phillips (Ed.), IASE papers on statistical
education ICME-8, Spain, 1996 (pp. 49–57). Hawthorn, Australia: Swinburne Press.
Gal, I. (2000). Statistical literacy: Conceptual and instructional issues. In D. Coben, J. O’Donoghue, & G.
FitzSimons (Eds.), Perspectives on adults learning mathematics (pp. 135–150). Dordrecht, The
Netherlands: Kluwer Academic Publishers.
Gal, I., Ahlgren, C., Burrill, G., Landwehr, J., Rich, W., & Begg, A. (1995). Working group: Assessment
of interpretive skills. In Writing group draft summaries, Conference on Assessment Issues in Statistics
Education (pp. 23–25). Philadelphia: University of Pennsylvania.
Gigerenzer, G. (2002). Calculated risks: How to know when numbers deceive you. New York: Simon &
Schuster.
Gigerenzer, G., Todd, P.M., & ABC Research Group (1999). Simple heuristics that make us smart. New
York: Oxford University Press.
Greenwood, M. (1970). Medical statistics from Graunt to Farr. In E. S. Pearson & M. G. Kendall (Eds.),
Studies in the history of statistics and probability (pp. 47–126). London: Charles Griffin.
Hacking, I. (1975). The emergence of probability: A philosophical study of early ideas about probability,
induction and statistical inference. Cambridge, England: Cambridge University Press.
Hahn, G., & Meeker, W. (1993). Assumptions for statistical inference. American Statistician, 47(1), 1–
11.
Hancock, C., Kaput, J., & Goldsmith, L. (1992). Authentic enquiry with data: Critical barriers to
classroom implementation. Educational Psychologist, 27(3), 337–364.
Hare, L., Hoerl, R., Hromi, J., & Snee, R. (1995, February). The role of statistical thinking in
management. Quality Progress, 28(2), 53–60.
Hawkins, A. (1996). Can a mathematically-educated person be statistically illiterate? Mathematics for the
new Millennium—What needs to be changed and why? Nuffield Foundation: pre-conference paper
(pp. 107–117).
Hawkins, A. (1997). Discussion—New pedagogy and new content: The case of statistics. International
Statistical Review, 65(2), 141–146.
Hill, A. B. (1965). The environment and disease: Association or causation. Proceedings of the Royal
Society of Medicine, 58, 295–300.
Hoerl, R., Hahn, G., & Doganaksoy, N. (1997). Discussion—New pedagogy and new content: The case
of statistics. International Statistical Review, 65(2), 147–153.
Joiner, B. (1994). Fourth generation management. New York: McGraw-Hill.
Joiner, B., & Gaudard, M. (1990, December). Variation, management, and W. Edwards Deming. Quality
Progress, 23(12), 29–37.
Kendall, M. G. (1970). Where shall the history of statistics begin? In E. S. Pearson & M. G. Kendall
(Eds.), Studies in the history of statistics and probability (pp. 45–46). London: Charles Griffin.
Kettenring, J. (1997). Discussion—New pedagogy and new content: The case of statistics. International
Statistical Review, 65(2), 153.
Knight, G., Arnold, G., Carter, M., Kelly, P., & Thornley, G. (1993). The mathematical needs of New
Zealand school leavers. Palmerston North, New Zealand: Massey University.
Konold, C., Lohmeier, J., Pollatsek, A., Well, A., Falk, R., & Lipson, A. (1991). Novice views on
randomness. In Proceedings of the Thirteenth Annual Meeting of the International Group for the
Psychology of Mathematics Education—North American Chapter (pp. 167–173). Blacksburg, VA:
Virginia Polytechnic Institute and State University.
Konold, C., Pollatsek, A., Well, A., & Gagnon, A. (1997). Students analyzing data: Research of critical
barriers. In J. Garfield & G. Burrill (Eds.), Research on the role of technology in teaching and
learning statistics (Proceedings of the 1996 International Association of Statistics Education round
table conference, pp. 151–167). Voorburg, The Netherlands: International Statistical Institute.
Kotz, S., & Johnson, N. (1992). Breakthroughs in statistics, Volumes I–III. New York: Springer-Verlag.
Krishnan, T. (1997). Fisher’s contributions to statistics. Resonance Journal of Science Education, 2(9),
32–37.
Lightner, J. (1991). A brief look at the history of probability and statistics. Mathematics Teacher, 84(8),
623–630.
Mallows, C. (1998). 1997 Fisher Memorial Lecture: The zeroth problem. American Statistician, 52(1), 1–
9.
McKean, K. (1985, June). Decisions, decisions. Discover, 6, 22–33.
Ministry of Education (1992). Mathematics in New Zealand Curriculum. Wellington, New Zealand:
Learning Media.
Moore, D. (1990). Uncertainty. In L. Steen (Ed.), On the shoulders of giants: New approaches to
numeracy (pp. 95–137). Washington, DC: National Academy Press.
Moore, D. (1992a). Statistics for all: Why? What and how? In D. Vere-Jones (Ed.), Proceedings of the
Third International Conference on Teaching Statistics, Vol. 1 (pp. 423–428). Voorburg, The
Netherlands: International Statistical Institute.
Moore, D. (1992b). Teaching statistics as a respectable subject. In F. & S. Gordon (Eds.), Statistics for
the twenty-first century. MAA Notes, no. 26 (pp. 14–25). Washington, DC: Mathematical Association
of America.
Moore, D. (1997). New pedagogy and new content: The case of statistics. International Statistical
Review, 65(2), 123–165.
Moore, D. (1998). Statistics among the liberal arts. Journal of the American Statistical Association,
93(444), 1253–1259.
Pfannkuch, M. (1997). Statistical thinking: One statistician’s perspective. In F. Biddulph & K. Carr
(Eds.), People in mathematics education (Proceedings of the 20th annual conference of the
Mathematics Education Research Group of Australasia, pp. 406–413). Rotorua, New Zealand:
MERGA.
Pfannkuch, M., & Rubick, A. (2002). An exploration of students’ statistical thinking with given data.
Statistics Education Research Journal, 1(2), 4–21. Retrieved December 19, 2002 from
http://fehps.une.edu.au/serj/
Porter, T. M. (1986). The rise of statistical thinking 1820–1900. Princeton, NJ: Princeton University
Press.
Porter, T. M. (1995). Trust in numbers: The pursuit of objectivity in science and public life. Princeton,
NJ: Princeton University Press.
Provost, L., & Norman, C. (1990, December). Variation through the ages. Quality Progress, 23(12), 39–
44.
Pyzdek, T. (1990). There’s no such thing as a common cause. Proceedings of American Society for
Quality Control 44th Annual Quality Congress Transactions—San Francisco (pp. 102–108).
Milwaukee, WI: ASQC.
Resnick, L. (1987). Education and learning to think. Washington, DC: National Academy Press.
Salsburg, D. (2001). The lady tasting tea: How statistics revolutionized science in the twentieth century.
New York: Freeman.
Scheaffer, R. (1997). Discussion—New pedagogy and new content: The case of statistics. International
Statistical Review, 65(2), 156–158.
Scheaffer, R. (2001). Statistics education: Perusing the past, embracing the present, and charting the
future. Newsletter of the Section on Statistical Education of the American Statistical Association, 7(1),
Winter 2001. (Reprinted in Statistics Education Research Newsletter, 2(2), May 2001. Retrieved May
18, 2001 from http://www.ugr.es/local/batanero/sergroup.htm)
Shaughnessy, J. M. (1992). Research in probability and statistics: Reflections and directions. In D.
Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 465–494). New
York: Macmillan.
Shaughnessy, J. M. (1997). Missed opportunities in research on the teaching and learning of data and
chance. In F. Biddulph & K. Carr (Eds.), People in mathematics education (Proceedings of the 20th
annual conference of the Mathematics Education Research Group of Australasia, pp. 6–22). Rotorua,
New Zealand: MERGA.
Shaughnessy, J. M., Garfield, J., & Greer, B. (1996). Data handling. In A. Bishop, K. Clements, C.
Keitel, J. Kilpatrick, & C. Laborde (Eds.), International handbook of mathematics education (pp.
205–238). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Shaughnessy, J. M., Watson, J., Moritz, J., & Reading, C. (1999). School mathematics students’
acknowledgement of statistical variation. Paper presented at the research pre-sessions of 77th annual
meeting of the National Council of Teachers of Mathematics, San Francisco, 1999.
Shewhart, W., & Deming, W. E. (Ed.). (1986). Statistical method from the viewpoint of quality control.
New York: Dover Publications. (Original work published 1939)
Snee, R. (1990). Statistical thinking and its contribution to quality. American Statistician, 44(2), 116–121.
Snee, R. (1993). What’s missing in statistical education? American Statistician, 47(2), 149–154.
Snee, R. (1999). Discussion: Development and use of statistical thinking: A new era. International
Statistical Review, 67(3), 255–258.
Stigler, S. (1986). The history of statistics—The measurement of uncertainty before 1900. Cambridge,
MA: Belknap Press of Harvard University Press.
Tufte, E. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Tukey, J. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Tversky, A., & Kahneman, D. (1982). Judgment under uncertainty: Heuristics and biases. In D.
Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp.
3–20). New York: Press Syndicate of the University of Cambridge. (Originally published in Science,
185 (1974), 1124–1131.)
Ullman, N. (1995). Statistical or quantitative thinking as a fundamental intelligence. Unpublished paper,
County College of Morris, Randolph, NJ.
Watson, J. (1997). Assessing statistical thinking using the media. In I. Gal & J. Garfield (Eds.), The
assessment challenge in statistics education (pp. 107–121). Amsterdam: IOS Press.
Watson, J., Collis, K., Callingham, R., & Moritz, J. (1995). A model for assessing higher order thinking
in statistics. Educational Research and Evaluation, 1, 247–275.
Wild, C., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry (with discussion).
International Statistical Review, 67(3), 223–265.
Chapter 3
STATISTICAL LITERACY¹
Meanings, Components, Responsibilities
Iddo Gal
University of Haifa, Israel
INTRODUCTION AND NEED
Many curriculum frameworks and national and international educational initiatives,
including but not limited to those focusing on the mathematical sciences, underscore
the importance of enabling all people to function effectively in an information-laden
society (e.g., United Nations Educational, Scientific and Cultural Organization
[UNESCO], 1990; Australian Education Council, 1991; American Association for
the Advancement of Science [AAAS], 1995; European Commission, 1996; National
Council of Teachers of Mathematics [NCTM], 2000). The present paper focuses on
statistical literacy, one critical but often neglected skill area that needs to be
addressed if adults (or future adults) are to become more informed citizens and
employees.
Statements regarding the importance of statistical reasoning or statistical
knowledge in society have been eloquently made in the past. For example, Moore
(1998), in his presidential address to the American Statistical Association (ASA),
claimed that it is difficult to think of policy questions that have no statistical
component, and argued that statistics is a general and fundamental method because
data, variation and chance are omnipresent in modern life. Wallman (1993), in a
1992 ASA presidential address, emphasized the importance of strengthening
understanding of statistics and statistical thinking among all sectors of the
population, in part due to the various misunderstandings, misperceptions, mistrust,
and misgivings that people have toward the value of statistics in public and private
choices. Researchers interested in cognitive processes have emphasized the
contribution of proper judgmental processes and probabilistic reasoning to people’s
¹ This chapter is a reprint of “Adults’ statistical literacy: Meanings, components,
responsibilities,” from the International Statistical Review, 70, pages 1–52, copyright 2002,
and is reproduced here with the permission of the International Statistical Institute. All rights
reserved.
D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 47–78.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
ability to make effective decisions (Kahneman, Slovic, & Tversky, 1982) and
showed that training in statistics can aid in solving certain types of everyday
problems (Kosonen & Winne, 1995). Industry trainers and education planners have
pointed to the important role of statistical understanding and mathematical
competencies as a component of the skills needed by workers in diverse industries
(e.g., Carnevale, Gainer, & Meltzer, 1990; Packer, 1997).
While these and other sources have helped to highlight the centrality of
statistical literacy in various life contexts, few attempts to describe the nature of
adults’ overall statistical literacy have been published to date. It is necessary to first
grapple with definitional issues. In public discourse “literacy” is sometimes
combined with terms denoting specific knowledge domains (e.g., “computer
literacy”). In such cases the usage of “literacy” may conjure up an image of the
minimal subset of “basic skills” expected of all citizens, as opposed to a more
advanced set of skills and knowledge that only some people may achieve. Along
these lines, statistical literacy may be understood by some to denote a minimal
(perhaps formal) knowledge of basic statistical concepts and procedures. Yet
increasingly the term literacy, when used as part of the description of people’s
capacity for goal-oriented behavior in a specific domain, suggests a broad cluster not
only of factual knowledge and certain formal and informal skills, but also of desired
beliefs, habits of mind, or attitudes, as well as general awareness and a critical
perspective.
In line with the expanding conception of the term literacy, Wallman (1993)
argued that statistical literacy is the ability to understand and critically evaluate
statistical results that permeate daily life, coupled with the ability to appreciate the
contributions that statistical thinking can make in public and private, professional
and personal decisions. Watson (1997) presented a framework of statistical literacy
comprised of three tiers with increasing sophistication: a basic understanding of
probabilistic and statistical terminology; an understanding of statistical language and
concepts when they are embedded in the context of wider social discussion; and a
questioning attitude one can assume when applying concepts to contradict claims
made without proper statistical foundation.
The complex and expanding meaning of domain-specific literacy can also be
illustrated by examining extant conceptions of “scientific literacy.” Shamos (1995)
reviews prior works on scientific literacy that suggest common building blocks:
basic vocabulary, understanding of science process, and understanding of the impact
of science and technology on society. Jenkins (1996) suggests that scientific literacy
can be characterized as scientific knowledge and attitudes, coupled with some
understanding of scientific methodology.
Shamos (1995) argues that it would be an oversimplification to assume that somebody
is either literate or illiterate in science, and suggests a continuum along which
scientific literacy can be described, comprised of three overlapping levels that build
upon each other in sophistication. The most basic one, “cultural” scientific literacy,
refers to a grasp of basic terms commonly used in the media to communicate about
science matters. Next, “functional” scientific literacy adds some substance by
requiring that “the individual not only have command of a science lexicon but also
be able to converse, read and write coherently, using such science terms in perhaps a
non-technical but nevertheless meaningful context” (p. 88). This level also requires
that the person has access to simple everyday facts of nature, such as some
knowledge of the solar system (e.g., that the Earth revolves around the Sun, how
eclipses occur). Finally, “true” scientific literacy requires some understanding of the
overall scientific enterprise (e.g., basic knowledge of key conceptual schemes or
theories that form the foundation of science and how they were arrived at), coupled
with understanding of scientific and investigative processes. Examples are (see also
Rutherford, 1997): appreciation of the relativity of “fact” and “theory,” awareness of
how knowledge accumulates and is verified, the role of experiments and
mathematics in science, the ability to make sense of public communications about
scientific matters, and the ability to understand and discuss how science and
technology impinge on public life.
With the above broad usage of “literacy” and “statistical literacy” in mind, this
paper develops a conception of statistical literacy that pertains to what is expected of
adults (as opposed to students actively learning statistics), particularly those living
in industrialized societies. It is proposed here that in this context, the term statistical
literacy refers broadly to two interrelated components, primarily (a) people’s ability
to interpret and critically evaluate statistical information, data-related arguments, or
stochastic phenomena, which they may encounter in diverse contexts, and when
relevant (b) their ability to discuss or communicate their reactions to such statistical
information, such as their understanding of the meaning of the information, their
opinions about the implications of this information, or their concerns regarding the
acceptability of given conclusions. These capabilities and behaviors do not stand on
their own but are founded on several interrelated knowledge bases and dispositions
which are discussed in this paper.
Statistical literacy can serve individuals and their communities in many ways. It
is needed if adults are to be fully aware of trends and phenomena of social and
personal importance: crime rates, population growth, spread of diseases, industrial
production, educational achievement, or employment trends. It can contribute to
people’s ability to make choices when confronted with chance-based situations (e.g.,
buying lottery tickets or insurance policies, and comprehending medical advice). It
can support informed participation in public debate or community action. The need
for statistical literacy also arises in many workplaces, given growing demands that
workers understand statistical information about quality of processes (Packer, 1997),
and the contention that workers’ understanding of data about the status of their
organization can support employee empowerment (Bowen & Lawler, 1992).
The many examples of contexts where statistical literacy may be activated
indicate that most adults are consumers (rather than producers) of statistical
information. Yet, despite the centrality of statistical literacy in various life contexts,
the nature of the skills and dispositions that comprise adults’ statistical literacy has
not received detailed discussion in the literature (Gal, 1994; Watson, 1997), and is
thus the focus of this paper. Clarity on the characteristics of the building blocks of
statistical literacy is needed before other questions can be addressed in earnest
regarding assessment and instruction focused on statistical literacy.
A MODEL
This paper concerns itself with people’s ability to act as effective “data
consumers” in diverse life contexts that for brevity are termed here reading contexts.
These contexts emerge, for example, when people are at home and watch TV or read
a newspaper, when they look at advertisements while shopping, when they visit
Internet sites, when they participate in community activities or attend a civic or
political event, or when they read workplace materials or listen to reports at work.
They include but are not limited to exposure to print and visual media, and represent
the junctures where people encounter the much-heralded “information-laden”
environments (European Commission, 1996). In such contexts, statistical
information may be represented in three ways—through text (written or oral),
numbers and symbols, and graphical or tabular displays, often in some combination.
To simplify the presentation in this paper, the term readers will be used throughout
to refer to people when they participate in reading contexts as actors, speakers,
writers, readers, listeners, or viewers, in either passive or active roles.
Reading contexts should be distinguished from enquiry contexts, where people
(e.g., students, statisticians) engage in empirical investigation of actual data (Wild
& Pfannkuch, 1999). In enquiry contexts individuals serve as “data producers” or
“data analyzers” and usually have to interpret their own data and results and report
their findings and conclusions. Reading contexts may differ from enquiry contexts
in important ways that have not been sufficiently acknowledged in the literature on
statistical reasoning and are examined later.
This paper proposes a model, summarized in Table 1, of the knowledge bases
and other enabling processes that should be available to adults, and by implication to
learners graduating from schools or colleges, so that they can comprehend, interpret,
critically evaluate, and react to statistical messages encountered in reading contexts.
Based on earlier work such as cited above on statistical literacy and scientific
literacy, the model assumes that people’s statistical literacy involves both a
knowledge component (comprised of five cognitive elements: literacy skills,
statistical knowledge, mathematical knowledge, context knowledge, and critical
questions) and a dispositional component (comprised of two elements: critical
stance, and beliefs and attitudes).
Table 1. A model of statistical literacy

Knowledge elements          Dispositional elements
  Literacy skills             Beliefs and attitudes
  Statistical knowledge       Critical stance
  Mathematical knowledge
  Context knowledge
  Critical questions

(Knowledge and dispositional elements together enable statistical literacy.)
As with people’s overall numeracy (Gal, 2000), the components and elements in
the proposed model should not be viewed as fixed and separate entities but as a
context-dependent, dynamic set of knowledge and dispositions that together enable
statistically literate behavior. Understanding and interpretation of statistical
information requires not only statistical knowledge per se but also the availability of
other knowledge bases: literacy skills, mathematical knowledge, and context
knowledge. However, critical evaluation of statistical information (after it has been
understood and interpreted) depends on additional elements as well: the ability to
access critical questions and to activate a critical stance, which in turn is supported
by certain beliefs and attitudes.
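The dependency just described can be sketched, as an organizational aid only, in a few lines; the element names come from Table 1, while the grouping and the function names are my own hypothetical rendering, not Gal's formulation.

```python
# Illustrative only: the model's dependency structure.
# Interpretation rests on the knowledge bases; critical evaluation
# additionally requires the critical elements.
KNOWLEDGE_BASES = {"literacy skills", "statistical knowledge",
                   "mathematical knowledge", "context knowledge"}
CRITICAL_ELEMENTS = {"critical questions", "critical stance",
                     "beliefs and attitudes"}

def can_interpret(available):
    """Understanding/interpretation requires all of the knowledge bases."""
    return KNOWLEDGE_BASES <= set(available)

def can_critically_evaluate(available):
    """Critical evaluation presupposes interpretation plus the critical elements."""
    return can_interpret(available) and CRITICAL_ELEMENTS <= set(available)
```

For example, a reader commanding the four knowledge bases but lacking a critical stance could interpret a media report yet not evaluate it critically, which is exactly the distinction the model draws.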
The model’s elements are described in subsequent sections, although some
overlap with each other and do not stand in isolation. The final section of the paper
discusses resulting educational and policy challenges and implications for needed
research. The expected contribution of this paper is to facilitate further dialogue and
action by educators, practicing statisticians, policy makers, and other professionals
who are interested in how citizens can be empowered to make sense of real-world
messages containing statistical elements or arguments.
KNOWLEDGE ELEMENTS OF STATISTICAL LITERACY
This section reviews the five elements listed in Table 1 as comprising the
knowledge component of statistical literacy. It is proposed that these elements
jointly contribute to people’s ability to comprehend, interpret, critically evaluate,
and if needed react to statistical messages.
To provide a context for some of the ideas presented below, Figures 1, 2, 3, and
4 illustrate key modes through which statistical concepts and statistics-related
information or arguments are communicated to adults in the printed media, a prime
reading context. Figure 1 contains six excerpts illustrating statistical messages in
daily newspapers and magazines from different countries. Figure 2 presents a
statistics-related table from an American newspaper. Figure 3 presents a bar graph
that appeared in a widely circulated Israeli newspaper. Figure 4 includes a pie chart
used in the International Adult Literacy Survey (IALS; Statistics Canada and
Organization for Economic Co-operation and Development [OECD], 1996) to
simulate a newspaper graph.
Literacy Skills
A discussion of literacy skills opens the review of the knowledge bases needed
for statistical literacy, given that virtually all statistical messages are conveyed
through written or oral text, or require that readers navigate through tabular or
graphical information displays that require the activation of specific literacy skills
(Mosenthal & Kirsch, 1998).
The understanding of statistical messages requires the activation of various text-processing skills in order to derive meaning from the stimulus presented to readers.
The written portion of a message may be quite long (as in some of the excerpts in
Figure 1) and demand complex text comprehension skills, or may sometimes
involve a graph with only a few words (Figures 3 or 4). Readers also have to
comprehend surrounding text (i.e., within which the statistical portion is embedded
or which explains a graph or chart presented) to place the statistical part in the
proper context. Depending on the circumstances, readers may have to communicate
clear opinions, orally or in writing, in which case their response should contain
enough information about the logic or evidence on which it is based to enable
another listener or reader to judge its reasonableness. Thus, statistical literacy and
general literacy are intertwined.
In the real world, readers have to be able to make sense of a wide range of
messages, formulated at different levels of complexity and in different writing or
speaking styles (Wanta, 1997). Messages may be created by journalists, officials,
politicians, advertisers, or others with diverse linguistic and numeracy skills.
Message originators may have diverse aims in terms of the presumed facts, images,
or conclusions they aim to create or instill in the mind of the reader. Some messages
may be created to convince the reader or listener to adopt a specific point of view or
reject another, and hence may use one-sided arguments or present selective
information (Clemen & Gregory, 2000), or may use modifiers (e.g., “a startling 5%
gain …”) to shape a desired impression.
As several authors have pointed out (Laborde, 1990; Gal, 1999), coping with
mathematical or statistical messages places various demands on readers’ literacy
skills. For instance, readers have to be aware that the meanings of certain statistical
terms used in the media (e.g., random, representative, percentage, average, reliable)
may differ from their colloquial or everyday meanings. Messages may use
technical terms in a professionally appropriate way but may also contain statistical
jargon that is ambiguous or erroneous. Some newspapers and other media channels
tend to employ conventions in reporting statistical findings, such as referring to
“sampling error” (or “margin of error”) when discussing results from polls, but
without explaining the meaning of terms used.
Space and time limitations or editorial decisions may force writers (or
professionals who speak on TV) to present messages that are terse, choppy, or lack
essential details. Readers may need to make various assumptions and inferences,
given the absence of details or the inability in many cases to interrogate the creators
of messages encountered. Overall, these factors can make comprehension more
challenging, complicate the interpretation task, and could place heavy demands on
readers’ literacy skills. This is true for adults from all walks of life, but especially for
adults who are bilingual or otherwise have a weak mastery of the national/dominant
language (Cocking & Mestre, 1988). However, results from the International Adult
Literacy Survey (IALS; Statistics Canada and OECD, 1996) suggest that in most of
the countries surveyed, a large proportion of adults have only basic comprehension
skills and are unable to cope effectively with a range of everyday literacy and
computation tasks. Hence, people’s literacy skills may be a bottleneck affecting their
statistical literacy skills.
Figure 1. Illustrations of statistical texts in daily newspapers and magazines.
‘Matrix’ a virtual lock at No. 1

The Keanu Reeves sci-fi thriller The Matrix remained the box office champ for the second
consecutive week. Newcomers had mixed results: The romantic comedy Never Been
Kissed opened fairly strong at No. 2, … The top 10:

                                    Box office (millions)
  Film                          Wkd.     Total    Avg. Per site   Pct. Chg.   Weeks Out
  1 The Matrix                  $22.6    $73.3    $7,772          -19%        2
  2 Never Been Kissed           $11.8    New      $4,821                      1
  3 10 Things I Hate About You  $5.05    $20.4    $2,218          -39%        2
  4 The Out-of-Towners          $5.01    $16.2    $2,380          -39%        2
  5 Analyze This                $5.0     $85.8    $2,125          -21%        6

* Re-creation of a selected portion of a table from USA Today (April 13, 1999). Some details omitted to
conserve space.

Figure 2. Illustration of a tabular display in a newspaper.
Graph in Yediot Aharonot, the daily newspaper with the largest circulation in Israel,
July 11, 2000. The title says: “Women in Israel are more educated”. The subtitle says:
“Israel holds the world record in the percentage of women among students for Master
and Doctoral degrees”. The bars represent percentages for (from top to bottom): Israel
(55.4%), United States, Australia, Denmark, Great Britain, Finland, Sweden,
Switzerland, and Japan (21.5%). (Reprinted with permission).

Figure 3. Women’s education in different countries.
Figure 4. Oil use in two years. Stimulus from an IALS item. (Reprinted with permission).
Document Literacy
The literacy skills needed for statistical literacy are not limited to those
involving processing of prose text. This subsection extends the preceding discussion
by examining Document Literacy skills, which pertain to reading various nonprose
texts, including graphs, charts, and tables. The growing literature on graph
comprehension examines various processes involved in making sense of graphs,
from simple graph-reading to making inferences based on graphs (Bright & Friel,
1998), but has seldom viewed graphs as a subtype of documents in general.
The notion of Document Literacy comes out of the influential work of Kirsch
and Mosenthal (Kirsch, Jungeblut, & Mosenthal, 1998), who view literacy as
comprised of three interrelated components: Prose Literacy, Document Literacy, and
Quantitative Literacy. This conceptualization of literacy served as a basis for several
large-scale studies, most recently the International Adult Literacy Survey (IALS;
Statistics Canada and OECD, 1996; OECD & Human Resources Development
Canada, 1997), and prior national studies of the literacy of adults and young adults,
mainly in the United States and Canada (e.g., Kirsch, Jungeblut, Jenkins, & Kolstad,
1993), but also in Australia.
Kirsch and Mosenthal (1990) claim that documents tend to be the predominant
form of literacy in nonschool settings, and serve as an important source of
information and a basis for enabling actions and decisions. Document Literacy tasks
require people to identify, interpret, and use information given in lists, tables,
indexes, schedules, charts, and graphical displays. The information in such displays
often includes explicit quantitative information, such as numbers or percentages, in
addition to the quantitative or statistical information conveyed by graphs and charts.
Mosenthal & Kirsch (1998) argue that documents, which include graphs and charts,
are usually arranged in arrays of varying degrees of complexity: they may include
“simple lists” or “combined lists,” as in a simple table or a simple bar graph or pie
chart (Figures 3 and 4); or “intersecting lists” or “nested lists,” as in a two-way table
(Figure 2) or in a complex multielement graph.
An important aspect of the Kirsch and Mosenthal work (Kirsch, Jungeblut, &
Mosenthal, 1998) is the description (“grammar”) provided of the cognitive
operations required to locate information in documents, and the reading strategies
required to match information in a question or directive to corresponding
information in arrays of varying degrees of complexity. Key processes include
locating specific information in given texts or displays, cycling through various
parts of diverse texts or displays, integrating information from several locations
(e.g., across two graphs, as in Figure 4), and generating new information (e.g.,
finding the difference between percentages in different parts of a table or between
bars in a graph). Further, readers have to make inferences, quite often in the
presence of irrelevant or distracting information, and perhaps apply mathematical
operations as well to information contained in graphs or tables.
As Mosenthal and Kirsch (1998) argue, many types of common statistical
information can be displayed in both graphs and tables, and one form is often a mere
transformation of the other (e.g., when a table with a simple list is transformed into a
simple bar chart). Hence, putting aside specialized aspects of graph comprehension
(Tufte, 1997), their work provides a generalized way to understand literacy aspects
of interpreting multiple types of documents and displays, and enables us to embed a
discussion of statistical literacy within a broader framework of general literacy.
Statistical Knowledge Base
An obvious prerequisite for comprehending and interpreting statistical messages
is knowledge of basic statistical and probabilistic concepts and procedures, and
related mathematical concepts and issues. However, almost all authors who are
concerned about the ability of adults or of school graduates to function in a
statistics-rich society do not discuss what knowledge is needed to be statistically
literate per se, but usually focus on what needs to be taught in schools and argue that
all school (or college) graduates should master a range of statistical topics, assuming
this will ensure learners’ statistical literacy as adults. A recent example can be found
in Scheaffer, Watkins, and Landwehr (1998). Based on their extensive prior work in
the area of teaching statistics and on reviewing various curriculum frameworks,
these authors describe numerous areas as essential to include in a study of statistical
topics in high school:
• Number sense
• Understanding variables
• Interpreting tables and graphs
• Aspects of planning a survey or experiment, such as what constitutes a good
sample, or methods of data collection and questionnaire design
• Data analysis processes, such as detecting patterns in univariate or two-way
frequency data, or summarizing key features with summary statistics
• Relationships between probability and statistics, such as in determining
characteristics of random samples, background for significance testing
• Inferential reasoning, such as confidence intervals or testing hypotheses
It is tempting to regard this list as a possible candidate for an “ideal” set of
mathematical and statistical knowledge bases that can guarantee statistical literacy.
(Indeed, this author would be happy if most adults possessed such knowledge.)
However, what is “basic” knowledge cannot be discussed in absolute terms, but
depends on the desired level of statistical literacy expected of citizens, on the
functional demands of contexts of action (e.g., work, reading a newspaper), and on
the characteristics of the larger societal context of living. Hence, the above list may
not be appropriate for all cultural contexts, may be an overspecification in some
cases, and other elements could be added to it.
Unfortunately, no comparative analysis has so far systematically mapped the
types and relative prevalence of statistical and probabilistic concepts and topics
across the full range of statistically related messages or situations that adults may
encounter and have to manage in any particular society. Hence, no consensus exists
on a basis for determining the statistical demands of common media-based
messages. To date, only a single comparative study (Joram, Resnick, & Gabriele,
1995) addressed this complex issue, by analyzing the characteristics of rational
numbers (especially fractions, percentages, and averages) that appear in weekly or
monthly magazines written for children, teenagers, and adults in the United States.
This study was based on the assumption that it is useful to view literacy not only as
a skill or ability but also as a set of cultural practices that people engage in, and
hence that it is important to examine the characteristics of the texts that people may
have to make sense of, and ask how these characteristics shape people’s literacy
practices.
Regarding adults, Joram et al. (1995) sampled seven widely circulated
magazines that aim at different types of readers: Reader’s Digest, National
Geographic, Better Homes and Gardens, National Enquirer, Time, Consumer
Reports, and Sports Illustrated. They applied a complex coding scheme to capture
the number of occurrences of rational numbers, especially fractions, percentages,
and averages, in the middle 20 pages of one issue. Some findings that are relevant
for the present paper were:
• The mean frequencies (per 20 pages) of fractions, percentages, and averages
were 4.86, 10.00, and 2.00, respectively.
• Regarding percentages found in these magazines, about half expressed
part/whole relations (“The nation’s 113 nuclear reactors already generate 20
percent of our electricity”), and one-third referred to increase/decrease (“If
… electricity consumption increases by 2.5 percent a year, we could be
headed for real problems”).
• Only 14% of statements regarding rational numbers in adult magazines were
modified by a part of speech such as an adjective (“An astonishing 35
percent of all …”). This finding suggested to Joram et al. that authors in
adult magazines do not provide a great deal of interpretation of numbers in
their immediate context and hence numbers are usually allowed to speak for
themselves.
• Four of the seven adult magazines contained within the pages sampled at
least one table or graph. Overall, the seven magazines included four tables,
four bar graphs, and one pyramid graph (used to show quantities).
These and other findings reported by Joram et al. suggest that percentages are
the most common rational number in magazines used to convey statistical
information (see also Parker & Leinhardt, 1995), and that numerical or statistical
information may appear in tables and not only in graphs. In order to make full sense
of statistical information appearing in magazines, adults should be able to
understand plain passages that provide the context for the rational numbers or
graphs shown, and relate different elements in given passages or displays to each
other. These conclusions agree with and complement the earlier discussion of
literacy skills needed for interpreting statistical messages.
Beyond the data provided by Joram et al. (1995), there is no comprehensive
research base from which to establish the statistical literacy requirements in the full
range of domains and environments where adults function. Five key parts of the
statistical knowledge base required for statistical literacy are proposed in this
subsection and summarized in Table 2. These building blocks were identified on the
basis of reviewing writing by mathematics and statistics educators (such as
Shaughnessy, 1992, Moore, 1990, 1997b; chapters in Steen, 1997; chapters in Gal &
Garfield, 1997; chapters in Lajoie, 1998; NCTM, 2000), sources on scientific
literacy (e.g., Shamos, 1995; AAAS, 1995), and on mathematics and statistics in the
news (e.g., Huff, 1954; Hooke, 1983; Crossen, 1994; Paulos, 1995; Kolata, 1997).
Table 2. Five parts of the statistical knowledge base
1. Knowing why data are needed and how data can be produced
2. Familiarity with basic terms and ideas related to descriptive statistics
3. Familiarity with basic terms and ideas related to graphical and tabular displays
4. Understanding basic notions of probability
5. Knowing how statistical conclusions or inferences are reached
Knowing Why Data Are Needed and How Data Can Be Produced
Overall, adults should possess some understanding of the origins of the data on
which reported findings or displays are based, understand the need to know how
data were produced, and be aware of the contribution of a good design for data
production to the possibility of answering specific questions (Cobb & Moore, 1997).
Adults should also be aware that public officials, organizations, employers,
advertisers, and other players in the public arena need to base claims or conclusions
on credible empirical evidence, and that properly produced data can inform public
debate and serve as a basis for decisions and allocation of resources, much better
than anecdotal evidence (Moore, 1998).
To enable critical understanding of reported findings or data-based claims, adults
should possess some knowledge, at least informal, of key “big ideas” that underlie
statistical investigations (Garfield & Gal, 1999). First on the list of most statisticians
is the existence of variation (Moore, 1998). The need to reduce data in order to
identify key features and trends despite noise and variation should be understood by
adults as it provides the basis for accepting the use of statistical summaries (e.g.,
means, graphs) as tools for conveying information from data producers to data
consumers (Wild & Pfannkuch, 1999).
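This noise-reducing role of statistical summaries can be illustrated informally with a small simulation (a minimal Python sketch; the normally distributed noise model and all numerical values are arbitrary illustrative choices):

```python
import random
import statistics

random.seed(42)

# Simulate 500 noisy measurements of a fixed quantity:
# each observation = true signal + random noise.
true_value = 100.0
data = [true_value + random.gauss(0, 15) for _ in range(500)]

# Individual observations vary widely ...
spread = max(data) - min(data)

# ... but a summary statistic such as the mean conveys the stable
# feature of the noisy process in a single number.
estimate = statistics.mean(data)

print(f"range of individual values: {spread:.1f}")
print(f"mean of all 500 values:     {estimate:.1f}")  # close to 100
```

The individual values range over many tens of units, yet the mean lands close to the underlying signal, which is what licenses its use as a data-reduction tool.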
Further, adults should possess some understanding of the logic behind key
research designs commonly mentioned in the media, primarily experiments and the
reason for using experimental and control groups to determine causal influences (see
excerpt #6 in Figure 1); census (excerpt #2); polls/surveys (excerpts #3 and #4); and
perhaps the role and limitations of a pilot study. Given the prevalence of polls and
surveys, adults should also understand, at least intuitively, the logic of sampling, the
need to infer from samples to populations, and the notions of representativeness and
especially bias in this regard (Cobb & Moore, 1997; Wild & Pfannkuch, 1999).
Some specific ideas to be known in this regard are the advantages of probability
sampling, the dangers of convenience sampling, or the influence of the sampling
process, sample size, and sample composition on researchers’ ability to generalize
safely and infer about a population from sample data.
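The influence of sample size on the stability of sample-based estimates can likewise be shown informally (a minimal Python sketch; the population proportion of 40% and the sample sizes are hypothetical illustrative values):

```python
import random
import statistics

random.seed(1)

# A hypothetical population in which 40% hold some opinion.
p_true = 0.40

def sample_proportion(n):
    """Draw one simple random sample of size n and return the observed proportion."""
    return sum(random.random() < p_true for _ in range(n)) / n

# Repeat the "poll" 1,000 times at two sample sizes and compare
# how much the estimates fluctuate around the true 40%.
small = [sample_proportion(25) for _ in range(1000)]
large = [sample_proportion(1000) for _ in range(1000)]

sd_small = statistics.stdev(small)
sd_large = statistics.stdev(large)

print(f"spread of estimates, n=25:   {sd_small:.3f}")
print(f"spread of estimates, n=1000: {sd_large:.3f}")  # much smaller
```

Estimates from samples of 25 scatter widely around the population value, while estimates from samples of 1,000 cluster tightly, which is the intuition behind safe generalization from samples.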
Familiarity with Basic Terms and Ideas Related to Descriptive Statistics
Assuming adults understand why and how data are produced, they need to be
familiar with basic concepts and data displays that are commonly used to convey
findings to target audiences. Two key types of concepts whose centrality is noted by
many sources are percentages (Parker & Leinhardt, 1995) and measures of central
tendency, mainly the arithmetic mean (often termed “average” in newspapers) but
also the median. Gal (1995) argues that it is desirable for consumers of statistical
reports to know that means and medians are simple ways to summarize a set of data
and show its “center”; that means are affected by extreme values, more so than
medians; and that measures of center can mislead when the distribution or shape of
the data on which they are based is very uneven or bimodal, or when the data or
sample from which they are calculated is not representative of the whole population
under study (see excerpt #5 in Figure 1). More broadly, it is useful for adults to be
aware that different types of seemingly simple summary indices (i.e., percentage,
mean, median) may yield different, and at times conflicting, views of the same
phenomena.
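A small worked example (hypothetical numbers) shows how a single extreme value pulls the mean away from the median:

```python
import statistics

# Hypothetical annual salaries (in $1,000s) at a small firm;
# one extreme value skews the data.
salaries = [28, 30, 31, 33, 35, 36, 38, 40, 42, 400]

mean_salary = statistics.mean(salaries)      # pulled up by the extreme value
median_salary = statistics.median(salaries)  # resistant to it

print(f"mean:   {mean_salary:.1f}")    # 71.3
print(f"median: {median_salary:.1f}")  # 35.5
```

Here the mean ($71,300) describes no actual employee, while the median ($35,500) sits in the middle of the bulk of the data, illustrating how the two measures of center can yield conflicting views of the same phenomenon.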
Familiarity with Graphical and Tabular Displays and Their Interpretation
Adults should know that data can be displayed or reported in both graphical and
tabular displays, which serve to organize multiple pieces of information and enable
the detection or comparison of trends in data (Tufte, 1997). In this regard, one hopes
that adults can first of all perform literal reading of data in tables or graphs, be
familiar with standard conventions in creating graphs and charts, and be attentive to
simple violations of such conventions (Bright & Friel, 1998) such as those in the
graph in Figure 3: The relative length of the bars is not proportional to the actual
percentages, and neither is the positioning of the boxes with percentages inside each
bar; the decision of the graphical artist to add a female figure on the left (probably
for decoration or to gain attention) masks the length of some bars and renders the
visual appearance misleading. In this case, one hopes that readers realize the need to
examine the actual percentages.
It is also expected that adults can do, on some level, what Curcio (1987) and
Wainer (1992) call “reading between the data” and “reading beyond the data,” such
as understand that projections can be made from given data, and that one should
look at overall patterns and not only specific points in a graph or a table (Gal, 1998).
Adults should also realize that different graphs and tables may yield different (and
possibly conflicting) views of the phenomena under investigation. Finally, adults
should be aware that graphs can be intentionally created to mislead or highlight/hide
a specific trend or difference. Various examples in this regard have been presented
by Huff (1954). (See also Orcutt & Turner’s [1993] analysis, discussed later, of how
Newsweek magazine manipulated survey data on drug use to advance a specific
point of view).
Understanding Basic Notions of Probability
Ideas regarding chance and random events are explicit or implicit in many types
of messages adults encounter. Many statistical reports make probabilistic statements
in the context of presenting findings from surveys or experiments, such as the
likelihood of obtaining certain results (see excerpts #1 and #6 in Figure 1).
Messages can also include probabilistic estimates made by various professionals
(weather forecasters, genetic counselors, physicians, admissions administrators in
colleges) regarding the likelihood of various events or the degree of confidence in
their occurrence (rain, risks, side effects, or acceptance, respectively). Some of these
claims may not be based on statistical studies, and could be couched in subjective
estimates of individuals.
It is safe to expect that at a minimum, adults should be sensitive to the problem
of interpreting correctly the “language of chance” (Wallsten, Fillenbaum, & Cox,
1986). Adults should have a sense for the many ways in which estimates of
probability or risk are communicated by various sources, such as by percentages,
odds, ratios, or verbal estimates. (Excerpt #6 illustrates how these combine in
complex ways within a single article.)
Next, there is a need for adults to be familiar with the notion of randomness,
understand that events vary in their degree of predictability or independence, yet
also that some events are unpredictable (and hence that co-occurrence of certain
events does not mean that they are necessarily related or cause each other).
Unfortunately, while possible, it is difficult to present more advanced or explicit
expectations for adults in terms of understanding random processes without
appearing simplistic or naive. People from all walks of life have been shown to hold
many misconceptions and discontinuities in understanding and reasoning about
stochastic phenomena (Konold, 1989; Gal & Baron, 1996; Shaughnessy, Garfield, &
Greer, 1997). Further, understanding of random phenomena also takes part in
cognitive processes of judgment, decision making, and rationality, in which various
deficiencies have been documented as well (Baron, 1988; Mellers, Schwartz, &
Cooke, 1998).
Nonetheless, if adults are to understand and critically evaluate probabilistic
claims, they should at least recognize the importance of ascertaining the source for
probability estimates. Adults should realize that estimates of chance and risk may
originate from diverse sources, both formal (e.g., frequency data, modeling,
experimentation) and subjective or anecdotal, and that estimates may have different
degrees of credibility or accuracy. Thus, they should expect that the evidence or
information basis for statements of chance can be specified by those who make
claims, and that judgments of chance may fluctuate and forecasts may change when
additional data become available (Clemen & Gregory, 2000).
A final and more advanced expectation is that adults understand, at least
intuitively, the idea of a chance variability in (random) phenomena. As Cobb and
Moore (1997) explain, “When a chance mechanism is explicitly used to produce
data, probability … describes the variation we expect to see in repeated samples
from the same population” (p. 813). Some understanding of probability is thus also a
gateway to making sense of statements about the significance of differences between
groups or likelihood of obtaining certain results, since standard statistical inference
is based on probability (Cobb & Moore, 1997).
Knowing How Statistical Conclusions or Inferences Are Reached
Whereas most adults are data consumers and not producers, they do need to have
a grasp on some typical ways to summarize data, such as by using means or
medians, percentages, or graphs. However, given that there are different designs for
collecting data, and that sampling processes or random processes may be involved,
adults also need to possess some sense of how data are analyzed and conclusions
reached, and be aware of relevant problems in this regard.
First, adults need to be sensitive to the possibility of different errors or biases (in
sampling, in measurement, in inference) and possess a healthy concern regarding the
stability and generality of findings. Second, it is useful to realize that errors may be
controlled through proper design of studies, and can be estimated and described
(e.g., by means of probability statements). One concept mentioned in the media in
this regard is “margin of error” (see excerpt #3 in Figure 1, and the implicit
mentioning of inflated scores in excerpt #5). Third, it is useful to know that there are
ways to determine the significance or “trueness” of a difference between groups, but
that this requires attention to the size of the groups studied, to the quality of the
sampling process and the possibility that a sample is biased (understanding of these
notions is needed if one is to think critically of the claims in excerpts #1 and #6).
Finally, it is important to be aware that observed differences or trends may exist but
may not necessarily be large or stable enough to be important, or can be caused by
chance processes (as is the case with the reported increase in sexual intercourse in
excerpt #4).
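The “margin of error” reported with polls typically comes from the standard 95% confidence formula for a sample proportion; a minimal Python sketch (with arbitrary illustrative sample sizes) shows why larger polls report smaller margins:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll reporting 50% support: the margin shrinks as the sample grows.
for n in (100, 400, 1000, 1600):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:5d}: margin of error = ±{moe * 100:.1f} percentage points")
```

For example, a sample of 1,000 yields a margin of about ±3 percentage points, while quadrupling the sample only halves the margin, which is why reported differences smaller than the margin of error should not be read as meaningful.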
Mathematical Knowledge Base
A determination of the types of mathematical knowledge expected of adults to
support statistical literacy should be made with caution. On the one hand, adults
clearly need to be aware of some of the mathematical procedures underlying the
production of common statistical indicators, such as percent or mean. At the same
time, expectations regarding the amount and level of formal mathematics needed to
comprehend basic statistical ideas taught at the introductory college level (or in high
schools) have been changing in recent years (Moore, 1998). A brief detour to
describe leading ideas in this regard is offered below to help frame later statements
about the mathematical knowledge base needed for statistical literacy.
Statisticians have gradually clarified over the last few years the nature of some
fundamental differences between mathematics and statistics (Moore & Cobb, 2000),
and have formulated some working assumptions about the general level of
mathematics one needs to learn statistics, at least at the introductory college level.
Cobb and Moore (1997) summarize recommendations of the ASA/MAA committee
on statistics instruction (Cobb, 1992), and suggest that while statistics makes heavy
use of mathematics, statistics instruction at the introductory college level should
focus on statistical ideas (need for data and importance of data production,
omnipresence of variability, need to explain and describe variability).
Understanding the mathematical derivations that underlie key ideas presented in
introductory statistics is of some importance but should be kept limited, since
computers now automate many computations. While there is no intention of leading
students to accept statistical derivations as magic (i.e., without knowing any of the
underlying mathematics), too much emphasis on mathematical theory is not
desirable early on; it may disrupt the development of the necessary intuitive
understanding of key statistical ideas and concepts that often do not have
mathematical representations and are unique to the discipline of statistics (Moore,
1997a; Wild & Pfannkuch, 1999). Cobb and Moore (1997) further claim that
probability is conceptually the hardest subject in elementary mathematics, and
remind us that psychological studies have documented confusion about probability
even among those who master the computational side of probability theorems and
can solve textbook exercises. Hence, even for understanding of the formal aspects of
inference or of probability, only a limited amount of mathematical knowledge is
expected.
The above logic can help in determining the mathematical knowledge that adults
need to support statistical literacy. Given that most adults in any country do not
study statistics at the college level (Moore & Cobb, 2000; UNESCO, 2000), the
amount and level of formal knowledge of mathematics needed to support adult
statistical literacy can be restricted.
Perhaps the simplest knowledge expected of adults is the realization that any
attempt to summarize a large number of observations by a concise quantitative
statement (percentage, mean, probability, etc.) requires some application of
mathematical tools and procedures. Adults need to have numeracy skills at a
sufficient level to enable correct interpretation of numbers used in statistical reports.
“Number sense” is increasingly being touted as an essential skill for proper
understanding of diverse types of numbers (Paulos, 1995; Curry, Schmitt, &
Waldron, 1996; Scheaffer et al., 1998; NCTM, 2000), such as large numbers (e.g.,
trends in GNP) and small numbers, including fractions, decimals, and percents (e.g.,
estimates of risk or side effects).
Understanding of basic statistical findings pertaining to percentages or
“averages” requires familiarity, intuitive and to some extent formal, with underlying
mathematical procedures or computations used to generate these statistics (Garfield
& Gal, 1999). Citizens should know how an arithmetic mean is computed in order to
fully appreciate the meaning of the claim that an arithmetic mean can be influenced
by extreme values in a data set and hence may not represent the “middle” of a set of
values if the data are skewed. Excerpt #5 shows a variant on this demand, that is,
understanding of the impact of excluding a certain proportion of extreme
observations (6% in the example given) on the central tendency.
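The effect of excluding a proportion of extreme observations can be illustrated with a hypothetical set of ratings and a simple trimmed mean:

```python
import statistics

# Hypothetical judges' scores with one extreme low value.
scores = [3.0, 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 10.0]

plain_mean = statistics.mean(scores)

# Trim the lowest and highest 10% of observations (one value at each end
# here) before averaging, as some rating schemes do.
trimmed = sorted(scores)[1:-1]
trimmed_mean = statistics.mean(trimmed)

print(f"mean of all scores: {plain_mean:.2f}")    # 8.86
print(f"10%-trimmed mean:   {trimmed_mean:.2f}")  # 9.45
```

Dropping just two of ten observations moves the reported “average” from 8.86 to 9.45, which is the kind of shift a reader needs to anticipate when a report mentions that extreme scores were excluded.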
Many types of statistical information reported in the media are described in
terms of percentages (Joram et al., 1995) and are sometimes included in graphs.
Numerous examples can be found in Figures 1 and 2. Percentage is a seemingly
simple mathematical concept, commonly perceived as expressing a proportion or
ratio; it is presumably mastered in the middle grades, and hence it could be expected
that the vast majority of schooled adults will understand it. Yet, its understanding is
far from being simple. Parker and Leinhardt (1995) address the prevalence and
complexity of percentages, and also point to specific types of percentages that
normally are not encountered in routine classroom teaching but may appear in
newspaper statements, such as percentages larger than 100% or percentage of
percent. These authors argue that generations of students, including at the college
level, have failed to fully master percentage, in part because it is a multifaceted
concept that has multiple mathematical meanings and also statistical uses (e.g., a
number, an expression of a relationship, a statistic, a function, an expression of
likelihood). Understanding the mathematical and statistical meaning of a reported
percentage can be difficult. Readers may have to make inferences and assumptions,
for example, when a message does not specify the base for calculating a percentage.
Percentages may represent complex relationships (e.g., conditional probabilities)
and, as illustrated in Figure 1, may be linked to concepts that themselves have
multiple meanings (such as “15 percent below average,” “2% margin of error”).
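The base problem, and percentages larger than 100%, can be made concrete with a short sketch (all figures hypothetical):

```python
# When a report omits the base, a "percent change" is ambiguous.
# A 25% drop followed by a 25% rise does not return to the start,
# because the second percentage is taken on a different (smaller) base.
price = 200.0
after_drop = price * (1 - 0.25)        # 150.0
after_rise = after_drop * (1 + 0.25)   # 187.5, not 200.0

# Percent change legitimately exceeds 100% when a quantity more than doubles.
old, new = 40, 130
pct_change = (new - old) / old * 100   # 225.0

print(after_rise, pct_change)
```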
The examples pertaining to percentages and computations of means and medians
imply that interpretation of even seemingly simple statistics reported in the media
64
IDDO GAL
requires some familiarity with their derivation (though not always formal training in
this regard). It follows that adults should understand, at least informally, some of the
mathematics involved in generating certain statistical indicators, as well as the
mathematical connection between summary statistics, graphs, or charts, and the raw
data on which they are based.
Questions about the amount of mathematics one needs to know to understand
more sophisticated concepts are more difficult to answer and have been the source
of some debate among statistics and mathematics educators (Moore, 1997a). Terms
or phrases that appear in the media, such as “margin of error” or “statistically significant difference,” can be understood intuitively in a way that helps adults without formal statistical training make superficial sense of news items. After all,
such ideas are being successfully taught at an introductory level to children in
elementary or middle schools (Friel, Russell, & Mokros, 1990). However, deeper
understanding of the above or related concepts, and proper interpretation of their
exact meaning, requires more solid understanding of underlying statistical ideas
(quantification of variance, repeated sampling, sampling distributions, curves, logic
of statistical inference, etc). These ideas are hard to grasp for college-bound students
(Cobb & Moore, 1997; Watson & Moritz, 2000) even without the added
complication of the need to understand their mathematical underpinnings.
Context/World Knowledge Base
Proper interpretation of statistical messages by adults depends on their ability to
place messages in a context, and to access their world knowledge. World knowledge
also supports general literacy processes and is critical to enable “sense-making” of
any message. Moore (1990) has argued that in statistics, the context motivates
procedures; data should be viewed as numbers with a context, and hence the context
is the source of meaning and basis for interpretation of obtained results. In reading
contexts, however, people do not engage in generating any data or in carrying out any computations or analyses. Their familiarity with the data-generation process (e.g.,
study design, sampling plan, questionnaires used), or with the procedures employed
by the researchers or statisticians to analyze the data, depends on the details and
clarity of the information given in the messages presented to them. As passive
receivers of messages, they are at the mercy of message creators.
It follows that adults’ ability to make sense of statistical claims or displays will
depend on whatever information they can glean from the message about the
background of the study or data being discussed. Context knowledge is the main
determinant of the reader’s familiarity with sources for variation and error. If a
listener or reader is not familiar with a context in which data were gathered, it
becomes more difficult to imagine why a difference between groups can occur, what
alternative interpretations may exist for reported findings about an association
detected between certain variables, or how a study could go wrong.
The ways in which a study is reported in the media can easily mask or distort the
information available to the reader about the source of the evidence presented. An
example is when a reporter uses the term experiment in a way that enhances the face
STATISTICAL LITERACY
65
validity of a study that is nonexperimental in nature. Thus world knowledge, combined with some literacy skills, is a prerequisite for critical reflection about statistical messages and for understanding the implications of the findings or numbers reported. Adults can be helped by having a sense for, and expectations about, elements of good journalistic writing, such as objective writing, presentation of two-sided arguments, accuracy in reporting, and provision of background information to orient readers to the context of a story.
Critical Skills
Messages aimed at citizens in general may be shaped by political, commercial,
or other agendas which may be absent in statistics classrooms or in empirical
enquiry contexts. Fred Mosteller said, “Policy implies politics, and politics implies
controversy, and the same data that some people use to support a policy are used by
others to oppose it” (cited in Moore, 1998, p. 1255). Not surprisingly, the need for
critical evaluation of messages to the public has been a recurring theme in writings
of educators interested in adults’ literacy and numeracy (Freire, 1972; Frankenstein,
1989).
As noted in discussing literacy skills, messages in the general media are
produced by very diverse sources, such as journalists, politicians, manufacturers, or
advertisers. Depending on their needs and goals, such sources may not necessarily
be interested in presenting a balanced and objective report of findings or
implications. A potent example is Orcutt and Turner’s (1993) analysis of how the
print media, especially Newsweek magazine, selectively analyzed and intentionally
manipulated trend data collected by the Institute for Social Research (ISR) regarding
drug use among American high-school students between 1975 and 1985. According
to Orcutt and Turner, the media attempted to create for the public an image of a “drug plague” by selectively choosing only some of the data collected as part of a multiyear survey project, and by using graphical methods (truncating and censoring the axes) to make small percentage differences appear visually large.
Orcutt and Turner (1993) add that later in 1992, Newsweek attempted again to
create a sense of national danger by reporting that the use of LSD is “rising
alarmingly” and that for the first time since 1976, more high-school seniors used
LSD than cocaine. However, analysis of the ISR data on which Newsweek based this claim showed that it had no empirical basis. Cocaine use decreased
from 6.5% in 1989 to 5.3% in 1990, a statistically significant change (given sample
size used), whereas LSD use increased from 4.9% to only 5.4%, which was within
the range of sampling error. The contrast between these figures, which were available to Newsweek, and the narrative and graphs used in the published articles suggests an intentional misuse of data and highlights the media’s tendency toward sensational reporting practices.
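The idea of a change falling "within the range of sampling error" can be sketched with the standard large-sample comparison of two proportions. The sample size below is an assumption made only for illustration; the ISR survey's actual size and complex design would change the numbers, so no conclusion about the Newsweek figures should be drawn from this sketch:

```python
import math

def z_for_diff(p1, p2, n1, n2):
    """Approximate z statistic for the difference between two independent
    sample proportions (simple random sampling assumed)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p2 - p1) / se

# Reported LSD proportions from the excerpt; the sample size is hypothetical.
n = 2_000
z = z_for_diff(0.049, 0.054, n, n)

# Whether |z| clears a conventional cutoff (about 1.96 at the 5% level)
# depends heavily on n and on the survey's actual sampling design.
print(round(z, 2))
```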
Excerpts #4 and #6 in Figure 1 further illustrate how data can be tailored to serve
the needs of specific organizations (e.g., states and manufacturers), and how reports
about data are shaped to influence the opinions of the listener or reader in a specific
direction. Paulos (1995, p. 79) notes that originators of messages regarding diseases,
accidents, or other misfortunes that afflict humans can, depending on their interests, make them appear more salient and frightening by choosing to report absolute numbers (e.g., 2,500 people nationwide suffer from X), or can downplay them by using incidence rates (e.g., 1 in every 100,000 people suffers from X). Many
examples are also presented by Huff (1954) and Crossen (1994).
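Paulos's framing contrast is a one-line conversion; the population figure below is assumed for illustration:

```python
# The same hypothetical figure framed two ways: an absolute count can sound
# alarming, while the equivalent incidence rate sounds negligible.
population = 250_000_000          # assumed national population
cases = 2_500                     # "2,500 people nationwide suffer from X"
rate_per_100k = cases / population * 100_000
print(rate_per_100k)              # 1.0, i.e., "1 in every 100,000 people"
```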
In light of such examples, and the possibility of biased reporting (Wanta, 1997), adults have to examine the reasonableness of claims presented in the media. They have to be concerned about the validity of messages and about the nature and credibility of the evidence underlying the information or conclusions presented, and they have to reflect on possible alternative interpretations of the conclusions conveyed to them. It
follows that adults should maintain in their minds a list of “worry questions”
regarding statistical information being communicated or displayed (Gal, 1994;
Moore, 1997b; Garfield & Gal, 1999). Ten such questions are listed in Table 3.
When faced with an interpretive statistical task, people can be imagined running through this list, asking of each question, “Is this question relevant for the situation/message/task I face right now?”
The answers people generate to these and related questions can support the
process of critical evaluation of statistical messages and lead to the creation of more
informed interpretations and judgments. This list can of course be modified, and
some of its elements regrouped, depending on the life contexts and functional needs
of different adults. It can expand beyond basic statistical issues to cover broader
issues of probability and risk, or job-specific statistical topics such as those related
to statistical process control or quality assurance.
Table 3. Sample “worry questions” about statistical messages
1. Where did the data (on which this statement is based) come from? What kind of study
was it? Is this kind of study reasonable in this context?
2. Was a sample used? How was it sampled? How many people actually participated?
Is the sample large enough? Did the sample include people/units which are
representative of the population? Is the sample biased in some way? Overall, could this
sample reasonably lead to valid inferences about the target population?
3. How reliable or accurate were the instruments or measures (tests, questionnaires,
interviews) used to generate the reported data?
4. What is the shape of the underlying distribution of raw data (on which this summary
statistic is based)? Does it matter how it is shaped?
5. Are the reported statistics appropriate for this kind of data? E.g., was an average used
to summarize ordinal data; is a mode a reasonable summary? Could outliers cause a
summary statistic to misrepresent the true picture?
6. Is a given graph drawn appropriately, or does it distort trends in the data?
7. How was this probabilistic statement derived? Are there enough credible data to justify
the estimate of likelihood given?
8. Overall, are the claims made here sensible and supported by the data? E.g., is
correlation confused with causation, or a small difference made to loom large?
9. Should additional information or procedures be made available to enable me to
evaluate the sensibility of these arguments? Is something missing? E.g., did the writer
“conveniently forget” to specify the base of a reported percent-of-change, or the actual
sample size?
10. Are there alternative interpretations for the meaning of the findings or different
explanations for what caused them, e.g., an intervening or a moderator variable
affected the results? Are there additional or different implications that are not
mentioned?
Interaction of Knowledge Bases
Five knowledge bases were described above separately for ease of presentation,
but they overlap and do not operate independently from each other. For example,
familiarity with possible language ambiguities and reporting conventions comprises
part of the literacy skills required of adults, yet they are also part of general world
knowledge, and related to the need for knowledge about intentional (and possibly
biased) reporting practices listed as part of critical skills. Some aspects of the
statistical knowledge base overlap with mathematical knowledge, for example
regarding the difference in the computational procedures used to find medians and
means and their implication for interpretation of such statistics under different
conditions.
The characteristics of certain real-world messages require that adults jointly
activate all the knowledge bases described in order to manage tasks at hand (Gal,
1997). Figure 2 exemplifies the complex task that may face readers of print media
with regard to interpreting information of a statistical nature, and illustrates the
interconnected nature of the knowledge bases that underlie people’s statistical
literacy.
Figure 2 recreates a portion of a table that appeared in USA Today (a nationally
circulated daily newspaper) in 1999. This table combines an offbeat opening passage
with a tabular display of several simple lists, each containing information of a
different nature: absolute numbers, averages, percentages. Interpretation of the table
requires not only basic familiarity with averages and percentages, but also literacy
skills and access to different kinds of background knowledge. Some details needed
to make complete sense of the mathematical information are not fully stated, forcing
the reader to make inferences based on his or her general world knowledge:
averages are denoted as “avg.” and percentages as “pct. chg,” both nonstandard
abbreviations; the averages are “per site,” but it is not explained what a “site” is, nor whether the average is calculated for a whole week or only a weekend; percentages
describe change in negative numbers, yet the base is not given, only implied.
DISPOSITIONAL ASPECTS OF STATISTICAL LITERACY
The notion of “critical evaluation,” highlighted in several of the conceptions of
statistical literacy cited earlier (e.g., Wallman, 1993), implies a form of action, not
just passive interpretation or understanding of the statistical or probabilistic
information available in a situation. It is hard to describe a person as fully
statistically literate if this person does not show the inclination to activate the five
knowledge bases described earlier or share with others his or her opinions,
judgments, or alternative interpretations.
Statistically literate action can take many forms, both overt and hidden. It can be
an internal mental process, such as thinking about the meaning of a passage one
read, or raising in one’s mind some critical questions and reflecting about them. It
can be extended to more external forms, such as rereading a passage, scanning a
graph one encountered in the newspaper, stopping a game of chance after one
remembers reading an article about the Gambler’s Fallacy, or discussing findings of
a survey one heard about on TV with family members at the dinner table or with coworkers. However, for any form of action to occur and be sustained, certain
dispositions need to exist and be activated.
The term dispositions is used here as a convenient aggregate label for three
related but distinct concepts—critical stance, beliefs, and attitudes—which are all
essential for statistical literacy. These concepts are interconnected (McLeod, 1992),
and hence are harder to describe in a compartmentalized way, unlike the description
of the five knowledge bases above. This section first describes critical stance, and
then examines beliefs and attitudes that underlie a critical stance.
Critical Stance
A first expectation is that adults hold a propensity to adopt, without external
cues, a questioning attitude toward quantitative messages that may be misleading,
one-sided, biased, or incomplete in some way, whether intentionally or
unintentionally (Frankenstein, 1989). They should be able and willing to
spontaneously invoke their personal list of worry questions (see Table 3) when faced
with arguments that purport to be based on data or with reports of results or
conclusions from surveys or other empirical research (Gal, 1994).
It is important to keep in mind that adults who encounter statistical information or messages may sometimes have to act under conditions of uncertainty. Examples are lack of familiarity with the background of
the issues discussed or estimates conveyed, partial knowledge of concepts and their
meanings, or the need to cope with technical terms that “fly above the head” of the
reader. This may be the case for many adults without much formal education or
effective literacy skills, who constitute a sizable percentage of the population in
many countries (Statistics Canada and OECD, 1996; UNESCO, 2000). Action or
reaction in such situations may involve taking some personal risks, i.e., exposing to
others that one is naive about, or unfamiliar with, certain statistical issues, and
possibly suffering some embarrassment or the need to argue with others.
Beliefs and Attitudes
Certain beliefs and attitudes underlie people’s critical stance and willingness to
invest mental effort or occasionally take risks as part of acts of statistical literacy.
There is a definitional challenge in discussing “beliefs” and “attitudes,” as the
distinction between them is somewhat murky. (Researchers, for example, often implicitly define statistics attitudes or beliefs as whatever their favorite assessment instrument measures in the context of a specific target population, such as school students, college students, or adults at large.)
Based on McLeod’s (1992) work on affective aspects of mathematics education,
a distinction should be made between emotions, attitudes, and beliefs (see also
Edwards, 1990; Green, 1993). Emotions are transient positive and negative
responses triggered by one’s immediate experiences (e.g., while studying
mathematics or statistics, or while facing a certain probabilistic situation, such as
receiving medical information about the chances of side effects of a proposed
treatment). Attitudes are relatively stable, intense feelings that develop through
gradual internalization of repeated positive or negative emotional responses over
time. Attitudes are expressed along a positive–negative continuum (like–dislike,
pleasant–unpleasant), and may represent, for example, feelings toward objects,
actions, or topics (“I don’t like polls and pollsters, they always confuse me with
numbers”). Beliefs are individually held ideas or opinions, such as about a domain
(“government statistics are always accurate”), about oneself (“I am really naive
about statistical information,” “I am not a numbers person”), or about a social
context (“The government should not waste money on big surveys”; see Wallman,
1993). Beliefs take time to develop and cultural factors play an important part in
their development. They have a larger cognitive component and less emotional
intensity than attitudes, and are stable and quite resistant to change compared to
attitudes.
Adults should develop a positive view of themselves as individuals capable of
statistical and probabilistic reasoning as well as a willingness and interest to “think
statistically” in relevant situations. This assumes that adults hold some appreciation
for the power of statistical processes, and accept that properly planned studies have
the potential to lead to better or more valid conclusions than those obtained by
relying on anecdotal data or personal experiences (Moore, 1998). Broader
metacognitive capacities that are considered part of people’s general intellectual
functioning can further support statistically literate behavior, such as having a
propensity for logical reasoning, curiosity, and open-minded thinking (Baron, 1988).
Gal, Ginsburg, and Schau (1997) examined the role of attitudes and beliefs in
statistics education, and argued that to enable productive problem solving, learners
need to feel safe to explore, conjecture, and feel comfortable with temporary
confusion or a state of uncertainty. It was argued earlier that reading contexts, where
people are data consumers, differ in several ways from those encountered in inquiry
contexts such as those addressed by Gal et al. (1997). Yet, some commonality
between these two contexts does exist regarding the required beliefs that support
action. Even in reading contexts adults have to feel safe to explore and hypothesize,
feel comfortable being in the role of a critical reader or listener, and believe in their
ability to make sense of messages (Gal, 1994), as a condition for developing and
sustaining their motivation for critical action.
Finally, we come to a point where “critical stance” and “beliefs and attitudes”
mesh together. For a critical stance to be maintained, adults should develop a belief
in the legitimacy of critical action. Readers should uphold the idea that it is
legitimate to be critical about statistical messages or arguments, whether they come
from official or other sources, respectable as they may be. Adults should agree that it
is legitimate to have concerns about any aspect of a reported study or a proposed
interpretation of its results, and to raise pertinent “worry questions,” even if they
have not learned much formal statistics or mathematics, or do not have access to all
needed background details.
DISCUSSION AND IMPLICATIONS
This paper’s main goal was to propose a conceptualization of statistical literacy
and describe its key components. Given the patchy literature on statistical literacy,
the availability of such a model was seen as a necessary prefatory step before further
scholarly discussion can ensue regarding the issues involved in developing or
studying adult statistical literacy. Statistical literacy was portrayed in this paper as
the ability to interpret, critically evaluate, and if needed communicate about
statistical information, arguments, and messages. It was proposed that statistically
literate behavior requires the joint activation of five interrelated knowledge bases
(literacy, statistical, mathematical, context/world, and critical), yet that such
behavior is predicated on the presence of a critical stance and supporting beliefs and
attitudes.
The proposed conceptualization highlights the key role that nonstatistical factors
and components play in statistical literacy, and reflects the broad and often
multifaceted nature of the situations in which statistical literacy may be activated.
That said, several observations should be made. First, the five knowledge bases
discussed in this paper were sketched in broad strokes to clarify the key categories
of knowledge to be considered when thinking of what adults need to know to be
statistically literate. Each could be modified or elaborated, depending on the cultural
context of interest, and on the sophistication of statistical literacy expected of
citizens or workers in a given country or community. As with conceptions of other
functional skills, the particulars viewed as essential for statistical literacy in a
specific country will be dynamic and may have to change along with technological
and societal progress.
Secondly, although five knowledge bases and a cluster of beliefs, attitudes, and a
critical stance were proposed as jointly essential for statistical literacy, it does not
necessarily follow that a person should fully possess all of them to be able to
effectively cope with interpretive tasks in all reading and listening contexts.
Following current conceptions of adult literacy (Wagner et al., 1999) and numeracy
(Gal, 2000), statistical literacy should be regarded as a set of capacities that can exist
to different degrees within the same individual, depending on the contexts where it
is invoked or applied. Descriptions of what constitutes statistical literacy may differ
in work contexts, in personal/home contexts, in public discourse contexts, and in
formal learning contexts.
In light of the centrality of statistical literacy in various life contexts, yet also its
complex nature, educators, statisticians, and professionals interested in how well
citizens can interpret and communicate about statistical messages face numerous
challenges and responsibilities. Below is a preliminary discussion regarding two key
areas, education for statistical literacy, and suggested research in this area.
Educational Challenges
Several countries and organizations have introduced programs to improve
school-level education on data analysis and probability, sometimes called data
handling, stochastics, or chance (Australian Education Council, 1991; NCTM,
2000). Yet, at the school level, where most individuals will receive their only formal
exposure to statistics (Moore, 1998), these topics overall receive relatively little
curricular attention compared to other topics in the mathematical sciences. The most
credible information in this regard comes from the curriculum analysis component
of TIMSS, the Third International Mathematics and Science Study (Schmidt,
McKnight, Valverde, Houang, & Wiley, 1997), which examined curriculum
documents and textbooks and consulted with expert panels from over 40 countries.
TIMSS data also pointed to an enormous diversity in curricular frameworks.
Various gaps have been documented by TIMSS between the intended and
implemented curriculum (i.e., between curriculum plans and what actually appears
in mainstream textbooks, which tend to be conservative).
TIMSS tests included few statistics items; hence, it was not possible to create a
separate scale describing student performance in statistics. However, achievement
on individual statistical tasks was problematic. For example, Mullis, Martin, Beaton,
Gonzalez, Kelly, & Smith (1998) reported performance levels of students in their
final year of schooling (usually grade 12) on a task directly related to statistical
literacy: Explain whether a reporter’s statement about a “huge increase” was a
reasonable interpretation of a bar graph showing the number of robberies in two
years that was manipulated to create a specific impression. The graph included a bar
for each year but a truncated scale, causing a small difference between years to
appear large. Performance levels varied across countries; on average, fewer than half of all graduating students appeared able to cope, at least partially, with this task, which exemplifies one of the most basic statistical literacy skills expected of all citizens: the ability to detect a discrepancy between displayed data and a given interpretation of these data.
Keeping in mind that in many countries a sizable proportion of students drop out or
leave before the final year of high school, the overall percentage of all school
leavers who can cope with such tasks is bound to be even lower.
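The effect of a truncated scale of the kind used in this TIMSS item can be quantified with hypothetical counts (the actual item's figures are not reproduced here):

```python
# Hypothetical robbery counts for two years shown as a two-bar graph.
year1, year2 = 505, 520

# Bars drawn from a zero baseline: visible heights are the counts themselves.
full_ratio = year2 / year1                                  # about 1.03

# Bars drawn above a truncated baseline of 500: only the excess is visible,
# so a 3% difference is rendered as one bar four times taller than the other.
baseline = 500
truncated_ratio = (year2 - baseline) / (year1 - baseline)   # 4.0

print(round(full_ratio, 2), truncated_ratio)
```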
Efforts to improve statistics education at the secondary or postsecondary levels
examine needed changes in a range of areas, including in content and methods,
teacher preparation and training, assessments, and the use of technology (e.g., Cobb,
1992; Pereira-Mendoza, 1993; Gal & Garfield, 1997; Lajoie, 1998). Yet a crucial
question is, to what extent can such efforts develop students’ interpretive and
statistical literacy skills? To appreciate the complexity of the issues implicated by
this question, consider the situation in the related area of scientific literacy.
Eisenhart, Finkel, & Marion (1996) have argued that the broad, progressive, and
inclusive vision of scientific literacy in reform proposals is being implemented in
narrow and conventional ways; hence reform efforts may not lead to significant
changes in national scientific literacy. To help define educational goals, it may be
possible to identify levels of statistical literacy (Watson, 1997; Watson & Moritz,
2000) in a similar fashion to the continuum proposed to describe levels of scientific
literacy (Shamos, 1995).
This paper argues that statistical literacy depends on possession of elements
from all five different knowledge bases; and that literacy skills, contextual
knowledge, critical skills, and needed dispositions play a significant role in this
regard. It is not at all clear that learning statistical facts, rules, and procedures, or
gaining personal statistical experience through a data-analysis project in a formal
classroom enquiry context can in itself lead to an adequate level of statistical
literacy.
Calls to change traditional approaches to teaching statistics have been repeatedly
made in recent years, and met with some success (Moore & Cobb, 2000). Yet,
educators have to distinguish between teaching more statistics (or teaching it better)
and teaching statistics for a different (or additional) purpose. Literacy demands
facing students who are learning statistics are more constrained than those described
in the section on “Literacy skills” as characterizing reading contexts. When students
who learn statistics read or listen to project reports created by their fellow students
(Starkings, 1997), or when they read academic research papers, findings and
conclusions are likely to be shared through language that is less varied than what
appears in real-world sources. This may happen because academic conventions
inhibit or channel the type of expressions and styles that authors, students, and
teachers are expected to use, or due to logistical limitations in large introductory
statistics courses that restrict the richness and scope of classroom discourse that
teachers can afford to conduct (Wild, Triggs, & Pfannkuch, 1997). Unlike
consumers of the media, when students encounter an unfamiliar or ambiguous term,
they can clarify its interpretation by talking with their teacher. The upshot is that the
literacy demands in statistics classes do not necessarily represent the heterogeneous
communicative environment within which adults in general have to cope with
statistical messages.
To develop statistical literacy, it may be necessary to work with learners, both
younger students and adults, in ways that are different from, or go beyond,
instructional methods currently in use. To better cover all knowledge bases
supporting statistical literacy, topics and skills that are normally not stressed in
regular statistics modules or introductory courses, for lack of time or teacher
preparation, may have to be addressed. Some examples are
• Understanding results from polls, samples, and experiments (Landwehr, Swift, & Watkins, 1987; MacCoun, 1998) as reported in newspapers or other media channels
• Understanding probabilistic aspects of statements about risk and side effects (Clemen & Gregory, 2000) as reported in newspapers or other media channels
• Learning about styles, conventions, and biases in journalistic reporting or advertisements
• Gaining familiarity with “worry questions” (Table 3), coupled with experience in applying them to real examples (such as one-sided messages, misleading graphs), or seeing someone else (e.g., a teacher) model their application
• Developing a critical stance and supporting beliefs, including positive beliefs and attitudes about the domain (usefulness of statistical investigations) and oneself
TIMSS reports on curriculum planning and other school-related variables imply
that young people who will be leaving schools in coming years may continue to
have insufficient preparation in data analysis and probability. An important and
presently much larger population is that of adults in general. The majority of the
current adult population in any country has not had much, if any, formal exposure to
the statistical or mathematical knowledge bases described earlier, given known
education levels across the world (Statistics Canada & OECD, 1996; UNESCO,
2000). As IALS (OECD & Human Resources Development Canada, 1997) and
other studies have shown, even in industrialized countries, literacy levels of many
adults are low. This paper argues that literacy skills, including document literacy
skills, are an important component of the knowledge base needed for statistical
literacy. It follows that achieving the vision of “statistical literacy for all” will
require a concerted effort by various educational and other systems, both formal and
informal.
Large numbers of adult learners receive important educational services from
adult basic education centers, adult literacy programs, workplace learning and
union-based programs, and continuing education or tertiary institutions. These
services have an important role in promoting statistical literacy of adults, and some
have begun to formally recognize the need to attend to statistical issues and to
critical evaluation of messages as part of designing curricula for adult learners
(European Commission, 1996; Curry et al., 1996; Stein, 2000). Yet, media
organizations and media professionals (Orcutt & Turner, 1993), public and private
agencies and institutes that communicate with the public on statistical matters, such
as national statistical offices (Moore, 1997b), and even marketers and advertisers
(Crossen, 1994), all have some responsibility in this regard. All these
stakeholders will have to devise innovative and perhaps unorthodox ways to
jointly reach the general population and increase its statistical literacy.
Research and Assessment Challenges
As pointed out earlier, the current knowledge base about statistical literacy of
school or university students and of adults in general is patchy. In the absence of
solid empirical information, the speculative ideas raised in this paper may not
translate into action by decision makers who are in a position to allocate resources to
educational initiatives. Three related areas where further research is needed are as
follows.
Research on Students’ and Adults’ Statistical Literacy Skills
Studies such as TIMSS (aimed at school students) and IALS (aimed at adults)
provided useful but only preliminary data on restricted aspects of people’s statistical
literacy, mainly because they were designed primarily to address other mathematical
topics. Many knowledge elements basic to statistical literacy were left out of these
assessments (e.g., understanding of averages and medians, knowledge about
sampling or experimental designs, or understanding of chance-related statements).
New international large-scale assessments, such as OECD’s Programme for
International Student Assessment (http://www.pisa.oecd.org), or the Adult Literacy
and Lifeskills survey (http://nces.ed.gov) will include broader coverage of statistical
matters, in line with expanded notions of mathematical literacy and numeracy
developed for these projects. However, given the restrictions on testing time in
large-scale studies and the number of domains competing for item coverage, focused
studies are needed that can provide more comprehensive information on statistical
literacy skills and related attitudes, and on gaps in this regard. Qualitative studies
should further enable in-depth examination of thinking processes, comprehension,
and the effects of instruction.
STATISTICAL LITERACY
75
Research on Statistical Literacy Demands of Various Functional Environments
The Joram et al. (1995) findings reported earlier shed some light on the range of
ways in which selected statistical and numerical information can be conveyed to
readers of magazines, and point to the strong linkage between literacy and statistical
elements in print media. Yet, little is known about the demands facing consumers of
other media channels, such as daily newspapers, workplace materials, or TV
broadcasts, and with regard to a range of statistical and probabilistic topics beyond
rational numbers. The absence of credible data from which to establish the statistical
literacy requirements in the full range of domains where adults have to function is
alarming. Research in this area, taking into account variation both within and
between countries, is a prerequisite for designing effective and efficient instruction
that aims at different levels of statistical literacy.
Research on Dispositional Variables
This paper argued that a view of statistical literacy as an action-oriented set of
interrelated knowledge bases and skills, one which people will actually use in
everyday contexts, must consider people’s inclination to apply a critical stance and
the motivations, beliefs, and attitudes that affect or support statistically literate
behavior. However, the conceptualization and assessment of these variables present
many challenges (Gal et al., 1997). Development of research methods in this regard
is essential for understanding the forces that shape statistically literate behavior in
different contexts. Changes in dispositions should be measured as part of evaluating
the impact of educational interventions aimed at improving statistical literacy of
people in all walks of life.
REFERENCES
American Association for the Advancement of Science (AAAS) (1995). Benchmarks for science literacy.
Washington, DC: Author.
Australian Education Council (1991). A national statement on mathematics for Australian schools.
Carlton, Victoria: Curriculum Corporation.
Baron, J. (1988). Thinking and deciding. New York: Cambridge University Press.
Bowen, D. E., & Lawler, E. E. (1992, Spring). The empowerment of service workers: What, why, how,
and when. Sloan Management Review, 31–39.
Bright, G. W., & Friel, S. N. (1998). Graphical representations: Helping students interpret data. In S. P.
Lajoie (Ed.), Reflections on statistics: Learning, teaching, and assessment in grades K–12 (pp. 63–
88). Mahwah, NJ: Erlbaum.
Carnevale, A. P., Gainer, L. J., & Meltzer, A. S. (1990). Workplace basics: The essential skills employers
want. San Francisco: Jossey-Bass.
Cobb, G. W. (1992). Teaching statistics. In L. A. Steen (Ed.), Heeding the call for change: Suggestions
for curricular action (pp. 3–43). Washington, DC: Mathematical Association of America.
Cobb, G. W., & Moore, D. S. (1997). Mathematics, statistics, and teaching. American Mathematical
Monthly, 104, 801–823.
Clemen, R., & Gregory, R. (2000). Preparing adult students to be better decision makers. In I. Gal (Ed.),
Adult numeracy development: Theory, research, practice. Cresskill, NJ: Hampton Press.
Cocking, R. R., & Mestre, J. P. (Eds.). (1988). Linguistic and cultural influences on learning
mathematics. Hillsdale, NJ: Erlbaum.
Crossen, C. (1994). Tainted truth: The manipulation of fact in America. New York: Simon & Schuster.
Curcio, F. R. (1987). Comprehension of mathematical relationships expressed in graphs. Journal for
Research in Mathematics Education, 18, 382–393.
Curry, D., Schmitt, M. J., & Waldron, W. (1996). A Framework for adult numeracy standards: The
mathematical skills and abilities adults need to be equipped for the future. Final report from the
System Reform Planning Project of the Adult Numeracy Network. Washington, DC: National
Institute for Literacy. Available online at http://www.std.com/anpn/
Edwards, K. (1990). The interplay of affect and cognition in attitude formation and change. Journal of
Personality and Social Psychology, 59, 202–216.
Eisenhart, M., Finkel, E., & Marion, S. F. (1996). Creating the conditions for scientific literacy: A
re-examination. American Educational Research Journal, 33(2), 261–295.
European Commission. (1996). White paper on education and training: Teaching and learning—towards
the learning society. Luxembourg: Office for official publications of the European Commission.
Frankenstein, M. (1989). Relearning mathematics: A different “R”—radical mathematics. London: Free
Association Books.
Freire, P. (1972). Pedagogy of the oppressed. New York: Penguin.
Friel, S. N., Russell, S., & Mokros, J. R. (1990). Used numbers: Statistics: middles, means, and
in-betweens. Palo Alto, CA: Dale Seymour Publications.
Gal, I. (1994, September). Assessment of interpretive skills. Summary of working group, Conference on
Assessment Issues in Statistics Education. Philadelphia, PA.
Gal, I. (1995). Statistical tools and statistical literacy: The case of the average. Teaching Statistics, 17(3),
97–99.
Gal, I. (1997). Numeracy: Reflections on imperatives of a forgotten goal. In L. A. Steen (Ed.),
Quantitative literacy (pp. 36–44). Washington, DC: College Board.
Gal, I. (1998). Assessing statistical knowledge as it relates to students’ interpretation of data. In S. Lajoie
(Ed.), Reflections on statistics: Learning, teaching, and assessment in grades K–12 (pp. 275–295).
Mahwah, NJ: Erlbaum.
Gal, I. (1999). Links between literacy and numeracy. In D. A. Wagner, R. L. Venezky, & B. Street
(Eds.), Literacy: An international handbook (pp. 227–231). Boulder, CO: Westview Press.
Gal, I. (2000). The numeracy challenge. In I. Gal (Ed.), Adult numeracy development: Theory, research,
practice (pp. 1–25). Cresskill, NJ: Hampton Press.
Gal, I., & Baron, J. (1996). Understanding repeated simple choices. Thinking and Reasoning, 2(1), 1–18.
Gal, I., & Garfield, J. (Eds.). (1997). The assessment challenge in statistics education. Amsterdam,
Netherlands: International Statistical Institute/IOS Press.
Gal, I., Ginsburg, L., & Schau, C. (1997). Monitoring attitudes and beliefs in statistics education. In I. Gal
& J. B. Garfield (Eds.), The assessment challenge in statistics education (pp. 37–54). Amsterdam,
Netherlands: International Statistical Institute/IOS Press.
Garfield, J. B., & Gal, I. (1999). Assessment and statistics education: Current challenges and directions.
International Statistical Review, 67(1), 1–12.
Green, K. E. (1993, April). Affective, evaluative, and behavioral components of attitudes toward
statistics. Paper presented at the annual meeting of the American Educational Research Association,
Atlanta, GA.
Hooke, R. (1983). How to tell the liars from the statisticians. New York: Marcel Dekker.
Huff, D. (1954). How to lie with statistics. New York: Norton.
Jenkins, E. W. (1996). Scientific literacy: A functional construct. In D. Baker, J. Clay, & C. Fox (Eds.),
Challenging ways of knowing in English, maths, and science (pp. 43–51). London: Falmer Press.
Joram, E., Resnick, L., & Gabriele, A. J. (1995). Numeracy as a cultural practice: An examination of
numbers in magazines for children, teenagers, and adults. Journal for Research in Mathematics
Education, 26(4), 346–361.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.) (1982). Judgment under uncertainty: Heuristics and
biases. New York: Cambridge University Press.
Kirsch, I., & Mosenthal, P. (1990). Understanding the news. Reading Research Quarterly, 22(2), 83–99.
Kirsch, I. S., Jungeblut, A., Jenkins, L., & Kolstad, A. (1993). Adult literacy in America: A first look at
the results of the National Adult Literacy Survey. Washington, DC: National Center for Education
Statistics.
Kirsch, I. S., Jungeblut, A., & Mosenthal, P. B. (1998). The measurement of adult literacy. In S. T.
Murray, I. S. Kirsch, & L. B. Jenkins (Eds.), Adult literacy in OECD countries: Technical report on
the first International Adult Literacy Survey (pp. 105–134). Washington, DC: National Center for
Education Statistics, U.S. Department of Education.
Kolata, G. (1997). Understanding the news. In L. A. Steen (Ed.), Why numbers count: Quantitative
literacy for tomorrow’s America (pp. 23–29). New York: The College Board.
Konold, C. E. (1989a). Informal conceptions of probability. Cognition and Instruction, 6, 59–98.
Kosonen, P., & Winne, P. H. (1995). Effects of teaching statistical laws on reasoning about everyday
problems. Journal of Educational Psychology, 87(1), 33–46.
Laborde, C. (1990). Language and mathematics. In P. Nesher & J. Kilpatrick (Eds.), Mathematics and
cognition (pp. 53–69). New York: Cambridge University Press.
Lajoie, S. P. (Ed.). (1998). Reflections on statistics: Learning, teaching, and assessment in grades K–12.
Mahwah, NJ: Erlbaum.
Landwehr, J. M., Swift, J., & Watkins, A. E. (1987). Exploring surveys and information from samples.
(Quantitative literacy series). Palo Alto, CA: Dale Seymour Publications.
MacCoun, R. J. (1998). Biases in the interpretation and use of research results. Annual Review of
Psychology 49, 259–287.
McLeod, D. B. (1992). Research on affect in mathematics education: A reconceptualization. In D. A.
Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 575–596). New
York: Macmillan.
Mellers, B. A., Schwartz, A., & Cooke, D. J. (1998). Judgment and decision making. Annual Review of
Psychology, 49, 447–477.
Moore, D. S. (1990). Uncertainty. In L. A. Steen (Ed.), On the shoulders of giants: New approaches to
numeracy (pp. 95–137). Washington, DC: National Academy Press.
Moore, D. S. (1997a). New pedagogy and new content: The case of statistics. International Statistical
Review, 65(2), pp. 123–165.
Moore, D. S. (1997b). Statistics: Concepts and Controversies. San Francisco: Freeman.
Moore, D. S. (1998). Statistics among the liberal arts. Journal of the American Statistical Association,
93(444), 1253–1259.
Moore, D. S., & Cobb, G. W. (2000). Statistics and mathematics: Tension and cooperation. American
Mathematical Monthly, 107(7), 615–630.
Mosenthal, P. B., & Kirsch, I. S. (1998). A new measure for assessing document complexity: The
PMOSE/IKIRSCH document readability formula. Journal of Adolescent and Adult Literacy, 41(8),
638–657.
Mullis, I. V. S., Martin, M. O., Beaton, A. E., Gonzalez, E. J., Kelly, D. L., & Smith, T. A. (1998).
Mathematics and science achievement in the final year of secondary school: IEA’s Third
International Mathematics and Science Study (TIMSS). Boston: Center for the Study of Testing,
Evaluation, and Educational Policy.
National Council of Teachers of Mathematics (NCTM). (2000). Principles and standards for school
mathematics. Reston, VA: Author.
Orcutt, J. D., & Turner, J. B. (1993). Shocking numbers and graphic accounts: Quantified images of drug
problems in the print media. Social Problems, 40(2), 190–206.
Organization for Economic Co-operation and Development (OECD) and Human Resources Development
Canada (1997). Literacy for the knowledge society: Further results from the International Adult
Literacy Survey. Paris and Ottawa: OECD and Statistics Canada.
Packer, A. (1997). Mathematical competencies that employers expect. In L. A. Steen (Ed.), Why numbers
count: Quantitative literacy for tomorrow’s America (pp. 137–154). New York: The College Board.
Parker, M., & Leinhardt, G. (1995). Percent: A privileged proportion. Review of Educational Research,
65(4), 421–481.
Paulos, J. A. (1995). A mathematician reads the newspaper. New York: Anchor Books/Doubleday.
Pereira-Mendoza, L. (Ed.). (1993). Introducing data-analysis in the schools: Who should teach it and
how? Voorburg, Holland: International Statistical Institute.
Rutherford, J. F. (1997). Thinking quantitatively about science. In L. A. Steen (Ed.), Why numbers count:
Quantitative literacy for tomorrow’s America (pp. 69–74). New York: The College Board.
Scheaffer, R. L., Watkins, A. E., & Landwehr, J. M. (1998). What every high-school graduate should
know about statistics. In S. P. Lajoie (Ed.), Reflections on statistics: Learning, teaching and
assessment in grades K–12 (pp. 3–31). Mahwah, NJ: Erlbaum.
Shamos, M. H. (1995). The myth of scientific literacy. New Brunswick, NJ: Rutgers University Press.
Schmidt, W. H., McKnight, C. C., Valverde, G. A., Houang, R. T., & Wiley, D. E. (1997). Many visions,
many aims (Vol. 1): A cross-national investigation of curricular intentions in school mathematics.
Dordrecht, The Netherlands: Kluwer.
Shaughnessy, J. M. (1992). Research in probability and statistics: Reflections and directions. In D. A.
Grouws, (Ed.), Handbook of research on mathematics teaching and learning (pp. 465–494). New
York: Macmillan.
Shaughnessy, J. M., Garfield, J. B., & Greer, B. (1997). Data handling. In A. Bishop (Ed.), International
handbook on mathematics education (pp. 205–237). Dordrecht, The Netherlands: Kluwer.
Starkings, S. (1997). Assessing student projects. In I. Gal & J. Garfield (Eds.), The assessment
challenge in statistics education (pp. 139–151). Voorburg, The Netherlands: International Statistical
Institute and IOS Press.
Statistics Canada and Organization for Economic Co-operation and Development (OECD). (1996).
Literacy, economy, and society: First results from the International Adult Literacy Survey. Ottawa,
Ontario: Author.
Steen, L. A. (Ed.). (1997). Why numbers count: Quantitative literacy for tomorrow’s America. New
York: The College Board.
Stein, S. (2000). Equipped for the future content standards: What adults need to know and be able to do in
the 21st century. Washington, DC: National Institute for Literacy. Retrieved January 1, 2001, from
http://www.nifl.gov/lincs/collections/eff/eff_publications.html
Tufte, E. R. (1997). Visual explanations: Images and quantities, evidence and narrative. Cheshire, CT:
Graphics Press.
UNESCO. (1990). Final Report on the World Conference on Education for All (Jomtien, Thailand).
Paris: Author.
UNESCO. (2000). World Education Report: The right to education—Towards education for all
throughout life. Paris: Author.
Wagner, D. A. (1991). Literacy: Developing the future. (International Yearbook of Education, vol.
XLIII). Paris: UNESCO.
Wagner, D. A., Venezky, R. L., & Street, B. V. (Eds.). (1999). Literacy: An International Handbook.
Boulder, CO: Westview Press.
Wainer, H. (1992). Understanding graphs and tables. Educational Researcher, 21(1), 14–23.
Wallman, K. K. (1993). Enhancing statistical literacy: Enriching our society. Journal of the American
Statistical Association, 88, 1–8.
Wallsten, T. S., Fillenbaum, S., & Cox, J. A. (1986). Base rate effects on the interpretations of probability
and frequency expressions. Journal of Memory and Language, 25, 571–587.
Wanta, W. (1997). The public and the national agenda: How people learn about important issues.
Mahwah, NJ: Lawrence Erlbaum.
Watson, J. (1997). Assessing statistical literacy through the use of media surveys. In I. Gal & J. Garfield,
(Eds.), The assessment challenge in statistics education (pp. 107–121). Amsterdam, The Netherlands:
International Statistical Institute/IOS Press.
Watson, J. M., & Moritz, J. B. (2000). Development of understanding of sampling for statistical literacy.
Journal of Mathematical Behavior, 19, 109–136.
Wild, C., Triggs, C., & Pfannkuch, M. (1997). Assessment on a budget: Using traditional methods
imaginatively. In I. Gal & J. B. Garfield (Eds.), The assessment challenge in statistics education (pp.
205–220). Amsterdam, The Netherlands: International Statistical Institute/IOS Press.
Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical
Review, 67(3), 223–256.
Chapter 4
A COMPARISON OF MATHEMATICAL AND
STATISTICAL REASONING
Robert C. delMas
University of Minnesota, USA
INTRODUCTION
The focus of this chapter is on the nature of mathematical and statistical reasoning.
The chapter begins with a description of the general nature of human reasoning.
This is followed by an account of mathematical reasoning as described by
mathematicians, along with recommendations by mathematics educators regarding
educational experiences to improve mathematical reasoning. The literature on
statistical reasoning is reviewed and findings from the general literature on
reasoning are used to identify areas of statistical reasoning that students find most
challenging. Statistical reasoning and mathematical reasoning are compared and
contrasted, and implications for instruction and research are suggested.
THE NATURE OF HUMAN REASONING
While human beings are very intelligent and have produced notable advances of
mind over the millennia, people are still prone to systematic errors of judgment.
Wason and Johnson-Laird (1972) reported on a variety of studies that systematically
explored conditions under which people make reasoning errors. One of the
difficulties faced by researchers of human reasoning is a lack of agreement in the
definition of the phenomenon. Wason and Johnson-Laird (1972) state, “There is, of
course, no clear boundary surrounding this topic. … In our view, it is fruitless to
argue about definitions of terms, and we shall be concerned with how humans draw
explicit conclusions from evidence” (p. 1). In a review of the literature on reasoning
research, Galotti (1989) argues that this lack of agreement on what constitutes
reasoning produces some problems for the interpretation of results. Galotti points
out that “reasoning” is often used interchangeably with terms such as thinking,
problem solving, decision making, critical thinking, and brainstorming. The
confusion is compounded in that these different types of thinking are considered to
involve common processes and mental activity, such as the transformation of given
D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 79–95.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
information on the basis of stored knowledge in order to draw an inference or a
conclusion.
Galotti (1989) offers a definition of reasoning that attempts to distinguish it from
other forms of thinking. According to Galotti, reasoning involves mental activity
that transforms given information, is focused on at least one goal (typically to make
an inference or draw a conclusion), is consistent with initial premises (modified or
unmodified), and is consistent with systems of logic when all premises are specified.
She also adds some caveats: The mental activity does not have to be self-contained
(i.e., the premises may be modified by the reasoner) and the conclusions do not have
to be deductively valid. Therefore, when conducting research on reasoning, it is
important to determine whether or not a person has modified the premises and to
judge the quality of the reasoning accordingly.
Errors in Human Reasoning
Despite the potential for disagreement on the phenomenon being investigated,
there has been a long history of research on the degree to which humans are
naturally rational thinkers. Most of the studies have looked at performance on
abstract, formal reasoning tasks (e.g., syllogisms; tasks solved by propositional or
predicate calculus) where all necessary information is provided (Evans, 1989;
Evans, Newstead, & Byrne, 1993; Oaksford & Chater, 1998; Wason & Johnson-Laird,
1972). Some studies have looked at practical and informal reasoning where
the purpose is more functional and situation specific (Evans et al., 1993; Galotti,
1989). Some general findings that can be summarized from reviews of the literature
(e.g., Evans, 1989; Evans et al., 1993; Galotti, 1989; Gilovich, Griffin, &
Kahneman, 2002; Wason & Johnson-Laird, 1972) are as follows:
• People have difficulty with drawing a valid conclusion by denying a
negatively stated assumption. In general, people find it hard to track the
effect of double negation in an argument (Evans, 1989; Evans, Newstead, &
Byrne, 1993; Galotti, 1989).
• People often change the nature or meaning of premises, even when explicitly
trained in the interpretation of premises (Galotti, Baron, & Sabini, 1986).
• When presented with a conditional statement, people act as if a causal
relationship is implied between the antecedent (“If the skies are clear
tonight”) and consequent (“it will be cold tomorrow morning”). Therefore,
they incorrectly believe the antecedent is true if the consequent is affirmed:
“It is very cold this morning, therefore, the skies must have been clear last
night” (Wason & Johnson-Laird, 1972).
• Human reasoning is deductive, but it tends to be of a practical nature.
People, in general, do not reason well with purely abstract information.
People show impressive reasoning abilities with complex tasks, but primarily
when they are highly familiar with the materials and situation (Evans,
Newstead, & Byrne, 1993; Kahneman & Tversky, 1982; Wason &
Johnson-Laird, 1972).
• When given an abstract task, people inadvertently modify the given
information or premises by including personal knowledge that may or may
not be relevant (Wason & Johnson-Laird, 1972).
• While human reasoning is deductive in nature and quite powerful, it does not
seem to act in full accordance with the truth-functional relations of the
propositional calculus in formal logic (Galotti, 1989).
• People do not tend to consider all possible interpretations of a premise
(Erickson, 1978; Johnson-Laird, 1983) or multiple ways of combining
premises (Johnson-Laird, 1983). This leads to a consideration of information
and implications that is not exhaustive, which in turn may lead to erroneous
conclusions (Baron, 1985). One particular example of this is confirmation
bias (Evans, 1989; Nisbett & Ross, 1980; Ross & Anderson, 1982; Wason,
1977), which is the tendency to look only for confirmatory evidence and not
to consider evidence that could potentially discredit an argument. People
readily accept conclusions they believe to be true and have difficulty
accepting conclusions they believe to be false.
• There is evidence that some biases in reasoning can be overcome if feedback
produces a conceptual inconsistency for an individual (Nisbett & Ross,
1980). People tend to adjust their reasoning when they encounter
contradictory evidence, although not all the time.
• Reasoning is easily affected by factors that, from a logical standpoint, should
not have an effect. For example, people provide higher frequency estimates
if asked to recall only a few instances (e.g., 3) of an event and lower
estimates when asked to recall many instances (e.g., 9 to 12), relative to just
being asked for a frequency estimate. These effects can be further mediated
by having people consider their level of expertise in an area or by
manipulations that increase or decrease motivation (Schwarz & Vaughn,
2002).
• Possibly as a result of confirmation bias and recall effects, people tend to be
overconfident in the validity of their reasoning (Fischhoff, 1982;
Lichtenstein, Fischhoff, & Phillips, 1982).
These general observations have implications for the nature of mathematical and
statistical reasoning. Of most importance are the observations that people have
difficulty with abstract reasoning, that people can reason well in highly familiar
situations, that personal knowledge often intrudes when reasoning, and that people
often fail to consider all possibilities. One possible explanation for biases in human
reasoning is offered by two-system theories of reasoning (Evans, 1995; Evans &
Over, 1996; Sloman, 2002). These theories propose that two separate but interactive
systems of reasoning are employed for most reasoning tasks. One system is
associative in nature and uses regularities in perceived characteristics and temporal
structures to produce automatic responses. The other system, being rule-based in
nature, is more deliberate and systematic (Evans & Over, 1996), which allows it to
override some output from the associative system (Stanovich & West, 2002). When
a person is faced with a problem, both systems may be activated and arrive at
separate responses. While the two responses may be the same (or at least supportive)
in most cases, it is possible for the responses to conflict. In addition, the more
automatic associative system may finish first, producing a response before the
rule-based system has a chance to check its validity. Even when both systems run to
conclusion, the associative response can interfere with the output of the rule-based
system (Sloman, 2002). In this way, first impressions can govern a decision before
more rule-based, logical operations are brought into play, producing invalid or
irrelevant conclusions under certain conditions. Evidence for systematic,
nonnormative biases in reasoning that are consistent with predictions from a
two-system reasoning process is found even when factors such as cognitive ability are
accounted for (Stanovich & West, 2002).
THE NATURE OF MATHEMATICAL REASONING
It has been argued that mathematical ideas are essentially metaphorical in nature;
therefore mathematics should not be taught only as methods of formal proof or a set
of calculation techniques. According to Lakoff and Nunez (1997), mathematics “is
all about ideas, and it should be taught as being about ideas” (p. 85). They argue that
the metaphorical nature of mathematics must be taught if instruction is to affect
students’ mathematical reasoning. Lakoff and Nunez believe that an emphasis on
metaphorical thinking can counter the idea that mathematics exists independent of
human minds (because reasoning by metaphor is a characteristic of human
intelligence). However, equating mathematical reasoning solely with metaphorical
reasoning can be taken as evidence that mathematics is a product of mind, a product
that does not necessarily have to correspond to objects or events in the objective
world.
Mathematics, Symbols, and Language
As a discipline, mathematics can be viewed as the study of patterns; therefore,
mathematical reasoning involves reasoning about patterns. Devlin (1998) notes that
mathematics deals with abstract patterns that are distilled from the real world or
“from the inner workings of the human mind” (p. 3). Adding to the level of
abstraction is a reliance of modern mathematics on the use of abstract notation (e.g.,
algebraic expressions). Mathematical fields develop abstract notation systems in
order to work with patterns in efficient and facile ways, but at the cost of added
complexity and a high degree of remoteness from everyday experience and
knowledge. Modern computers can help students visualize some of the notational
representation, but only to a certain extent since a relatively small portion of modern
mathematics lends itself to computer simulation.
Symbolic notation, in and of itself, is not mathematics. To have meaning, the
symbols require mental models of real mathematical entities to serve as referents
(Devlin, 1998). This aspect of mathematics, Devlin argues, is often overlooked
because of an emphasis on procedures and computations in mathematics instruction.
Nonetheless, he sees mathematics as a purely human creation built of entities that do
not exist in the physical world for they are “pure abstractions that exist only in
humanity’s collective mind” (p. 9). Working with the highly abstract content of
mathematics has proven difficult even for talented mathematicians. Devlin notes that
Newton and Leibniz developed the calculus because they were able to represent
processes of motion and change as functions, and then work with those functions as
mathematical entities. The calculus was derived from a process of successive
approximations and the idea of a limit, a concept for which they could not provide
an acceptable definition. It took some 200 years of development in mathematical
thinking before Weierstrass conceived of the process of successive approximations
as an entity and presented a precise definition for a limit.
Both language and mathematics can be considered abstract artifacts of human
intellect and culture. Devlin (2000) argues that the mental faculties that humans use
to process language are the very faculties needed to carry out abstract mathematical
thought. Even if this is the case, there may be several reasons why a facility with
language does not directly translate to a facility with mathematics. While language
is abstract in nature (e.g., references can be made to objects in the past and future),
its reference base is often concrete. Even when abstract concepts are the referents
(e.g., love, happiness, despair), there are still human counterparts in emotion and
experience that provide a foundation for meaning. Mathematical thought seems to
require the individual to create mental referents, a process that can result in mental
entities with no physical counterparts. Another factor that may add to difficulties in
mathematical thinking is that it often requires the use of a mathematical proof. The
study of mathematical proof has essentially produced systems of formal logic,
which, as noted earlier, many people find difficult to employ.
Instruction and Mathematical Reasoning
Due to the highly abstract nature of mathematics, modern researchers in
mathematics education place a strong emphasis on instructional methods that help
students learn abstract mathematical concepts by relating them to familiar concepts
and processes. The importance of image-based reasoning in mathematics is well
documented (Devlin, 1998; English, 1997). Mathematicians often find that image or
graphic representations facilitate their reasoning more than other types of symbolic
representation do. However, the ultimate goal is to move the student from “actual
reality” to what Sfard (2000) calls “virtual reality” discourse. Actual reality
discourse can be bounded and mediated by real-world referents. For example,
someone could state, “Her name is Cedar” and point to his dog. By pointing to his
pet the speaker makes it clear that he is not referring to a tree that he thinks is
female. The discourse object exists independent of the concept and can be used to
perceptually mediate the discussion. However, Sfard (2000) argues that a statement
such as "1/4 is equal to 3/12" is an instance of virtual reality discourse because
perceptual mediation is enacted, at best, with real-world objects that substitute for,
but do not fully represent the concept under discussion. Sfard sees virtual reality
discourse as the primary mode of mathematical communication. As such,
mathematical discourse may not carry a direct impact on human action or reality.
This can create a setting of freedom and exploration for some; but it can also render
mathematics as meaningless, of little importance, and of little interest to others
(Sfard, 2000).
Modern mathematics curricula recognize that human reasoning is pragmatic
and incorporate real-world problems as devices for making mathematical concepts
and structures meaningful. English (1997) states that “mathematical reasoning
entails reasoning with structures that emerge from our bodily experiences as we
interact with our environment” (p. 4). According to English, four “vehicles of
thought” are used in mathematical reasoning: analogy, metaphor, metonymy, and
imagery. They constitute generic mental devices that are not exclusively used in
mathematical reasoning. All four of the mental devices provide a way to map
concrete experience to mental models or representations of the environment. She
argues that humans require experience with mapping structural information from
concrete experience to a mathematically abstract mental representation (the
foundation of analogy and metaphor) in order to develop mathematical reasoning.
Sfard (2000) notes that both actual and virtual reality discourse are object mediated.
She sees virtual reality discourse as emerging from actual reality discourse in a
process that reminds one of object-oriented programming in computer science; if
actual reality discourse is considered the root for all other discourse, then virtual
reality discourse is seen to inherit templates and properties from real-world referents
through iterative extensions of a concept to abstract contexts. This is similar to
Thompson’s (1985) development of instructional approaches that go beyond
teaching skills and procedures and motivate students to develop abstract, figurative
imagery that encapsulates the structural relationships, operations, and
transformations that apply to mathematical objects. As such, mathematical discourse
can be difficult because there may be no physical referent to serve as the focus of
reasoning and communication. Ultimately, the purpose of mathematical inquiry is to
develop an understanding of mathematical objects that is independent of real-world
contexts (Cobb & Moore, 1997).
Statistical Reasoning and Thinking
In recent years, statisticians have pointed out distinctions between statistics and
mathematics in order to establish statistics as a separate and unique discipline (e.g.,
Moore, 2000; Cobb & Moore, 1997). Statistics may be viewed as similar to
disciplines such as physics, which utilize mathematics yet have developed methods
and concepts that set them apart from mathematical inquiry. Unlike mathematical
reasoning, statistical inquiry is dependent on data (Chance, 2002) and typically
grounded within a context (Cobb & Moore, 1997; Moore, 1998; Pfannkuch & Wild,
2000; Wild & Pfannkuch, 1999). A practicing statistician may use mathematics to
assist in solving a statistical problem, but only after considerable work has been
done to identify the question under investigation, explore data for both patterns and
exceptions, produce a suitable design for data collection, and select an appropriate
model for data analysis (see Chapter 2).
Statistical thinking and statistical reasoning have often been used
interchangeably to represent the same types of cognitive activity. If reasoning in
general is considered a type of thinking, then how are statistical reasoning and
statistical thinking related? Recent work by Wild and Pfannkuch (1999) has helped
provide a model for statistical thinking that allows it to be distinguished from
statistical reasoning. Lovett (2001) defines statistical reasoning as “the use of
statistical tools and concepts … to summarize, make predictions about, and draw
conclusions from data” (p. 350). This definition does not distinguish statistical
reasoning because it is too similar to the depiction of statistical thinking offered by
Pfannkuch and Wild (see Chapter 2) and Chance (2002). Garfield (2002) offered a
similar definition, but with more emphasis on the “ways” statistical knowledge is
used to make sense of data. Nonetheless, Garfield found that there is very little
consensus on what is involved in statistical reasoning and that research on statistical
reasoning is still in a state of development.
It can be argued that both statistical thinking and reasoning are involved when
working the same task, so that the two types of mental activity cannot necessarily be
distinguished by the content of a problem (delMas, 2002). However, it may be
possible to distinguish the two by the nature of the task. For example, a person who
knows when and how to apply statistical knowledge and procedures demonstrates
statistical thinking. By contrast, a person who can explain why results were
produced or why a conclusion is justified demonstrates statistical reasoning. This
treatment of statistical reasoning is consistent with the definition presented earlier by
Galotti (1989). Examples of statistical reasoning are likely to be found at stages in
people’s thinking where they are asked to state implications, justify a conclusion, or
make an inference. Given this perspective, statistical reasoning is demonstrated
when a person can explain why a particular result is expected or has occurred, or
explain why it is appropriate to select a particular model or representation. Statistical
reasoning is also expressed when a selected model is tested to see if it represents a
reasonable fit to a specified context. This type of explanation typically requires an
understanding of processes that produce data. When students develop an
understanding of processes that produce samples and, consequently, statistics
derived from samples, they may be better prepared to predict the behavior of
sampling distributions and understand procedures that are based on the behavior of
samples and statistics (see Chapter 13).
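One way to give students this process-based experience is a simple computer simulation: repeatedly draw samples from a population, compute the statistic each time, and examine how the results behave. A minimal sketch in Python (the population and sample sizes are hypothetical, chosen only for illustration):

```python
# Building a sampling distribution empirically: the means of repeated random
# samples cluster around the population mean, with far less spread than the
# population itself (roughly sigma / sqrt(n)).
import random
import statistics

random.seed(3)

population = [random.uniform(0, 100) for _ in range(10_000)]  # hypothetical population
pop_mean = statistics.mean(population)

sample_means = [
    statistics.mean(random.sample(population, 30))  # one sample statistic
    for _ in range(2_000)                           # many repeated samples
]

print(round(statistics.mean(sample_means), 1))   # close to the population mean
print(round(statistics.stdev(sample_means), 1))  # far smaller than the population s.d.
```

Seeing that the sample means cluster tightly around the population value is exactly the behavior that later grounds formal work with sampling distributions.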
With this type of understanding, students can provide reasons and justification
for the statistical methodology that is applicable in a context (i.e., they can think
statistically). These justifications, however, are not context free, and require an
interplay between the concrete and the abstract as the statistical thinker negotiates
the best approach to take in solving a problem. In this way, statistics differs from
mathematical reasoning in that the latter is most often context free (i.e., independent
of the objective world).
DIFFICULTIES IN STATISTICAL REASONING
It seems reasonable to argue that because statistical thinking always occurs
within a concrete context, students should have very little difficulty with statistical
reasoning. This might be expected given the general findings from research on
reasoning that people tend to draw valid conclusions when working with familiar
and concrete materials even when they draw invalid conclusions for isomorphic
problems rendered purely in the abstract (see Evans et al., 1993). Yet, most
instructors of statistics find that students have difficulty with statistical content, let
alone statistical reasoning. Why is this the case?
The Abstract Nature of Statistical Content
The answer may be that many of the concepts used in statistics are not only
unfamiliar but abstract in nature, and reasoning about abstract content is difficult for
many. One source of abstraction comes from the mathematical content of statistics.
For example, mathematical procedures that are used to calculate the mean for a set
of data are likely to produce a value that does not exist in the data set. Many
students may find it difficult to develop an understanding of something that does
not necessarily exist. Just as in mathematics, statistics instruction can use analogies,
metaphors, and images to represent abstract concepts and processes to help students
foster meaning. A common metaphor for the mean is the process of moving a
fulcrum along a beam to balance weights, where the fulcrum plays the counterpart
of the mean. Just as in mathematics, developing an appropriate mental model of the
statistical mean may require extensive experience with the balance beam metaphor.
This type of understanding, therefore, is akin to the mathematical reasoning
presented in the previous section. It should not be surprising that statistics students
have as much difficulty with these aspects of their statistical education as they do
with the abstract content of mathematics.
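The balance-beam metaphor can also be expressed computationally: the mean is the one point about which the signed deviations of the data sum to zero, just as a fulcrum balances the weights on a beam. A short sketch (the data values are hypothetical):

```python
# The mean as a balance point: deviations about the mean sum to zero;
# deviations about any other fulcrum position do not.
data = [2, 3, 3, 4, 9]  # hypothetical data set

mean = sum(data) / len(data)        # 4.2 -- a value that does not appear in the data
print(sum(x - mean for x in data))  # effectively 0 (floating-point noise)
print(sum(x - 4 for x in data))     # 1: a fulcrum at 4 leaves the beam unbalanced
```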
Even though statistical reasoning may involve an understanding of data and
context, this does not mean that all statistical concepts are concrete and accessible.
A great deal of statistical content requires the type of virtual reality thinking
described by Sfard (2000). It has been suggested that statistics instruction begin with
exploratory data analysis because its hands-on, concrete nature is more accessible
(Cobb & Moore, 1997). Even at this elementary level, students are expected to
understand and reason with numerous abstractions. Instruction in exploratory data
analysis presents a variety of graphical techniques that are used to represent and
explore trends and patterns in data. While many aspects of these graphical
techniques are nonmathematical, using them to identify patterns may require a level
of abstraction that students find just as difficult as the abstract patterns encountered
in mathematics. Although graphic representations are based on real data embedded
within a context, they are nonetheless abstractions that highlight certain
characteristics of the data and ignore others.
Data analysis is dependent on data that is generated by taking measurements. A
measurement can be a very abstract entity (e.g., what does IQ measure?) or very
unfamiliar (e.g., nitrous oxide concentrations in the blood), so it can be important to
begin instruction with concrete or familiar measurements (e.g., city and highway
miles per gallon [mpg] ratings of automobiles). Even when the data are familiar, a
measurement is an abstraction that represents only one aspect of a complex entity.
Focusing attention on only one “measurement of interest” may be difficult for some
students who are familiar with a context and find it difficult not to consider aspects
they see as more important or more interesting.
Students move to another level of abstraction when asked to graph the data. A
stem-and-leaf plot often requires students to separate the data from the context (e.g.,
the car make and model are not represented in a graph of mpg) and to discard
some of the measurement detail in order to construct a visual picture of the
distribution. Stems are separated from leaves, and leaves often do not represent all
of the remaining information in a numerical value (e.g., the stem represents the digit
in the hundreds place, the leaf represents the digit in the tens place, and the digit
in the ones place is not used at all). Further abstraction can result if the graph is
expanded or contracted in order to search for a meaningful pattern in the data. This
is likely to be a very unfamiliar type of representation for many students, and the
level of abstraction may compound difficulties with attempts to reason from graphic
displays.
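The separation of stems from leaves described above can be made explicit in a few lines of code; a sketch with hypothetical mpg ratings (here the tens digit is the stem and the ones digit the leaf):

```python
# A minimal stem-and-leaf plot. Note that the car make and model appear
# nowhere in the display: the context has been stripped away.
mpg = [18, 21, 23, 23, 27, 31, 32, 32, 34, 41]  # hypothetical mpg ratings

plot = {}
for value in sorted(mpg):
    stem, leaf = divmod(value, 10)   # 23 -> stem 2, leaf 3
    plot.setdefault(stem, []).append(leaf)

for stem in sorted(plot):
    print(stem, "|", *plot[stem])
# 1 | 8
# 2 | 1 3 3 7
# 3 | 1 2 2 4
# 4 | 1
```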
Another level of abstraction is created when students are asked to further explore
a data set with a box plot. The box plot is a graphic display commonly used for the
comparison of two or more data sets (see Cobb & Moore, 1997 [p. 89] for an
illustrative example). Box plots remove much of the detail from a data set to make
certain features stand out (e.g., central tendency, variability, positive or negative
skew). Understanding how the abstract representation of a “box” can stand for an
abstract aspect of a data set (a specific, localized portion of its variability) is no
small task. The student must build a relationship between the signifier and the
signified as described by Sfard (2000), yet both the signifier and the signified are
based on abstract constructions of mind. It seems reasonable to expect that many
students will find it difficult to understand graphical representations, even though
the devices appear basic and elementary to the seasoned statistician.
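The reduction a box plot performs can be seen by computing the five-number summary that defines it: ten measurements collapse to five numbers, and everything else is discarded. A sketch in Python (hypothetical data; quartiles are computed by the simple median-of-halves rule, one of several conventions in use):

```python
# Five-number summary underlying a box plot: minimum, lower quartile,
# median, upper quartile, maximum.
def median(values):
    values = sorted(values)
    n, mid = len(values), len(values) // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

data = sorted([18, 21, 23, 23, 27, 31, 32, 32, 34, 41])  # hypothetical data
n = len(data)
lower_half, upper_half = data[:n // 2], data[(n + 1) // 2:]

summary = (min(data), median(lower_half), median(data), median(upper_half), max(data))
print(summary)   # (18, 23, 29.0, 32, 41)
```

That a single rectangle drawn between the quartiles stands for half of the data's variability is exactly the abstraction discussed above.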
Logic Errors and Statistical Reasoning
As noted earlier, people do not tend to generate multiple possibilities for a given
situation and are prone to confirmation bias. It is reasonable to expect, therefore,
that some students will find it difficult to identify exceptions to trends in order to
test a model, an ability that is associated with sound statistical thinking. This same
difficulty is likely to express itself when students are asked to generate alternatives
during the interrogative cycle of statistical thinking as described by Wild and
Pfannkuch (1999), as well as when instructors try to promote a disposition of
skepticism in their students.
Cobb and Moore (1997) identify several other areas of statistics instruction that
are nonmathematical and uniquely define statistics as a discipline. Experimental
design is a topic found in statistics (and other disciplines) that is typically not part of
the mathematics curriculum. This is an area requiring very little mathematical
background, and it is highly dependent on context. Experimental design does,
however, follow a particular logic. Typically, several assumptions (i.e., premises)
are adhered to; for example, a control condition and a treatment condition differ on
only one characteristic, with all other aspects of the two conditions being equal. If a
reliable difference between two conditions is found in a controlled experiment, then
the difference is attributable to the difference in the characteristic on which the
conditions vary. Although the preceding is certainly an oversimplification of the
details that go into the design of any experiment, it is sufficient for considering how
conclusions are drawn from experimental results. If a reliable difference between
conditions is found, the antecedent is affirmed, and the conclusion
follows that the varied characteristic was responsible. In formal logic this is known
as modus ponens (see Evans et al., 1993). Conversely, if the characteristic that is
varied is not a causal agent, then logic dictates that a reliable difference between the
two conditions will not be found. This is referred to as modus tollens. While people
appear to handle modus ponens reasoning naturally, many have difficulty with
modus tollens (Evans et al., 1993). Students are likely to have similar difficulty
understanding the logic of experimental design.
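The two inference forms can be checked mechanically by enumerating truth values, which offers one concrete way to let students inspect the logic rather than take it on faith; a small Python sketch:

```python
# Truth-table check that modus ponens and modus tollens are valid: whenever
# the premises are all true, the conclusion is true as well.
from itertools import product

def implies(p, q):
    return (not p) or q   # material conditional "if p then q"

for p, q in product([True, False], repeat=2):
    if implies(p, q) and p:          # modus ponens premises: p -> q, p
        assert q                     # ... so the conclusion q must hold
    if implies(p, q) and not q:      # modus tollens premises: p -> q, not-q
        assert not p                 # ... so the conclusion not-p must hold

print("modus ponens and modus tollens hold in every case")
```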
Formal Inference in Statistics
Formal inference is typically introduced in a first course of statistics. Formal
inference involves rules for drawing conclusions about the characteristics of a
population based on empirical observations of samples taken from the population.
This is often taught using one (or both) of two approaches: confidence intervals or
significance tests (Cobb & Moore, 1997). Either approach requires the disposition
that Wild and Pfannkuch (1999) refer to as “being logical.” Both approaches derive,
in part, from probability theory; but they also involve a logic that is statistical in
nature. Because a complete understanding of these approaches requires logical and
mathematical thinking, many students will find this topic difficult to understand.
The type of logical thinking involved may provide additional insight as to why
formal inference is problematic for many students. As described by Cobb and Moore
(1997), a significance test starts by assuming that an effect of interest is not present
in a population. The reasoning goes something like this: If there is no effect in the
population, then an effect of the size observed in the sample data will have a high
probability. Conversely, if the observed sample effect is determined to have a
sufficiently low probability, this is taken as evidence that the original premise is
false and that the effect does exist in the population.
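This reasoning can be made tangible by simulation: assume the "no effect" premise, generate sample results under it, and see how rarely a result as extreme as the observed one occurs. A hedged sketch (the coin-tossing scenario and all numbers are hypothetical):

```python
# Simulating the logic of a significance test. Premise: the coin is fair
# (no effect). If results as extreme as the observed one turn out to be
# improbable under that premise, the premise itself is cast into doubt.
import random

random.seed(1)

observed_heads = 16          # hypothetical observation: 16 heads in 20 tosses
n_tosses, n_sims = 20, 100_000

extreme = sum(
    sum(random.random() < 0.5 for _ in range(n_tosses)) >= observed_heads
    for _ in range(n_sims)
)

p_value = extreme / n_sims
print(p_value)   # close to the exact binomial value of about 0.006
```

Because the estimated probability is small, the conclusion that the coin is not fair follows by the modus tollens pattern discussed earlier.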
Mathematics provides knowledge about the expected probability distribution of
observed sample effects when there is no effect in the population. Statistics adds a
probabilistic determination for the cutoff point that establishes when a probability is
sufficiently low. The reasoning that follows is provided by the formal logic of
propositional calculus. The logic of significance tests involves a negative statement in
the premise, a situation that typically results in poorer performance on formal
reasoning tasks. The logical reasoning that establishes evidence of an effect in the
population follows from modus tollens (i.e., negation of the consequent validates
negation of the antecedent). As noted earlier, people find modus tollens to be a
difficult type of reasoning. On both accounts, students will find the logic of
significance tests difficult to follow. The logic could be made easier by using an
example where negation of the consequent matches commonsense understanding for
a very familiar setting. However, under this condition people may draw a valid
conclusion simply because they “know it is so” and not because they understand the
underlying logic.
Reasoning with Confidence Intervals
A confidence interval takes a different approach to formal inference by
providing an interval estimate of a population characteristic. The interval is based on
data from a single sample and, therefore, is not guaranteed to capture the true value
of the population characteristic due to sampling variability. Probability theory can
provide an estimate of how likely (or how often) a random sample drawn from a
population will capture the population value. This probability is taken as the level of
confidence. Therefore, the meaning of a 95% confidence interval is based on the
understanding that there is a 95% chance that a single randomly selected sample will
be one of the samples that provides a confidence interval that captures the
population characteristic. This understanding requires a complex mental model of
several related concepts, which alone may make reasoning from confidence intervals
difficult. In addition, formal inference based on confidence intervals appears to
follow the same logic as significance tests. The confidence interval has a reasonably
high probability of capturing the true population characteristic. Under the
assumption of no effect in the population (e.g., two groups really come from the
same population, so the difference between the two groups should be zero), the
confidence interval is very likely to contain no effect (i.e., to capture zero). The
conclusion that there is an effect in the population follows if the confidence interval
does not contain zero (i.e., the consequent is negated). Once again, the situation
requires reasoning based on a negated premise and modus tollens.
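The coverage interpretation lends itself to the same kind of simulation: draw many random samples from a population with a known mean and count how many of the resulting intervals capture it. A sketch under hypothetical settings (normal population, known standard deviation, so the interval is mean ± 1.96·σ/√n):

```python
# Coverage of 95% confidence intervals: across repeated random samples,
# about 95% of the intervals capture the true population mean.
import math
import random

random.seed(2)

true_mean, sigma, n, n_sims = 50.0, 10.0, 25, 10_000
half_width = 1.96 * sigma / math.sqrt(n)   # known-sigma interval, for simplicity

captured = 0
for _ in range(n_sims):
    sample_mean = sum(random.gauss(true_mean, sigma) for _ in range(n)) / n
    if sample_mean - half_width <= true_mean <= sample_mean + half_width:
        captured += 1

print(captured / n_sims)   # close to 0.95
```

The 95% describes the long-run behavior of repeated samples, not any one interval after the fact, which is precisely the complex mental model described above.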
COMPARISON OF STATISTICAL REASONING AND MATHEMATICAL
REASONING
It is reasonable to ask at this point how mathematical and statistical reasoning
compare and contrast with each other. Mathematical and statistical reasoning should
place similar demands on a student and display similar characteristics when the
student is asked to reason with highly abstract concepts and relationships. When
students are asked to reason primarily with abstract concepts, a great deal of
concentration and persistence may be required to find relationships among the
concepts. This can lead to erroneous judgments and conclusions if a student is
unable to sustain the effort. Solutions may be based on the output of associative
processes that fall short of the reflection and integration needed for a complete
understanding.
A statistical problem can provide an illustrative example for both mathematical
and statistical reasoning. A problem on a statistics exam might present a bivariate
plot, summary statistics including the value of the correlation, and formulas for
calculating the slope and the y-intercept. When asked to find the slope
and y-intercept, many students will not use the formulas that are provided. Instead,
they may pick two plotted points that seem “typical” of the bivariate plot, derive a
value for the slope using procedures learned in algebra, and subsequently
calculate a value for the y-intercept. This “reasoning” may not be
reasoning at all, but merely the result of well-rehearsed associative memory where
“find the slope” retrieves a familiar procedure without questioning the fit of the
procedure to the context. A student acting in this fashion seems to lack either a
rudimentary mathematical understanding (e.g., that the model requires all points to
form a straight line) or statistical understanding (e.g., that the model must take into
account the inherent variability in the bivariate plot).
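The contrast can be shown directly: the slope of the line through two "typical" points generally differs from the least-squares slope, which uses every point and the variability among them. A sketch with hypothetical bivariate data:

```python
# Least-squares slope versus a two-point slope. The regression model accounts
# for the scatter of all the points; a line through two chosen points does not.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 2.5, 4.5, 4.0, 6.0]   # hypothetical scattered data

x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)

# least-squares slope: b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar                          # y-intercept of the fitted line

two_point = (ys[3] - ys[1]) / (xs[3] - xs[1])  # line through the 2nd and 4th points

print(round(b, 2), round(a, 2))   # 0.95 0.95 -- the least-squares fit
print(two_point)                  # 0.75 -- a different line entirely
```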
When students work within very familiar contexts or with well-rehearsed
concepts and procedures, very few difficulties and errors are expected to occur,
regardless of whether the content is statistical or mathematical. The previous
example illustrates a common misunderstanding among students that, when
recognized, provides an opportunity to help students develop a deeper understanding
of both mathematical and statistical concepts by promoting an understanding of the
contexts under which it is appropriate to apply the respective models. Once ample
opportunity is provided to distinguish between the mathematical and statistical
contexts, and to apply the respective procedures, errors are more likely to be
mechanical than rational in nature.
While mathematical and statistical reasoning appear similar, there are some
differences in the common practices of each discipline that may result in different
sources of reasoning difficulty. Model abstraction is a general task that is common
to both disciplines. The nature of the task, however, is somewhat different between
statistics and mathematics. In mathematics, context may or may not play a large
role. Initially, mathematics instruction may use familiar contexts to motivate and
make accessible the underlying structure of abstract concepts. During this period of
instruction, students might be misled by aspects of the context that are familiar yet
irrelevant to an understanding of the underlying mathematical concept. Through
guided inquiry or constructivist approaches that require the student to test models
and assumptions against feedback derived from the context, students may eventually
develop well-structured mental models of the mathematical object. At that point, the
student may no longer require problem contextualization to reason with the
mathematical concept. Further work with the concept may be conducted in a purely
imaginary, figurative, and abstract way that does not require the student to relate
back to any of the original contexts used to promote understanding. At this point, the
student manipulates mathematical concepts and coordinates multiple relationships in
a purely mental world that may have no real-world referents other than symbolic
representations. This can produce significant cognitive demands that make the
mathematical reasoning quite difficult.
In the practice of statistics, model abstraction always begins with a context.
When this practice is taught in the statistics classroom, the student is dependent on
the characteristics of the context to guide model selection and development. In some
respects, this may be a more difficult task than the purely mental activity required in
mathematical reasoning. During model selection and construction, the student faces
some of the same cognitive demands that are required by abstract reasoning while
having to check the model’s validity against the context. As demonstrated in
numerous studies, reasoning from a context can produce a variety of errors.
Therefore, no matter how practiced and skilled the student (or statistician), she must
always guard against the intrusion of everyday knowledge that is irrelevant or
misleading. She must also guard against the use of heuristic, associative processes
that may naturally come into play, yet lead to erroneous interpretations or the
perception of relationships that do not actually exist. If the student successfully
navigates these pitfalls, statistical analyses suggested by the model can be
conducted. The student must then take the results and relate them back to the
original context. This translation or mapping represents another potential source of
error as multiple relationships must be tracked and validated, and context once again
has an opportunity to influence reasoning.
In summary, it is likely that many aspects of statistical and mathematical
reasoning are highly similar. The task demands of each discipline, however, may
produce different sources of reasoning error. While instruction can be driven and
facilitated by contextualization in both disciplines, statistical practice is highly
dependent on real-world context whereas mathematical practice tends to be removed
from real-world context (Cobb & Moore, 1997). The dependence on context in
statistical reasoning may lead to errors in reasoning, some of which are difficult to
overcome even for well-educated and experienced professionals.
IMPLICATIONS FOR STATISTICS EDUCATION AND RESEARCH
Instruction
Statistical reasoning needs to become an explicit goal of instruction if it is to be
nourished and developed. Just as in mathematics instruction, experiences in the
statistics classroom need to go beyond the learning of procedures to methods that
require students to develop a deeper understanding of stochastic processes. Given
that there is mathematical content in statistics along with the abstract nature of many
statistical concepts, research on the use of analogy, metaphor, and imagery by
mathematics educators should not be overlooked (e.g., English, 1997; Thompson,
1985). Such approaches may help students map data and processes between abstract
representations and context, and help them to generate and test their own
representations. Both mathematics (e.g., Kelly & Lesh, 2000) and statistics
educators (Cobb & Moore, 1997) recommend instruction that is grounded in
concrete, physical activities to help students develop an understanding of abstract
concepts and reasoning.
To promote statistical reasoning, students must experience firsthand the process
of data collection and explore the behavior of data, experiences that everyday events
do not readily provide (Moore, 1998). This should help students gain familiarity and
understanding with concepts that are difficult to experience in everyday life (e.g.,
the sampling distribution of a statistic). These experiences should include the
opportunity to ask why and how data is produced, why and how statistics behave,
and why and how conclusions can be drawn and supported (delMas, 2002). Students
will more than likely need extensive experience with recognizing implications and
drawing conclusions in order to develop a disposition for “being logical.” Methods
for presenting statistical content in ways that match natural ways of thinking and
learning should be sought. One promising approach involves instruction that is
based on frequency representations of situations (e.g., Sedlmeier, 1999), which can
be seen as a natural extension of incorporating data and data production into
instruction. Another promising approach is the use of predict-and-test activities
(e.g., delMas, Garfield, & Chance, 1999), which provide students the opportunity to
confront and correct misunderstandings about stochastic processes.
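The frequency-representation idea can be illustrated with a classic conditional-probability task restated as counts; the numbers below are hypothetical and serve only to show the format:

```python
# A frequency representation (in the style of the approach cited above):
# restating "1% base rate, 90% true-positive rate, 9% false-positive rate"
# as counts out of 1,000 cases makes the answer directly countable.
total = 1000
with_condition = 10        # 1% of 1,000
true_positives = 9         # 90% of the 10 with the condition
false_positives = 89       # about 9% of the 990 without it

all_positives = true_positives + false_positives
print(true_positives, "of", all_positives, "positives are genuine")
print(round(true_positives / all_positives, 2))   # 0.09 -- far from the 0.90 many expect
```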
Statistics Education Research
The past decade witnessed the initiation of a reform movement in statistics
education that focuses on statistical thinking, conceptual understanding, use of
technology, authentic assessment, and active learning (e.g., Cobb, 1992). Much of
this movement has been motivated by research in mathematics education, education,
and psychology (e.g., Garfield, 1995), and there appears to have been significant
impact on teaching practices from these recommendations (Garfield, Hogg, Schau,
& Whittinghill, 2002). Statistics is being taught to increasing numbers of students at
all ages as quantitative reasoning is seen as essential for effective citizenship (e.g.,
National Council of Teachers of Mathematics [NCTM] Standards, 2000). The
content, pedagogy, and use of technology in introductory statistics courses have
been modernized to focus on concepts, real data, effective use of technology, and
statistical thinking (e.g., Cobb, 1992; Moore, 1997). New resources are now
available to enable instructors to implement these changes (e.g., Moore, 2001).
However, while statistics instruction has seen dramatic growth and attention,
research devoted exclusively to issues in statistics education has not. One of the
most neglected areas is research devoted to understanding students’ statistical
reasoning. For example, a great deal is known about the errors and misconceptions
that students make when reasoning about problems in probability (e.g., Gilovich,
Griffin, & Kahneman, 2002; Kahneman, Slovic, & Tversky, 1982; Sedlmeier, 1999;
Shaughnessy, 1992). Most of these studies use forced-choice items in comparative
designs as measures of students’ thinking. Very few studies use clinical methods to
document and model students’ thought processes as they reason (Shaughnessy,
1992), although there are certainly some exceptions (e.g., Konold, 1989; Mokros &
Russell, 1995).
The research programs presented at the Statistical Reasoning, Thinking, and
Literacy forums (SRTL-1 and SRTL-2) indicate that classroom research and clinical
interview methodologies are starting to be utilized in the study of students’
statistical thinking. These methodologies have developed to a point where they can
provide considerable insight into students’ reasoning (e.g., Kelly & Lesh, 2000).
Future research needs to go beyond the documentation of errors and
misunderstandings to probing for an understanding of the processes and mental
structures that support both erroneous and correct statistical reasoning. The previous
section discussed areas of statistics instruction where students are likely to encounter
difficulty in understanding the expected statistical reasoning. While it may make
sense to expect such difficulties, empirical evidence is needed to establish if
difficulties exist and to explicate their nature. A deeper understanding of students’
mental models and processes will improve the design of educational approaches for
developing students’ statistical reasoning. More detailed descriptions of the
cognitive processes and mental structures that students develop during instruction
should provide a richer foundation from which to interpret the effects of
instructional interventions.
REFERENCES
Baron, J. (1985). Rationality and intelligence. Cambridge, UK: Cambridge University Press.
Chance, B. L. (2002). Components of statistical thinking and implications for instruction and assessment.
Journal of Statistics Education, 10(3). Retrieved April 7, 2003, from http://www.amstat.org/publications/jse/
Cobb, G. (1992). Teaching statistics. In L. A. Steen (Ed.), Heeding the call for change: Suggestions for
curricular action (MAA Notes, Vol. 22, pp. 3–43). Washington, DC: Mathematical Association of America.
Cobb, G. W., & Moore, D. (1997). Mathematics, statistics, and teaching. American Mathematical
Monthly, 104, 801–823.
delMas, R. (2002). Statistical literacy, reasoning, and thinking: A commentary. Journal of Statistics
Education, 10(3). Retrieved April 7, 2003 from http://www.amstat.org/publications/jse/
delMas, R., Garfield, J., & Chance, B. (1999). A model of classroom research in action: Developing
simulation activities to improve students’ statistical reasoning. Journal of Statistics Education, 7(3).
Retrieved April 7, 2003 from http://www.amstat.org/publications/jse/
Devlin, K. (1998). The language of mathematics: Making the invisible visible. New York: Freeman.
Devlin, K. (2000). The math gene: Why everyone has it, but most people don't use it. London: Weidenfeld
& Nicolson.
English, L. D. (1997). Analogies, metaphors, and images: Vehicles for mathematical reasoning. In L. D.
English (Ed.), Mathematical reasoning: Analogies, metaphors, and images. Hove, UK: Erlbaum, 3–
18.
Erickson, J. R. (1978). Research on syllogistic reasoning. In R. Revlin & R. E. Mayer (Eds.), Human
reasoning. Washington, DC: Winston.
Evans, J. St. B. T. (1989). Bias in human reasoning: Causes and consequences. Hillsdale, NJ: Erlbaum.
Evans, J. St. B. T. (1995). Relevance and reasoning. In S. E. Newstead & J. St. B. T. Evans (Eds.),
Perspectives on thinking and reasoning: Essays in honour of Peter Wason. Hillsdale, NJ: Erlbaum,
147–171.
Evans, J. St. B. T., Newstead, S. E., & Byrne, R. M. J. (1993). Human reasoning: The psychology of
deduction. Hillsdale, NJ: Erlbaum.
Evans, J. St. B. T., & Over, D. E. (1996). Rationality and reasoning. Hove, England: Psychology Press.
Fischhoff, B. (1982). For those condemned to study the past: Heuristics and biases in hindsight. In D.
Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases. New
York: Cambridge University Press, 335–351.
Galotti, K. M. (1989). Approaches to studying formal and everyday reasoning. Psychological Bulletin,
105(3), 331–351.
Galotti, K. M., Baron, J., & Sabini, J. P. (1986). Individual differences in syllogistic reasoning: Deduction
rules or mental models? Journal of Experimental Psychology: General, 115, 16–25.
Garfield, J. (1995). How students learn statistics. International Statistical Review, 63, 25–34.
Garfield, J. B. (2002). The challenge of developing statistical reasoning. Journal of Statistics Education,
10(3). Retrieved April 7, 2003, from http://www.amstat.org/publications/jse/
Garfield, J., Hogg, B., Schau, C., & Whittinghill, D. (2002). First courses in statistical science: The
status of educational reform efforts. Journal of Statistics Education, 10(2). Retrieved April 7, 2003
from http://www.amstat.org/publications/jse/
Gilovich, T., Griffin, D., & Kahneman, D. (Eds.) (2002). Heuristics and biases: The psychology of
intuitive judgment. New York: Cambridge University Press.
Johnson-Laird, P. N. (1983). Mental models. Cambridge, MA: Harvard University Press.
Kahneman, D., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. In D.
Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases. New
York: Cambridge University Press, 3–20.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.) (1982). Judgment under uncertainty: Heuristics and
biases. New York: Cambridge University Press.
Kelly, A. E., & Lesh, R. A. (Eds.). (2000). Handbook of research design in mathematics and science
education. Mahwah, NJ: Erlbaum.
Konold, C. (1989). Informal conceptions of probability. Cognition and Instruction, 6, 59–98.
Lakoff, G., & Núñez, R. E. (1997). The metaphorical structure of mathematics: Sketching out cognitive
foundations for a mind-based mathematics. In L. D. English (Ed.), Mathematical reasoning:
Analogies, metaphors, and images. Hove, UK: Erlbaum, 21–89.
Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of probabilities: The state of the art to
1980. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and
biases. New York: Cambridge University Press, 306–334.
Lovett, M. (2001). A collaborative convergence on studying reasoning processes: A case study in
statistics. In S. M. Carver & D. Klahr (Eds.), Cognition and instruction: Twenty-five years of
progress. Hillsdale, NJ: Erlbaum, 347–384.
Mokros, J., & Russell, S. J. (1995). Children’s concepts of average and representativeness. Journal for
Research in Mathematics Education, 26(1), 20–39.
Moore, D. S. (1997). New pedagogy and new content: The case of statistics. International Statistical
Review, 65, 123–137.
Moore, D. (1998). Statistics among the liberal arts. Journal of the American Statistical Association,
1253–1259.
Moore, D. (2000). Statistics and mathematics: Tension and cooperation. American Mathematical
Monthly, 615–630.
Moore, D. (2001). Undergraduate programs and the future of academic statistics. American Statistician,
55(1), 1–6.
National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics.
Reston, VA: NCTM.
Nisbett, R., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment.
Englewood Cliffs, NJ: Prentice-Hall.
Oaksford, M., & Chater, N. (1998). Rational models of cognition. New York: Oxford University Press.
Pfannkuch, M., & Wild, C. J. (2000). Statistical thinking and statistical practice: Themes gleaned from
professional statisticians. Statistical Science, 15(2), 132–152.
Ross, L., & Anderson, C. A. (1982). Shortcomings in the attribution process: On the origins and
maintenance of erroneous social assessments. In D. Kahneman, P. Slovic, & A. Tversky (Eds.),
Judgment under uncertainty: Heuristics and biases. New York: Cambridge University Press, 129–
152.
Schwarz, N., & Vaughn, L. A. (2002). The availability heuristic revisited: Ease of recall and content of
recall as distinct sources of information. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics
and biases: The psychology of intuitive judgment. New York: Cambridge University Press, 103–119.
Sedlmeier, P. (1999). Improving statistical reasoning: Theoretical models and practical implications.
Hillsdale, NJ: Erlbaum.
Sfard, A. (2000). Symbolizing mathematical reality into being—or how mathematical discourse and
mathematical objects create each other. In P. Cobb, E. Yackel, & K. McClain (Eds.), Symbolizing and
communicating in mathematics classrooms: Perspectives on discourse, tools, and instructional
design. Mahwah, NJ: Erlbaum, 37–98.
Shaughnessy, M. (1992). Research in probability and statistics: Reflections and directions. In A. Grouws
(Ed.), Handbook of research on mathematics teaching and learning. New York: Macmillan, 465–494.
Sloman, S. A. (2002). Two systems of reasoning. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.),
Heuristics and biases: The psychology of intuitive judgment. New York: Cambridge University Press,
379–396.
Stanovich, K. E., & West, R. F. (2002). Individual differences in reasoning. In T. Gilovich, D. Griffin, &
D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment. New York:
Cambridge University Press, 421–440.
Thompson, P. W. (1985). Experience, problem solving, and learning mathematics: Considerations in
developing mathematics curricula. In E. A. Silver (Ed.), Teaching and learning mathematical
problem solving: Multiple research perspectives. Hillsdale, NJ: Erlbaum, 189–243.
Wason, P. C. (1977). On the failure to eliminate hypotheses—a second look. In P. N. Johnson-Laird & P.
C. Wason (Eds.), Thinking. Cambridge, UK: Cambridge University Press, 89–97.
Wason, P. C., & Johnson-Laird, P. N. (1972). Psychology of reasoning: Structure and content.
Cambridge, MA: Harvard University Press.
Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical
Review, 67(3), 223–265.
Chapter 5
MODELS OF DEVELOPMENT IN STATISTICAL
REASONING
Graham A. Jones1, Cynthia W. Langrall2, Edward S. Mooney2,
and Carol A. Thornton2
Griffith University, Gold Coast Campus, Australia1, and Illinois State University, USA2
OVERVIEW
In recent years, key reform groups in school mathematics (e.g., Australian Education
Council [AEC], 1994; National Council of Teachers of Mathematics [NCTM], 1989,
2000; Department of Education and Science and the Welsh Office [DFE], 1995)
have focused on the importance of students’ thinking and reasoning in all areas of
the mathematics curriculum including statistics. Consistent with this perspective, our
chapter examines cognitive models of development in statistical reasoning and the
role they can play in statistical education. The cognitive models we will describe and
analyze examine statistical reasoning processes like decision making, prediction,
inference, and explication as they are applied to the exploration of both univariate
and multivariate data.
As a preface to our analysis of models of development in statistical reasoning we
consider models of development from a psychological perspective and then look at
how models of statistical reasoning have evolved historically from models of
development in probability. Our survey of the research literature begins with
comprehensive models of cognitive development that deal with multiple processes in
statistical reasoning and suggest that school students’ statistical reasoning passes
through a number of hierarchical levels and cycles. Subsequently, the chapter
focuses on models of cognitive development that characterize students’ statistical
reasoning as they deal with specific areas of statistics and data exploration: data
modeling, measures of center and variation, group differences, bivariate
relationships, sampling, and sampling distributions.
The models of development in statistical reasoning documented in this chapter
have been formulated through structured interviews, clinical studies, and teaching
experiments.

D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 97–117.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.

Research studies involving teaching experiments are especially
powerful because they enable researchers and teachers to trace students’ individual
and collective development in statistical reasoning during instruction. Because the
cognitive models provide a coherent picture of students’ statistical reasoning, they
have implications for curriculum design, instruction, and assessment. We will
discuss these implications, particularly those relating to the role that models of
statistical reasoning can play in providing a knowledge base for teachers in designing
and implementing instruction.
THE MEANING OF MODELS OF DEVELOPMENT IN STATISTICAL
REASONING
The psychology of cognitive development has focused on understanding the
structure and dynamics of change in people’s understanding of mathematics and
other domains since the time of Piaget (1954, 1962). This strong psychological focus
on the dynamics of change in people’s understanding of the world has been
accompanied by controversial debate on the issue of whether children’s intellectual
growth passes through a sequence of stages. More specifically, there has always been
tension in Piagetian theory between its constructivist framework and its structuralist
stage model. On the one hand, constructivism characterizes the acquisition of
knowledge as a product of the child’s creative self-organizing activity in particular
environments. In other words, Piaget’s perspective on constructivism affords some
recognition of the presence of environment and of educational intervention. On the
other hand, the stage model depicts knowledge in terms of biologically driven
universal structures that are independent of specific contexts or are context neutral.
That is, environment and educational intervention seemingly have no role in the
evolving cognitive developmental stages.
Subsequent research by neo-Piagetian cognitive development theorists (Bidell &
Fischer, 1992; Biggs & Collis, 1982, 1991; Case, 1985; Case & Okamoto, 1996;
Fischer, 1980) has strengthened the place of stage-theory models but has also
resulted in the replacement of Piaget’s universal stage model with domain-specific
theories. According to domain-specific theories, knowledge is not organized in
unitary structures that cut across all kinds of tasks and situations; rather, knowledge
is organized within specific domains defined by particular content or tasks such as
those involved in data exploration and statistical reasoning. Moreover, contemporary
neo-Piagetian theories connect rather than separate organism and environment. For
example, the research studies of Biggs and Collis and those of Case have examined
the process of cognitive development as it occurred in everyday environments
including school settings.
The discussion of cognitive models of development in this chapter recognizes
that contemporary models of cognitive development deal with domain-specific
knowledge such as statistical reasoning and are essentially seamless with respect to
organism and environment. Hence our use of the term cognitive models of
development will incorporate both organism and environmental effects; or as Reber
(1995) states, “maturational and interactionist effects” (p. 749). For us, the term
cognitive model of development in statistical reasoning refers to a theory suggesting
different levels or patterns of growth in statistical reasoning that result from
maturational or interactionist effects in both structured and unstructured learning
environments.
AN INFLUENTIAL GENERAL MODEL OF COGNITIVE DEVELOPMENT
In the previous section we referred to several neo-Piagetian models that focus on
the development of domain-specific knowledge, including various aspects of
mathematical knowledge. For example, the models of Biggs and Collis (1982, 1991),
Case (1985), Case and Okamoto (1996), and Fischer (1980) have been consistently
used as the research base for studying students’ mathematical thinking and reasoning
in number, number operations, geometry, and probability. In this section we examine
the Biggs and Collis model in more detail because it has been widely used in
developing cognitive models of development in students’ statistical reasoning (e.g.,
Chance, delMas, & Garfield, Chapter 13 in this text; see also Jones et al., 2000;
Mooney, 2002; Watson, Collis, Callingham, & Moritz, 1995).
The Biggs and Collis model has been an evolutionary one beginning with the
structure of observed learning outcomes (SOLO) taxonomy (Biggs & Collis, 1982).
The SOLO taxonomy postulated the existence of five modes of functioning
(sensorimotor—from birth, ikonic—from around 18 months, concrete-symbolic—
from around 6 years, formal—from around 14 years, and postformal—from around
20 years) and five cognitive levels (prestructural, unistructural, multistructural,
relational, and extended abstract) that recycle during each mode and represent shifts
in complexity of students’ reasoning. Later extensions to the SOLO model (Biggs &
Collis, 1989, 1991; Collis & Biggs, 1991; Pegg & Davey, 1998) acknowledged the
existence and importance of multimodal functioning in many types of learning. That
is, rather than earlier-developed modes being subsumed by later modes, development
in earlier modes actually supports development in later modes. In fact, growth in
later modes is often linked with actions or thinking associated with the earlier ones.
As the models of statistical reasoning discussed later in this chapter cover students
from elementary through college, we will be interested in all modes of functioning
and interactions between these modes.
As noted earlier, this multimodal functioning also incorporates, within each
mode, a cycle of learning that has five hierarchical levels (Biggs & Collis, 1982,
1989, 1991; Biggs, 1992; Watson, Collis, Callingham, & Moritz, 1995). At the
prestructural (P) level, students engage a task but are distracted or misled by an
irrelevant aspect belonging to an earlier mode. For the unistructural (U) level, the
student focuses on the relevant domain and picks up on one aspect of the task. At the
multistructural (M) level, the student picks up on several disjoint and relevant
aspects of a task but does not integrate them. In the relational (R) level, the student
integrates the various aspects and produces a more coherent understanding of the
task. Finally, at the extended abstract (EA) level, the student generalizes the structure
to take in new and more abstract features that represent thinking in a higher mode of
functioning. Within any mode of operation, the middle three levels are most
important because, as Biggs and Collis note, prestructural responses belong in the
previous mode and extended abstract responses belong in the next.
The levels of the Biggs and Collis learning cycle have provided a powerful
theoretical base for situating research on students’ statistical reasoning from the
elementary school years through college (Chapter 13; Jones et al., 2000; Mooney,
2002; Watson, Collis, Callingham, & Moritz, 1995). Even though Biggs and Collis
highlight the importance of the three middle levels, some researchers have developed
characterizations of students’ statistical reasoning that are consistent with the first
four levels (Jones et al., 2000; Mooney, 2002) while others have characterized
students’ statistical reasoning according to all five levels (Chapter 13). These studies
also reveal that statistical reasoning operates across different modes in accord with
the multimodal functioning of the Biggs and Collis model; this is especially
noteworthy in relation to the modal shifts associated with the ikonic and concrete-symbolic modes.
Recent studies in mathematics, science, and statistical reasoning have identified
the existence of two U-M-R cycles operating within the concrete-symbolic mode
(Callingham, 1994; Campbell, Watson, & Collis, 1992; Levins & Pegg, 1993; Pegg,
1992; Pegg & Davey, 1998; Watson, Collis, & Campbell, 1995; Watson, Collis,
Callingham, & Moritz, 1995). More specifically, these researchers have identified two
cycles when students engage in reasoning about fractions, volume measurement, and
higher order statistical thinking. The first of these cycles is associated with the
development of a concept and the second with the consolidation and application of
the concept (Watson, Collis, Callingham, & Moritz, 1995, p. 250).
At opportune times in later sections of this chapter, we refer to the Biggs and
Collis model in considering various models of development in statistical reasoning.
Other authors in this book (e.g., Reading & Shaughnessy, Chapter 9; Watson,
Chapter 12) will also elaborate on how their research has been situated in the work
of Biggs and Collis.
A HISTORICAL PERSPECTIVE ON MODELS OF DEVELOPMENT IN
STOCHASTICS
Cognitive models of development have frequented the literature on stochastics (a
term commonly used in Europe when referring to both probability and statistics
[Shaughnessy, 1992]) from the time of Piaget and Inhelder’s (1951/1975) seminal
work on probability. As their clinical studies demonstrated, probability concepts are
acquired in stages that are in accord with Piaget’s more general theory of cognitive
development. Since the Piaget and Inhelder studies, there has been a strong focus on
cognitive models in stochastics, most of them focused on probabilistic rather than
statistical reasoning (Fischbein, 1975; Fischbein & Gazit, 1984; Fischbein &
Schnarch, 1997; Green, 1979, 1983; Jones, Langrall, Thornton, & Mogill, 1997;
Olecka, 1983; Polaki, Lefoka, & Jones, 2000; Tarr & Jones, 1997; Watson, Collis,
& Moritz, 1997; Watson & Moritz, 1998). Some of these models on probabilistic
reasoning have been situated in neo-Piagetian theories such as those of Biggs and
Collis (e.g., Jones, Langrall, Thornton, & Mogill; Watson, Collis, & Moritz; Watson
& Moritz) and Case (e.g., Polaki, Lefoka, & Jones). Scholz (1991) presented a
review of psychological research on probability that included developmental models
like those of Piaget and Fischbein. He also described his own information-processing
model of probabilistic thinking that was predicated on giving students time to solve
and reflect on probability tasks. Scholz’s emphasis on reflection rather than on
intuitive probabilistic reasoning seems to have influenced research on probabilistic
reasoning in the latter part of the 1990s, and it may well have influenced the research
on statistical reasoning that we discuss later in this chapter.
One cognitive development model (Shaughnessy, 1992) described stochastic
conceptions in a way that has relevance for both statistical and probabilistic
reasoning. Shaughnessy’s broad characterization identified four types of
conceptions: non-statistical (responses are based on beliefs, deterministic models, or
single-outcome expectations); naïve-statistical (nonnormative responses based on
judgmental heuristics or experience that shows little understanding of probability);
emergent-statistical (responses are based on normative mathematical models and
show evidence that the respondent is able to distinguish between intuition and a
model of chance); and pragmatic-statistical (responses reveal an in-depth
understanding of mathematical models and an ability to compare and contrast
different models of chance). Shaughnessy did not claim that these four conceptions
are linearly ordered or mutually exclusive; however, he did see the third and fourth
conceptions as resulting from instructional intervention, and he noted that few people
reach the pragmatic-statistical stage.
The research on cognitive models in probabilistic reasoning was undoubtedly the
forerunner to research on models of development in statistical reasoning. However,
research endeavors in statistical reasoning have also been stimulated by instructional
models postulating that teachers can facilitate mathematical thinking and learning by
using research-based knowledge of how students think and learn mathematics
(Carpenter, Fennema, Peterson, Chiang, & Loef, 1989). Such instructional models
have led researchers like Cobb et al. (1991) and Resnick (1983) to advocate the need
for detailed cognitive models of students’ reasoning to guide the planning and
development of mathematics instruction. According to Cobb and Resnick, such
cognitive models should incorporate key elements of a content domain and the
processes by which students grow in their understanding of the content within that
domain. Hence, in the case of statistical reasoning, it appears that we should be
focusing on cognitive models that incorporate processes like decision making,
prediction, and inference as they occur when students collect and explore data and
begin to deal with the existence of variation, data reduction through summaries and
displays, population parameters by considering samples, the logic of sampling
processes, estimation and control of errors, and causal factors (Gal & Garfield,
1997).
COMPREHENSIVE MODELS OF DEVELOPMENT IN STATISTICAL
REASONING
Several researchers have formulated models of cognitive development that
incorporate multiple statistical processes (Jones et al., 2000; Mooney, 2002; Watson,
Collis, Callingham, & Moritz, 1995). Jones et al. (2000) and Mooney (2002)
characterize elementary and middle school students’ statistical reasoning according
to four processes: describing data, organizing and reducing data, representing data,
and analyzing and interpreting data. Watson, Collis, Callingham, and Moritz (1995)
characterize middle school students’ higher order statistical reasoning as they engage
in a data-card task that incorporates processes like organizing data, seeking
relationships and associations, and making inferences.
Jones et al. and Mooney Models
The related research programs of Jones et al. (2000, 2001) and Mooney (2002)
have produced domain-specific frameworks characterizing the development of
elementary and middle school students’ statistical reasoning from a more
comprehensive perspective. These researchers’ frameworks are grounded in a
twofold theoretical view. First, it is recognized that for students to exhibit statistical
reasoning, they need to understand data-handling concepts that are multifaceted and
develop over time. Second, in accord with the general developmental model of Biggs
and Collis (1991), it is assumed that students’ reasoning can be characterized as
developing across levels that reflect shifts in the complexity of their reasoning. From
this theoretical perspective, Jones et al. and Mooney describe students’ statistical
reasoning with respect to the four statistical processes listed earlier. They assert that
for each of these four processes, students’ reasoning can be characterized as
developing across four levels of reasoning referred to as idiosyncratic, transitional,
quantitative, and analytical.
The four key statistical processes described in the Jones et al. (2000, 2001) and
Mooney (2002) frameworks coincide with elements of data handling identified by
Shaughnessy, Garfield, and Greer (1996) and reflect critical areas of research on
students’ statistical reasoning. These four processes are described as follows.
Describing Data
This process involves the explicit reading of raw data or data presented in tables,
charts, or graphical representations. Curcio (1987) considers “reading the data” as
the initial stage of interpreting and analyzing data. The ability to read data displays
becomes the basis for students to begin making predictions and discovering trends.
Two subprocesses relate to describing data: (a) showing awareness of display
features and (b) identifying units of data values.
Organizing Data
This process involves arranging, categorizing, or consolidating data into a
summary form. As with the ability to describe data displays, the ability to organize
data is vital for learning how to analyze and interpret data. Arranging data in clusters
or groups can illuminate patterns or trends in the data. Measures of center and
dispersion are useful in making comparisons between sets of data. Three
subprocesses pertain to organizing data: (a) grouping data, (b) summarizing data in
terms of center, and (c) describing the spread of data.
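The three subprocesses above can be made concrete with a short sketch. The data values and variable names below are invented for illustration; the chapter itself supplies no data.

```python
# Hypothetical sketch of the three "organizing data" subprocesses:
# (a) grouping data, (b) summarizing center, (c) describing spread.
from collections import Counter
from statistics import mean, median, pstdev

heights_cm = [121, 134, 128, 121, 140, 134, 134, 128]  # invented example data

# (a) Group the raw values into a frequency table.
groups = Counter(heights_cm)

# (b) Summarize the data in terms of center.
center = {"mean": mean(heights_cm), "median": median(heights_cm)}

# (c) Describe the spread of the data.
spread = {"range": max(heights_cm) - min(heights_cm),
          "std_dev": pstdev(heights_cm)}

print(groups[134])        # 3 values fall in the 134 cm group
print(center["median"])   # 131.0
print(spread["range"])    # 19
```

Comparing the `center` and `spread` summaries of two such data sets is one simple way to make the between-group comparisons the paragraph describes.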
Representing Data
This process involves displaying data in a graphical form. Friel, Curcio, and
Bright (2001) stated that the graphical sense involved in representing data “includes
a consideration of what is involved in constructing graphs as tools for structuring
data and, more important, what is the optimal choice for a graph in a given situation”
(p. 145). Representing data, like the previous two processes, is important in
analyzing and interpreting data. The type of display used and how the data are
represented will determine the trends and predictions that can be made. Also,
different data displays can communicate different ideas about the same data. Two
subprocesses underlie representing data: (a) completing or constructing a data
display for a given data set and (b) evaluating the effectiveness of data displays in
representing data.
Analyzing and Interpreting Data
This process constitutes the core of statistical reasoning. It involves recognizing
patterns and trends in the data and making inferences and predictions from data. It
incorporates two subprocesses that Curcio (1987) refers to using the following
descriptors: (a) reading between the data and (b) reading beyond the data. The
former involves using mathematical operations to combine, integrate, and compare
data (interpolative reasoning); the latter requires students to make inferences and
predictions from the data by tapping their existing schema for information that is not
explicitly stated in the data (extrapolative reasoning). Some examples of tasks that
relate to reading between and beyond the data are presented in the next few pages
when we examine the elementary and middle school statistical reasoning
frameworks.
With regard to levels of statistical reasoning, the Jones et al. (2000, 2001) and
Mooney (2002) statistical reasoning frameworks characterize students’ reasoning
across four levels: idiosyncratic, transitional, quantitative, and analytical. At the
idiosyncratic level, students’ reasoning is narrowly and consistently bound to
idiosyncratic or subjective reasoning that is unrelated to the given data and often
focused on personal experiences or subjective beliefs. This level corresponds to the
prestructural level described by Biggs and Collis (1991). Students reasoning at this
level may be distracted or misled by irrelevant aspects of a problem situation. At the
transitional level students begin to recognize the importance of reasoning
quantitatively, but are inconsistent in their use of such reasoning. Students reasoning
at this level engage a task in a relevant way but generally focus on only one aspect of
the problem situation. In the Biggs and Collis model, this is the unistructural level.
At the quantitative level, students’ reasoning is consistently quantitative in that they
can identify the mathematical ideas of the problem situation and are not distracted or
misled by the irrelevant aspects. However, students who reason at this level do not
necessarily integrate these relevant mathematical ideas when engaged in the task.
Biggs and Collis consider this the multistructural level. At the analytical level,
students’ reasoning is based on making connections between the multiple aspects of
a problem situation. Their reasoning at this level can integrate the relevant aspects of
a task into a meaningful structure (e.g., creating multiple data displays, or making a
reasonable prediction); this is what Biggs and Collis refer to as the relational level.
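The level-by-level correspondence just described can be written out as a simple lookup table. The dictionary below is our own summary of the mapping, not a structure given by the authors.

```python
# Correspondence between the Jones et al. / Mooney framework levels
# and the Biggs and Collis (1991) levels, as described in the text.
level_map = {
    "idiosyncratic": "prestructural",
    "transitional": "unistructural",
    "quantitative": "multistructural",
    "analytical": "relational",
}

print(level_map["quantitative"])  # multistructural
```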
The Jones et al. (2000) framework characterizes the development of elementary
school children’s statistical reasoning across the four levels just described. For each
of the four statistical processes, their framework provides specific descriptors of
children’s reasoning at each level. In Figure 1, we have shown that part of the Jones
et al. framework that pertains to analyzing and interpreting data. There are four
descriptors, relating to each of the four levels, for the two subprocesses reading
between the data and reading beyond the data. For reading between the data, a
relevant task is to compare the number of students who attended a butterfly garden
display before 1 p.m. with those who attended after 1 p.m., when we know each
student’s name and the time she attended. In the case of reading beyond the data, a
relevant task is to predict the number of friends who would visit a boy named Sam in
a month, when the students are given data on the number of friends who visited Sam
each day of one week.
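The arithmetic behind these two tasks can be sketched as follows. All names and counts are invented for illustration; the chapter does not give the actual data, and scaling a week to a month is only one plausible way to read beyond the data.

```python
# Reading between the data: compare attendance before and after 1 p.m.,
# given each student's visit time (hour on a 24-hour clock, invented values).
visit_hours = {"Ana": 10, "Ben": 11, "Carla": 12, "Dev": 14, "Eli": 15}
before_1pm = sum(1 for h in visit_hours.values() if h < 13)
after_1pm = len(visit_hours) - before_1pm
print(before_1pm, after_1pm)  # 3 2

# Reading beyond the data: predict Sam's visitors over a month
# from one week of daily counts (invented values).
visitors_per_day = [2, 3, 1, 4, 2, 5, 3]
weekly_total = sum(visitors_per_day)             # 20 visitors in one week
monthly_estimate = round(weekly_total * 30 / 7)  # one plausible extrapolation
print(monthly_estimate)  # 86
```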
Mooney’s framework (Mooney, 2002; Mooney, Langrall, Hofbauer, & Johnson,
2001) characterizes the development of middle school students’ statistical reasoning
across the same four levels and processes as described in the Jones et al. framework.
The part of Mooney’s framework that pertains to analyzing and interpreting data is
presented in Figure 2. There are descriptors pertaining to the two subprocesses
reading between and beyond the data, as well as an additional subprocess involving the
use of relative and proportional reasoning. For reading between the data, a relevant
task is to compare the number of medals won by five countries when given data on
the number of gold, silver, and bronze medals won by each country. A reading
beyond the data task is to ask students to compare the concert tours of several groups
when given the number of cities where they performed, number of shows performed,
and total concert earnings (see Figure 3). This latter inferential task requires
proportional reasoning.
MODELS OF DEVELOPMENT
Process: Analyzing & Interpreting Data

Reading Between the Data
  Level 1 (Idiosyncratic): Gives an idiosyncratic or invalid response when asked to make comparisons.
  Level 2 (Transitional): Makes some comparisons between single data values, but does not look at global trends.
  Level 3 (Quantitative): Makes local or global comparisons, but does not link comparisons.
  Level 4 (Analytical): Makes both local and global comparisons and relates comparisons to each other.

Reading Beyond the Data
  Level 1 (Idiosyncratic): Gives an idiosyncratic or invalid response when asked to make predictions.
  Level 2 (Transitional): Gives vague or inconsistent predictions that are not well linked to the data.
  Level 3 (Quantitative): Uses the data in a consistent way to engage in sense-making predictions.
  Level 4 (Analytical): Uses both the data and the context to make complete and consistent predictions.

Figure 1. Elementary framework descriptors for analyzing and interpreting data.
Process: Analyzing & Interpreting Data

Reading Between the Data
  Level 1 (Idiosyncratic): Makes incorrect comparisons within and between data sets.
  Level 2 (Transitional): Makes a single correct comparison or a set of partially correct comparisons within or between data sets.
  Level 3 (Quantitative): Makes local or global comparisons within and between data sets.
  Level 4 (Analytical): Makes local and global comparisons within and between data sets.

Reading Beyond the Data
  Level 1 (Idiosyncratic): Makes inferences that are not based on the data or inferences based on irrelevant issues.
  Level 2 (Transitional): Makes inferences that are partially based on the data. Some inferences may be only partially reasonable.
  Level 3 (Quantitative): Makes inferences primarily based on the data. Some inferences may be only partially reasonable.
  Level 4 (Analytical): Makes reasonable inferences based on data and the context.

Using Proportional Reasoning Where Necessary
  Level 1 (Idiosyncratic): Does not use relative thinking.
  Level 2 (Transitional): Uses relative thinking qualitatively.
  Level 3 (Quantitative): Uses relative and proportional reasoning in an incomplete or invalid manner.
  Level 4 (Analytical): Uses relative and proportional reasoning.

Figure 2. Middle school framework descriptors for analyzing and interpreting data.
To illustrate these descriptors of students’ statistical reasoning and to contrast the
statistical reasoning of elementary students with middle school students, we look at
student responses to the Best Concert Tour problem—a task that required students to
analyze and interpret data. The task is presented in Figure 3; typical responses for
elementary and middle school students at each of the four levels of the respective
frameworks are presented in Table 1.
Task: Here are three graphs showing information on concert tours for Barbra
Streisand, the Rolling Stones, Boyz II Men, and the Eagles. Who had the most
successful concert tour? Justify your decision.
[Three bar graphs, one for each of: Total Concert Earnings (in millions of dollars), Number of Shows Performed, and Number of Cities Shows Were Performed. Each graph shows one bar per performer: Barbra Streisand, Boyz II Men, Eagles, and Rolling Stones.]

Figure 3. Best Concert Tour problem.
Table 1. Typical student responses at each level of reasoning on the Best Concert Tour task.

Idiosyncratic
  Elementary: Boyz II Men, I went to one of their concerts.
  Middle school: If you took these bars [for each performer] and put them on top of each other and you stacked them all up, Boyz II Men would be the tallest and most successful.

Transitional
  Elementary: Boyz II Men, the bars are tall.
  Middle school: The Rolling Stones performed three times as many shows as Barbra Streisand but only made twice as much money as she did. I think she did better.

Quantitative
  Elementary: I looked at each of the graphs and picked this one [the total concert earnings graph] and decided that the Rolling Stones are best because they got more money.
  Middle school: For Barbra Streisand it was 60 [total concert earnings] to 20 [number of shows] or 3 to 1. I don’t need to look at Boyz II Men. The Eagles is about 2 to 1. For the Rolling Stones it is exactly 2 to 1. That makes Barbra Streisand the best.

Analytical
  Elementary: Boyz II Men performed a lot of shows but they didn’t make much money. The Rolling Stones made a lot of money but didn’t perform as many shows. I’d go with the Rolling Stones.
  Middle school: I calculated the earnings per show for each of the performers. Streisand is about 2.8 million dollars per show. Boyz II Men is about 0.3 million, the Eagles are about 1.45 million, and the Rolling Stones are about 2 million. I’d go with Barbra Streisand but there are some other things you would want to know, like how many people are in the band and the size of the audience.
At the idiosyncratic level, elementary students tend to base their reasoning on
their own data sets (I went to one of their concerts), while middle school students
often use the given data, but in an inappropriate way (combining all the bars).
Elementary and middle school students who exhibit transitional reasoning tend to
focus on one aspect of the data, for example, the height of the bars in the case of the
elementary student and ratios that are not fully connected in the case of the middle
school student. The middle school student applies more sophisticated mathematical
ideas than the elementary student, but neither student provides a complete
justification. At the quantitative level, both elementary and middle school students
make multiple quantitative comparisons but have difficulty linking their ideas. For
example, the elementary student compares the data in the three graphs and then
makes a local comparison within the “best” data set (total concert earnings); the
middle school student makes multiple comparisons based on total earnings versus
number of shows, but does not actually link the ratios to the context. The main
difference between the elementary and middle school students’ responses at this
level is that the middle school student has access to proportional reasoning. Students
who exhibit analytical reasoning use local and global comparisons of data and
knowledge of the context to make valid inferences. For example, both the elementary
and the middle school students recognize the need to relate money earned with
number of shows performed; the main difference is that the middle school student
actually determines and compares appropriate rates derived from the context. In fact,
the middle school student even raises some additional factors that may act as
limitations to the solution presented.
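The proportional reasoning in the quantitative- and analytical-level responses amounts to comparing rates rather than single bar heights. The short Python sketch below makes this explicit; the earnings and show counts are approximate values read from the Figure 3 graphs (assumptions here, chosen to be consistent with the per-show rates quoted in the analytical response in Table 1).

```python
# Approximate data from the Best Concert Tour graphs: total earnings in
# millions of dollars and number of shows. These figures are assumptions
# consistent with the rates quoted in Table 1, not exact published values.
tours = {
    "Barbra Streisand": (56, 20),
    "Boyz II Men": (42, 140),
    "Eagles": (58, 40),
    "Rolling Stones": (120, 60),
}

# Analytical-level reasoning: compute earnings per show for each performer,
# linking the two graphs through a rate instead of comparing single bars.
rates = {name: earnings / shows for name, (earnings, shows) in tours.items()}

best = max(rates, key=rates.get)
for name in sorted(rates, key=rates.get, reverse=True):
    print(f"{name}: about {rates[name]:.2f} million dollars per show")
```

Ranked this way, Barbra Streisand comes out ahead even though the Rolling Stones have the tallest earnings bar, which is precisely the shift from local comparisons to linked, contextual ones that the frameworks describe.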
The differences between the responses of typical elementary and middle school
students, at the four levels of the frameworks, can be related to the SOLO model
(Biggs & Collis, 1991). These differences seem to reflect statistical reasoning that is
associated with two different cycles in the concrete-symbolic mode (see Pegg &
Davey, 1998; Watson, Collis, Callingham, & Moritz, 1995). In essence, the cycle
associated with the elementary students’ statistical reasoning deals with the
conceptual development of statistical concepts while the second cycle, demonstrated
in the reasoning of the middle school students, deals with the application of
statistical and mathematical concepts and procedures that have already been learned.
Watson and her colleagues examine statistical reasoning associated with two
developmental cycles in more detail in the next comprehensive model.
Watson et al. Model
Watson, Collis, Callingham, & Moritz (1995) used the Biggs and Collis (1991)
cognitive development model to characterize middle school students’ higher order
statistical reasoning. More specifically, these researchers hypothesized that students’
higher order statistical reasoning could be characterized according to two
hierarchical unistructural-multistructural-relational [U-M-R] cycles, the first dealing
with the development of statistical concepts and the second with the consolidation
and application of these statistical concepts.
There were two parts to the study: clinical interviews with six 6th-grade students
and one 9th-grade student and three instructional sessions with two 6th-grade classes
working largely in groups. For use in both parts of the study, the authors
developed an interview protocol based on a set of 16 data cards, each containing
information such as a student’s name, age, favorite activity, eye color, weight,
and number of fast-food meals per week.
In the clinical interview, students were asked to think of some interesting
questions that could be answered using the cards; they were further prompted to
imagine they were doing a school project with the cards. Following the analysis of
the interview data, the researchers adapted the data-card task for use in the
instructional setting. In the first class session, the students were introduced to ideas
about looking for statistical associations and were then given a project that asked
them to look for interesting questions and connections in the data. During the second
session, the students were introduced to methods of displaying data (e.g., graphs)
using examples that were unrelated to the data cards. The students continued
working on their projects during the rest of the session and for part of the third
session. They then presented their projects in the form of reports and posters.
The findings from this study demonstrated that the statistical reasoning of all seven
students in the interviews could be characterized according to the first U1-M1-R1
cycle: students at the U1 level focused on individual data with imaginative
speculation on what caused certain data values; students at the M1 level sorted the
cards into different groups, focused on one variable at a time, and described that
variable; students at the R1 level sorted the cards into different groups, focused on
more than one variable at a time, and appreciated the need to relate variables. Three
students were classified as reasoning at U1, three at M1, and one at R1. By
contrast, during the instructional program, two U-M-R cycles were needed to
characterize students’ statistical reasoning. Moreover, all of the group or individual
projects were classified beyond U1. The characterizations of the second U2-M2-R2
cycle moved into reasoning that involved justification and application: students at
the U2 level recognized the need to justify conjectured associations but did not
proceed beyond that; students at the M2 level used tables or graphs to support claims
of association or cause, and students at the R2 level used statistics such as the mean
to support claims of association. Watson, Collis, Callingham, and Moritz (1995) also
noted some evidence of multimodal functioning with ikonic intuitions and
perceptions supporting students’ reasoning and decision making in the concrete-symbolic mode. Both the learning cycle model and multimodal functioning have
implications for informing instruction and enhancing teachers’ knowledge of how
students might respond to contextual data exploration tasks.
From a research perspective, it is interesting that Watson, Collis, Callingham,
and Moritz (1995) uncovered two learning cycles in building models of higher order
statistical reasoning, whereas Jones et al. (2000), working with elementary students,
and Mooney (2002), working with middle school students, each found that one
learning cycle was sufficient to characterize students’ statistical reasoning. On the
one hand, this difference may result from Watson and her colleagues’ intimate
knowledge of the Biggs and Collis model and their caveat that the additional cycles
appear “when student understanding is viewed in considerable detail” (p. 250). On
the other hand, it is possible that the single cycles identified by Jones et al.
and by Mooney represented two different cycles within the concrete-symbolic mode—
the first focusing on conceptual development of statistical concepts and the second
incorporating applications of statistical concepts. Notwithstanding these possible
rationalizations, there is clearly a need for researchers involved in formulating
models of development in statistical reasoning to be aware of emerging research that
suggests the existence of multiple learning cycles within a mode of operation like
concrete-symbolic or formal (Callingham, 1994; Campbell, Watson, & Collis, 1992;
Pegg & Davey, 1998; Watson, Collis, & Campbell, 1995).
COGNITIVE MODELS OF DEVELOPMENT FOR SPECIFIC STATISTICAL
CONCEPTS AND PROCESSES
In this section we survey the research literature focusing on models of cognitive
development that relate to specific statistical concepts. In particular, we focus on the
following key concepts and processes: data modeling, measures of center and
variation, group differences, covariation and association, and sampling and sampling
distributions. In examining these models we do not claim to have exhausted all
models of development in the field; rather, our review presages the concepts and
processes that are considered in more detail in the following chapters of this book.
Data Modeling
Many researchers have examined patterns of growth in statistical reasoning when
students have been engaged in data-modeling problems or model-eliciting problems
that involve data (Ben-Zvi, Chapter 6; Ben-Zvi & Arcavi, 2001; Doerr, 1998; Doerr
& Tripp, 1999; Lehrer & Romberg, 1996; Lehrer & Schauble, 2000; Lesh, Amit, &
Schorr, 1997; Wares, 2001). Because of their inherent nature, data-modeling
problems provide a distinctive context for observing students’ statistical reasoning in
open-ended situations. Modeling problems focus on organizing and representing
data, pattern building, and seeking relationships (Lesh & Doerr, 2002), and they
involve students in statistical reasoning such as decision making, inference, and
prediction. Moreover, data-modeling problems often reveal students’ underlying
conceptual ideas about statistical reasoning—especially fundamental processes like
dealing with variation, transforming data, evaluating statistical models, and
integrating contextual and statistical features of the problem (Wild & Pfannkuch,
1999).
Measures of Center and Variation
Most of the research pertaining to measures of center has focused on the
concepts of average, representativeness, or mean. Several studies have described
students’ varying conceptions of measures of center (Bright & Friel, 1998; Konold
& Pollatsek, 2002; Konold & Pollatsek, Chapter 8; Mokros & Russell, 1995;
Strauss & Bichler, 1988) but have not necessarily traced the development of
students’ understandings. Two studies that have addressed developmental aspects of
students’ reasoning with measures of center are the work of Reading and Pegg
(1996) and Watson and Moritz (2000a). The few studies that have addressed the
concept of variation or spread have examined the development of students’
reasoning about variation (Shaughnessy, Watson, Moritz, & Reading, 1999; Reading
& Shaughnessy, 2001; Reading & Shaughnessy, Chapter 9; Torok & Watson, 2000).
Comparing Two Data Sets
Making statistical inferences is a key aspect of statistical reasoning, and the
importance of statistical inference is acknowledged in several curriculum documents
(AEC, 1991; NCTM, 2000; DFE, 1995). One way that students can be introduced to
statistical inference is by having them compare two or more sets of numerical data in
contexts where the number in each set may be equal or unequal. Various researchers
(Cobb, 1999; McClain & Cobb, 1998; Mooney, 2002; Watson & Moritz, 1999) have
produced models of development that characterize students’ reasoning as they make
statistical inferences involving the comparison of two data sets.
Bivariate Relationships
The study of correlation (association) and regression is important in statistics
because these processes are used to identify statistical relationships between two or
more variables and, where appropriate, to seek causal explanations. Accordingly, an
understanding of association and regression has become important in the school
mathematics curriculum (e.g., AEC, 1994; NCTM, 1989, 2000); thus, some
researchers have examined the development of students’ conceptions in relation to
association and regression (Batanero, Estepa, Godino, & Green, 1996; Ross &
Cousins, 1993; Wavering, 1989; Mevarech & Kramarsky, 1997). These studies have
foreshadowed the more definitive cognitive models of Moritz and Watson (2000),
Moritz (2001), and Mooney (2002).
Sampling and Sampling Distributions
The notion of sample is one of the most fundamental ideas in statistics, since
samples enable us to gain information about the whole by examining the part
(Moore, 1997). More specifically, sampling is used to make inferences about
populations, that is, to predict population parameters from sample statistics.
Processes like inference and prediction are grounded in the concept of sampling
distributions, which is a complex idea for students to grasp. Research in this area has
examined the development of students’ statistical reasoning, not only in relation to
the concepts of sample, sample size, and sampling procedures (Watson, Chapter 12;
Watson & Moritz, 2000b) but also in relation to more sophisticated ideas like
sampling distributions (Chapter 13; Saldanha & Thompson, 2001) and the Central
Limit Theorem (Chapter 13; delMas, Garfield, & Chance, 1999).
IMPLICATIONS FOR STATISTICAL EDUCATION
In statistical education, as in mathematics education, there is a continuing drive
toward research that makes connections between the learning process and the
teaching process. This has been brought into even sharper focus with the advent of
constructivist approaches to learning and the need for pedagogies that facilitate
students’ mathematical constructions. The importance of this connection between
teaching and learning is evident across the international scene; curriculum
documents (AEC, 1991; NCTM, 1989, 2000; DFE, 1995) espouse reforms in
mathematics education that encourage teachers to focus on “understanding what
students know and need to know” and advocate that learners should “learn
mathematics with understanding, actively building new knowledge from experience
and prior knowledge” (NCTM, 2000, p. 11).
Due to this increased emphasis on teaching and learning and the need to have
students actively building mathematical and statistical knowledge, powerful new
instructional models have emerged during the last 15 years: Realistic Mathematics
Education (RME; Gravemeijer, 1994), Cognitively Guided Instruction (CGI;
Carpenter et al., 1989), and the Mathematics Teaching Cycle (MTC; Simon, 1995).
Although these instructional models have many differences, they share the common
perspective that students’ learning is not only central to the instructional process; it
must drive the instructional process. For example, RME evolved in order to create a
shift from a mechanistic orientation to teaching and learning to an approach that
emphasized student learning through reconstructive activity grounded in reality and
sociocultural contexts (Streefland, 1991); CGI has as its major tenet the need to use
research-based knowledge of students’ reasoning to inform instruction; and MTC
stresses “the reflexive relationship between the teacher’s design of activities and
consideration of the reasoning that students might engage in as they participate in
those activities” (Simon, 1995, p. 133). All of these instructional theories highlight the
need for teachers to understand and use the reasoning that students bring to
mathematics classes.
Given these directions in teaching and learning, models of development in
statistical reasoning have a key role in statistical instruction. Because these models
incorporate domain-specific knowledge of students’ statistical reasoning across key
statistical concepts and processes, they arm teachers with the kind of knowledge that
can be used in the design, implementation, and assessment of instruction in statistics
and data exploration.
With respect to the design of instruction, cognitive models of development
provide a coherent picture of the diverse range of statistical reasoning that a teacher
might expect students to bring to the classroom. The use of cognitive models in
designing instruction can be amplified by examining Simon’s (1995) notion of
hypothetical learning trajectory. By hypothetical learning trajectory, Simon means
the formulation of learning goals, learning activities, and a conjectured learning
process. In the first instance, many of the cognitive models discussed in this chapter
identify key processes and concept goals, by their very nature indicating where
children might be in relation to these goals. For example, the Jones et al. (2000)
model identifies key processes like describing data, organizing data, representing
data, and analyzing and interpreting data; it also documents, through the level
descriptors, the kind of goals that might be appropriate for individual children or the
class as a whole. In considering learning activities, the research on cognitive models
invariably incorporates tasks and activities that have been used to engage students’
statistical reasoning. For example, tasks like those incorporated in the technology
simulation on sampling distributions (Chapter 13) have widespread potential in
college and high school instructional settings. Finally, in relation to conjecturing the
possible direction of the classroom learning process, the cognitive model provides a
database for the teacher on the range of statistical reasoning that he or she might
expect to find during instruction. For example, in Grade 3 instruction dealing with
sampling, the Watson and Moritz (2000b) model suggests that all children would be
small samplers with more than 50% of them using idiosyncratic methods of selecting
samples.
With respect to the implementation of instruction, models of development can
serve as a filter for analyzing and characterizing students’ responses. In our teaching
experiments (Jones et al., 2001), we have found that filtering students’ responses
using a model of development helps teachers to build a much richer knowledge base
than they would without such a filter. In particular, a model helps teachers to frame
questions and written tasks that accommodate the diversity of reasoning reflected in
a group or class. Such accommodation and sensitivity by the teacher may enable
children to develop more mature levels of reasoning. For example, a teacher who
was aware from earlier group work that one student was reasoning about the
dimensions of the sampling process in an integrated way (Level 5; see Chapter 13)
might use that student’s response as a focal point for a formative or summative
discussion on the dimensions of the sampling. Alternatively, the teacher might use
the response of a student who was using transitional reasoning (Level 2; see Chapter
13) on the dimensions of sampling as a means of focusing on the need for
completeness and connections.
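As a hypothetical sketch of this filtering idea (the tally below is our illustration, not an instrument from the studies cited), a teacher might code each response against the four framework levels and inspect the spread of reasoning in a class:

```python
from collections import Counter

# The four levels shared by the Jones et al. and Mooney frameworks.
LEVELS = ["Idiosyncratic", "Transitional", "Quantitative", "Analytical"]

# Invented codes: each pair is (student id, level a rater assigned to that
# student's response on a data-interpretation task).
coded = [
    ("s01", "Transitional"), ("s02", "Quantitative"), ("s03", "Transitional"),
    ("s04", "Analytical"), ("s05", "Idiosyncratic"), ("s06", "Quantitative"),
]

# The class profile: how many responses fell at each level. A teacher can use
# this spread to frame questions that reach the whole range of reasoning.
profile = Counter(level for _, level in coded)
for level in LEVELS:
    print(f"{level}: {profile[level]}")
```

A profile like this is exactly the "richer knowledge base" the filter provides: it shows at a glance whether a planned question will be accessible to the transitional reasoners while still stretching the analytical ones.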
Finally, we believe that models of development in statistical reasoning can be
helpful in assessing and monitoring students’ performances over time, as well as in
evaluating the effectiveness of classroom instruction. We are not suggesting that
middle school students, for example, might move in a linear way through the four
levels of Mooney’s (2002) model of development in relation to analyzing and
interpreting data. However, we are suggesting that teachers can observe differences
in middle school students’ collective and individual statistical reasoning that are
recognizable based on the levels of the Mooney model. In a similar way, teachers
can evaluate their instruction or instructional technology using models of
development. For example, Chance et al. (Chapter 13) have used their cognitive
model to evaluate and refine the simulation technology on sampling distributions. By
assessing and observing changes in students’ reasoning according to the model, they
have identified weaknesses in the technology, have further refined and changed the
technology, and have then reassessed the students’ reasoning. This cycle of
assessment and refinement has great potential in evaluating the pedagogical
effectiveness of technology, in particular the use of microworlds.
As the research in this chapter reveals, students’ statistical reasoning from
elementary through college is diverse and often idiosyncratic. Moreover, students’
statistical reasoning is constantly changing and hence is dynamic rather than static.
Notwithstanding the diversity and dynamics of students’ statistical reasoning,
recurring patterns or levels of statistical reasoning are consistently observable when
students are involved in key statistical processes like decision making, inferring, and
predicting; and when they deal with concepts like sampling, organizing and
representing data, center and variation, and analysis and interpretation. These
recurring patterns of statistical reasoning, and the models of development that have
evolved from them, offer a powerful resource for informing instructional programs
that focus on having students learn statistical reasoning by building on or
reformulating the statistical ideas they bring to the classroom.
REFERENCES
Australian Education Council. (1991). A national statement on mathematics for Australian schools.
Carlton, VIC: Curriculum Corporation.
Australian Education Council. (1994). Mathematics: A curriculum profile for Australian schools.
Carlton, VIC: Curriculum Corporation.
Batanero, C., Estepa, A., Godino, J. D., & Green, D. R. (1996). Intuitive strategies and preconceptions
about association in contingency tables. Journal for Research in Mathematics Education, 27, 151–
169.
Ben-Zvi, D., & Arcavi, A. (2001). Junior high school students’ construction of global views of data and
data representation. In J. Garfield, D. Ben-Zvi, & C. Reading (Eds.), Background Readings of the
Second International Research Forum on Statistical Reasoning, Thinking, and Literacy (pp. 73–
110). Armidale, Australia: Centre for Cognition Research in Learning and Teaching, University of
New England.
Bidell, T. R., & Fischer, K. W. (1992). Cognitive development in educational contexts: Implications of
skill theory. In A. Demetriou, M. Shayer, & A. Efklides (Eds.), Neo-Piagetian theories of cognitive
development: Implications and applications for education (pp. 11–30). London: Routledge.
Biggs, J. B. (1992). Modes of learning, forms of knowing, and ways of schooling. In A. Demetriou, M.
Shayer, & A. Efklides (Eds.), Neo-Piagetian theories of cognitive development: Implication and
applications for education (pp. 31–51). London: Routledge.
Biggs, J. B., & Collis, K. F. (1982). Evaluating the quality of learning: The SOLO taxonomy. New York:
Academic.
Biggs, J. B., & Collis, K. F. (1989). Towards a model of school-based curriculum development and
assessment using the SOLO taxonomy. Australian Journal of Education, 33, 151–163.
Biggs, J. B., & Collis, K. F. (1991). Multimodal learning and intelligent behavior. In H. Rowe (Ed.),
Intelligence: Reconceptualization and measurement (pp. 57–76). Hillsdale, NJ: Erlbaum.
Bright, G. W., & Friel, S. N. (1998, April). Interpretation of data in a bar graph by students in grades 6
and 8. Paper presented at the annual meeting of the American Educational Research Association, San
Diego, CA.
Callingham, R. A. (1994). Teachers’ understanding of the arithmetic mean. Unpublished master’s
thesis. University of Tasmania, Hobart, Australia.
Campbell, K. J., Watson, J. M., & Collis, K. F. (1992). Volume measurement and intellectual
development. Journal of Structural Learning and Intelligent Systems, 11, 279–298.
Carpenter, T. P., Fennema, E., Peterson, P. L., Chiang, C. P., & Loef, M. (1989). Using knowledge of
children’s mathematical thinking in classroom teaching: An experimental study. American
Educational Research Journal, 26(4), 499–532.
Case, R. (1985). Intellectual development: A systematic reinterpretation. New York: Academic Press.
Case, R., & Okamoto, Y. (1996). The role of conceptual structures in the development of children’s
thought. Monographs of the Society for Research in Child Development, 61 (1–2, Serial No. 246).
Cobb, P., Wood, T., Yackel, E., Nicholls, J., Wheatley, G., Trigatti, B., & Perlwitz, M. (1991).
Assessment of a problem-centered second-grade mathematics project. Journal for Research in
Mathematics Education, 22, 3–29.
Cobb, P. (1999). Individual and collective mathematical development: The case of statistical data
analysis. Mathematical Thinking and Learning, 1, 5–43.
Collis, K. F., & Biggs, J. B. (1991). Developmental determinants of qualitative aspects of school
learning. In G. Evans (Ed.), Learning and teaching cognitive skills (pp. 185–207). Melbourne:
Australian Council for Educational Research.
Curcio, F. R. (1987). Comprehension of mathematical relationships expressed in graphs. Journal for
Research in Mathematics Education, 18, 382–393.
delMas, R. C., Garfield, J., & Chance, B. (1999). A model of classroom research in action: Developing
simulation activities to improve students’ statistical reasoning [Electronic version]. Journal of
Statistics Education, 7(3), 1–16.
Department of Education and Science and the Welsh Office (DFE). (1995). Mathematics in the national
curriculum. London: Author.
Doerr, H. M. (1998). A modeling approach to non-routine problem situations. In S. Berenson, K.
Dawkins, M. Blanton, W. Coulombe, J. Kolb, K. Norwood, & L. Stiff (Eds.), Proceedings of the
nineteenth annual meeting, North American Chapter of the International Group for the Psychology
of Mathematics Education (Vol. 2, pp. 441–446). Columbus, OH: ERIC Clearinghouse for Science,
Mathematics, and Environmental Education.
Doerr, H. M., & Tripp, J. S. (1999). Understanding how students develop mathematical models.
Mathematical Thinking and Learning, 1, 231–254.
Fischbein, E. (1975). The intuitive sources of probabilistic thinking in children. Dordrecht, The
Netherlands: Reidel.
Fischbein, E., & Gazit, A. (1984). Does the teaching of probability improve probabilistic intuitions?
Educational Studies in Mathematics, 15, 1–24.
Fischbein, E., & Schnarch, D. (1997). The evolution with age of probabilistic, intuitively based
misconceptions. Journal for Research in Mathematics Education, 28, 96–105.
Fischer, K. W. (1980). A theory of cognitive development: The control and construction of hierarchies of
skill. Psychological Review, 87, 477–531.
Friel, S. N., Curcio, F. R., & Bright, G. W. (2001). Making sense of graphs: Critical factors influencing
comprehension and instructional implications. Journal for Research in Mathematics Education, 32,
124–158.
Gal, I., & Garfield, J. B. (1997). Curricular goals and assessment challenges in statistics education. In I.
Gal & J. B. Garfield (Eds.), The assessment challenge in statistical education (pp. 1–15).
Amsterdam, The Netherlands: IOS Press.
Gravemeijer, K. (1994). Educational development and developmental research. Journal for Research in
Mathematics Education, 25, 443–471.
Green, D. R. (1979). The chance and probability concepts project. Teaching Statistics, 1(3), 66–71.
Green, D. R. (1983). A survey of probability concepts in 3000 pupils aged 11–16. In D. R. Grey, P.
Holmes, V. Barnett, & G. M. Constable (Eds.), Proceedings of the First International Conference on
Teaching Statistics (pp. 766–783). Sheffield, UK: Teaching Statistics Trust.
Jones, G. A., Langrall, C. W., Thornton, C. A., & Mogill, A. T. (1997). A framework for assessing and
nurturing children's thinking in probability. Educational Studies in Mathematics, 32, 101–125.
Jones, G. A., Langrall, C. W., Thornton, C. A., Mooney, E. S., Wares, A., Jones, M. R., Perry, B., Putt, I.
J., & Nisbet, S. (2001). Using students’ statistical thinking to inform instruction. Journal of
Mathematical Behavior, 20, 109–144.
Jones, G. A., Thornton, C. A., Langrall, C. W., Mooney, E. S., Perry, B., & Putt, I. J. (2000). A
framework for characterizing students’ statistical thinking. Mathematical Thinking and Learning, 2,
269–307.
Konold, C., & Pollatsek, A. (2002). Data analysis as the search for signals in noisy processes. Journal for
Research in Mathematics Education, 33, 259–289.
Lehrer, R., & Romberg, T. (1996). Exploring children’s data modeling. Cognition and Instruction, 14,
69–108.
Lehrer, R., & Schauble, L. (2000). Inventing data structures for representational purposes: Elementary
grade children's classification models. Mathematical Thinking and Learning, 2, 51–74.
Lesh, R., Amit, M., & Schorr, R. Y. (1997). Using “real-life” problems to prompt students to construct
statistical models for statistical reasoning. In I. Gal & J. Garfield (Eds.), The assessment challenge in
statistics education (pp. 65–84). Amsterdam, The Netherlands: IOS Press.
Lesh, R., & Doerr, H. (2002). Foundations of a models and modeling perspective. In R. Lesh & H. Doerr
(Eds.), Beyond constructivism: A models and modeling perspective on mathematics teaching,
learning and problem solving. Mahwah, NJ: Lawrence Erlbaum Associates.
Levins, L., & Pegg, J. (1993). Students’ understanding of concepts related to plant growth. Research in
Science Education, 23, 165–173.
Mevarech, Z. A., & Kramarsky, B. (1997). From verbal descriptions to graphic representations: Stability
and change in students’ alternative conceptions. Educational Studies in Mathematics, 32, 229–263.
McClain, K., & Cobb, P. (1998). Supporting students’ reasoning about data. In S. Berenson, K. Dawkins,
M. Blanton, W. Coulombe, J. Kolb, K. Norwood, & L. Stiff (Eds.), Proceedings of the nineteenth
annual meeting, North American Chapter of the International Group for the Psychology of
Mathematics Education (Vol. 1, pp. 389–394). Columbus, OH: ERIC Clearinghouse for Science,
Mathematics, and Environmental Education.
Mokros, J., & Russell, S. J. (1995). Children's concepts of average and representativeness. Journal for
Research in Mathematics Education, 26, 20–39.
Mooney, E. S. (2002). A framework for characterizing middle school students’ statistical thinking.
Mathematical Thinking and Learning, 4, 23–63.
Mooney, E. S., Langrall, C. W., Hofbauer, P. S., & Johnson, Y. A. (2001). Refining a framework on
middle school students’ statistical thinking. In R. Speiser, C. A. Maher, & C. N. Walter (Eds.),
Proceedings of the twenty-third annual meeting of the North American Chapter of the International
Group for the Psychology of Mathematics Education (Vol.1, pp. 439–447). Columbus, OH: ERIC
Clearinghouse for Science, Mathematics, and Environmental Education.
Moore, D. S. (1997). Statistics: Concepts and controversies (4th ed.). New York: Freeman.
Moritz, J. B. (2001). Graphical representations of statistical associations by upper primary students. In J.
Garfield, D. Ben-Zvi, & C. Reading (Eds.), Background Readings of the Second International
Research Forum on Statistical Reasoning, Thinking, and Literacy (pp. 256–264). Armidale,
Australia: Centre for Cognition Research in Learning and Teaching, University of New England.
Moritz, J. B., & Watson, J. (2000). Representing and questioning statistical associations. Unpublished
manuscript, University of Tasmania, Hobart, Australia.
National Council of Teachers of Mathematics (1989). Curriculum and evaluation standards for school
mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics (2000). Principles and standards for school mathematics.
Reston, VA: Author.
Olecka, A. (1983). An idea of structuring probability teaching based on Dienes’ six stages. In D. R. Grey,
P. Holmes, V. Barnett, & G. M. Constable (Eds.), Proceedings of the First International Conference
on Teaching Statistics (pp. 727–737). Sheffield, UK: Teaching Statistics Trust.
Pegg, J. (1992). Assessing students’ understanding at the primary and secondary level in the
mathematical sciences. In J. Izard & M. Stephens (Eds.), Reshaping assessment practice: Assessment
in the mathematical sciences under challenge (pp. 365–368). Melbourne, VIC: Australian Council of
Educational Research.
Pegg, J., & Davey, G. (1998). Interpreting student understanding in geometry: A synthesis of two models.
In R. Lehrer & D. Chazan (Eds.), Designing learning environments for developing understanding of
geometry and space (pp. 109–135). Mahwah, NJ: Erlbaum.
Piaget, J. (1954). The construction of reality in the child. New York: Basic Books.
Piaget, J. (1962). The origins of intelligence in the child. New York: Norton.
Piaget, J., & Inhelder, B. (1975). The origin of the idea of chance in children (L. Leake, P. Burrell, & H.
D. Fischbein, Trans.). New York: Norton. (Original work published 1951.)
Polaki, M. V., Lefoka, P. J., & Jones, G. A. (2000). Developing a cognitive framework for describing and
preparing Basotho students’ probabilistic thinking. Boleswa Educational Research Journal, 17, 1–
20.
Reading, C., & Pegg, J. (1996). Exploring understanding of data reduction. In L. Puig & A. Gutierrez
(Eds.), Proceedings of the 20th conference of the International Group for the Psychology of
Mathematics Education (Vol. 4, pp. 187–194). Valencia, Spain: University of Valencia.
Reading, C., & Shaughnessy, J. M. (2001). Student perceptions of variation in a sampling situation. In J.
Garfield, D. Ben-Zvi, & C. Reading (Eds.), Background Readings of the Second International
Research Forum on Statistical Reasoning, Thinking, and Literacy (pp. 119–126). Armidale,
Australia: Centre for Cognition Research in Learning and Teaching, University of New England.
Reber, A. S. (1995). The Penguin dictionary of psychology (2nd ed.). London: Penguin.
Resnick, L. B. (1989). Developing mathematical knowledge. American Psychologist, 44, 162–169.
Ross, J. A., & Cousins, J. B. (1993). Patterns of students’ growth in reasoning about correlational
problems. Journal of Educational Psychology, 85, 49–65.
Saldanha, L. A., & Thompson, P. (2001). Students’ reasoning about sampling distributions and statistical
inference. In J. Garfield, D. Ben-Zvi, & C. Reading (Eds.), Background Readings of the Second
International Research Forum on Statistical Reasoning, Thinking, and Literacy (pp. 291–296).
Armidale, Australia: Centre for Cognition Research in Learning and Teaching, University of New
England.
Scholz, R. W. (1991). Psychological research on the probability concept and its acquisition. In R.
Kapadia & M. Borovcnik (Eds.), Chance encounters: Probability in education (pp. 213–254).
Dordrecht, The Netherlands: Kluwer Academic.
Shaughnessy, J. M. (1992). Research in probability and statistics: Reflections and directions. In D. A.
Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 465–494). New
York: Macmillan.
Shaughnessy, J. M., Garfield, J., & Greer, B. (1996). Data handling. In A. J. Bishop, K. Clements, C.
Keitel, J. Kilpatrick, & C. Laborde (Eds.), International handbook of mathematics education (Part 1,
pp. 205–238). Dordrecht, The Netherlands: Kluwer Academic.
Shaughnessy, J. M., Watson, J. M., Moritz, J. B., & Reading, C. (1999, April). School mathematics
students’ acknowledgment of statistical variation. Paper presented at the 77th annual conference of
the National Council of Teachers of Mathematics, San Francisco, CA.
Simon, M. A. (1995). Restructuring mathematics pedagogy from a constructivist perspective. Journal for
Research in Mathematics Education, 26, 114–145.
Strauss, S., & Bichler, E. (1988). The development of children’s concept of the arithmetic average.
Journal for Research in Mathematics Education, 19, 64–80.
Streefland, L. (1991). Fractions in Realistic Mathematics Education—A paradigm of developmental
research. Dordrecht, The Netherlands: Kluwer Academic.
Tarr, J. E., & Jones, G. A. (1997). A framework for assessing middle school students’ thinking in
conditional probability and independence. Mathematics Education Research Journal, 9, 39–59.
Torok, R., & Watson, J. (2000). Development of the concept of statistical variation: An exploratory
study. Mathematics Education Research Journal, 12, 147–169.
Wares, A. (2001). Middle school students’ construction of mathematical models. Unpublished doctoral
dissertation, Illinois State University, Normal.
Watson, J. M., Collis, K. F., Callingham, R. A., & Moritz, J. B. (1995). A model for assessing higher
order thinking in statistics. Educational Research and Evaluation, 1, 247–275.
Watson, J. M., Collis, K. F., & Campbell, K. J. (1995). Developmental structure in the understanding of
common and decimal fractions. Focus on Learning Problems in Mathematics, 17(1), 1–24.
Watson, J. M., Collis, K. F., & Moritz, J. B. (1997). The development of chance measurement.
Mathematics Education Research Journal, 9, 60–82.
Watson, J. M., & Moritz, J. B. (1998). Longitudinal development of chance measurement. Mathematics
Education Research Journal, 10, 103–127.
Watson, J. M., & Moritz, J. B. (1999). The beginning of statistical inference: Comparing two data sets.
Educational Studies in Mathematics, 37, 145–168.
Watson, J. M., & Moritz, J. B. (2000a). The longitudinal development of understanding of average.
Mathematical Thinking and Learning, 2, 11–50.
Watson, J. M., & Moritz, J. B. (2000b). Developing concepts of sampling. Journal for Research in
Mathematics Education, 31, 44–70.
Wavering, J. (1989). Logical reasoning necessary to make line graphs. Journal of Research in Science
Teaching, 26, 373–379.
Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical
Review, 67, 223–265.
PART II
STUDIES OF STATISTICAL REASONING
Chapter 6
REASONING ABOUT DATA ANALYSIS
Dani Ben-Zvi
University of Haifa, Israel
OVERVIEW
The purpose of this chapter is to describe and analyze the ways in which middle
school students begin to reason about data and come to understand exploratory data
analysis (EDA). The process of developing reasoning about data while learning
skills, procedures, and concepts is described. In addition, the students are observed
as they begin to adopt and exercise some of the habits and points of view that are
associated with statistical thinking. The first case study focuses on the development
of a global view of data and data representations. The second case study
concentrates on the design of a meaningful EDA learning environment that promotes
statistical reasoning about data analysis. In light of the analysis, a description of
what it may mean to learn to reason about data analysis is proposed and educational
and curricular implications are drawn.
THE NATURE OF EXPLORATORY DATA ANALYSIS
Exploratory data analysis (EDA), developed by Tukey (1977), is the discipline
of organizing, describing, representing, and analyzing data, with a heavy reliance on
visual displays and, in many cases, technology. The goal of EDA is to make sense of
data, analogous to an explorer of unknown lands (Cobb & Moore, 1997). The
original ideas of EDA have since been expanded by Mosteller and Tukey (1977) and
Velleman and Hoaglin (1981), and EDA has become the accepted way of approaching
the analysis of data (Biehler, 1990; Moore, 1990, 1992).
According to Graham (1987) and Kader and Perry (1994), data analysis is
viewed as a four-stage process: (a) pose a question and formulate a hypothesis, (b)
collect data, (c) analyze data, and (d) interpret the results and communicate
conclusions. In reality, however, statisticians do not proceed linearly in this process,
but rather iteratively, moving forward and backward, considering and selecting
possible paths (Konold & Higgins, 2003). Thus, EDA is more complex than the
four-stage process: “data analysis is like a give-and-take conversation between the
D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 121–145.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
hunches researchers have about some phenomenon and what the data have to say
about those hunches. What researchers find in the data changes their initial
understanding, which changes how they look at the data, which changes their
understanding” (Konold & Higgins, 2003, p. 197).
EDA employs a variety of techniques, mostly graphical in nature, to maximize
insight into a data set. Exploring a data set includes examining shape, center, and
spread; and investigating various graphs to see if they reveal clusters of data points,
gaps, or outliers. In this way, an attempt is made to uncover underlying structure and
patterns, test underlying assumptions, and develop parsimonious models. Many
EDA graphical techniques are quite simple, such as stem-and-leaf plots and box
plots. Computers support EDA by making it possible to quickly manipulate and
display data in numerous ways, using statistical software packages such as Data
Desk (Velleman, 2003), Fathom (Finzer, 2003), and Tabletop (TERC, 2002).
However, the focus of EDA is not on a set of techniques but on making sense of
data: how we dissect a data set, what we look for, how we look, and how we
interpret what we see. EDA replaces the classical “statistical inference” assumptions
about what kind of model the data follow with the more direct approach of “let the
numbers speak for themselves” (Moore, 2000, p. 1), that is, allowing the data
themselves to reveal their underlying structure and model.
This complete and complex picture of data analysis should be reflected in the
teaching of EDA and in the research on students’ statistical reasoning. Simplistic
views can lead to the use of recipe approaches to data analysis instruction and to
research that does not go beyond the surface understanding of statistical techniques.
EDA in School Curriculum
EDA provides a pedagogical opportunity for open-ended data exploration by
students, aided by educational technology. Allowing students to explore data is
aligned with current educational paradigms, such as teaching and learning for
understanding (Perkins & Unger, 1999), inquiry-based learning (Yerushalmy,
Chazan, & Gordon, 1990), and project-based learning (Evensen & Hmelo, 2000).
However, the complexity of EDA raises numerous instructional challenges, for
example, how to teach methods in a new and changing field, how to compensate for
the lack of teachers’ prior experience with statistics, and how to put together an
effective K–12 curriculum in statistics that incorporates EDA.
Elements of EDA have been integrated into the school mathematics curriculum
in several countries, such as Australia (Australian Education Council, 1991, 1994),
England (Department for Education and Employment, 1999), New Zealand
(Ministry of Education, 1992), and the United States (National Council of Teachers
of Mathematics, 1989, 2000). In recently developed curricula—for example, Chance
and Data (Lovitt & Lowe, 1993), The Connected Mathematics Project (Lappan,
Fey, Fitzgerald, Friel, & Phillips, 1996), Data: Kids, Cats, and Ads (Rubin &
Mokros, 1998), Data Handling (Greer, Yamin-Ali, Boyd, Boyle, & Fitzpatrick,
1995), Data Visualization (de Lange & Verhage, 1992), Exploring Statistics
(Bereska, Bolster, Bolster, & Scheaffer, 1998, 1999), The Quantitative Literacy
Series (e.g., Barbella, Kepner, & Scheaffer, 1994), and Used Numbers (e.g., Friel,
Mokros, & Russell, 1992)—there is growing emphasis on developing students’
statistical reasoning about data analysis; on graphical approaches; on students
gathering their own data and intelligently carrying out investigations; on the use of
educational software, simulations, and the Internet; on a cross-curricular approach; and
on the exploration of misuses and distortions as points of departure for study.
Research on Reasoning about Data Analysis
Research on reasoning about data analysis is beginning to emerge as a unique
area of inquiry. In a teaching experiment conducted with lower secondary school
students by Biehler and Steinbring (1991), data analysis was introduced as “detective”
work. Teachers gradually provided students with a data “tool kit” consisting of
tasks, concepts, and graphical representations. The researchers concluded that all
students succeeded in acquiring the beginning tools of EDA, and that both the
teaching and the learning became more difficult as the process became more open.
There appears to be a tension between directive and nondirective teaching methods
in this study. A study by de Lange, Burrill, and Romberg (1993) reveals the crucial
need for professional development of teachers of EDA, given the difficulties
teachers may face in changing their teaching strategy from expository authority to
guide. It is also a challenge for curriculum developers to consider these
pedagogical issues when creating innovative EDA materials. Recent experimental
studies in teaching EDA around key concepts (distribution, covariation) in middle
school classes have been conducted by Cobb (cf., 1999) with an emphasis on
sociocultural perspectives of teaching and learning.
Ben-Zvi and Friedlander (1997b) described some of the characteristic reasoning
processes observed in students’ handling of data representations in four patterns: (a)
uncritical thinking, in which the technological power and statistical methods are
used randomly or uncritically rather than “targeted”; (b) meaningful use of a
representation, in which students use an appropriate graphical representation or
measure in order to answer their research questions and interpret their findings; (c)
meaningful handling of multiple representations, in which students are involved in
an ongoing search for meaning and interpretation to achieve sensible results as well
as in monitoring their processes; and (d) creative thinking, in which students decide
that an uncommon representation or method would best express their thoughts, and
they manage to produce an innovative graphical representation, or self-invented
measure, or method of analysis.
THE CURRENT STUDY
Theoretical Perspectives
Research on mathematical cognition in recent decades has converged on
some important findings about learning, understanding, and becoming competent in
mathematics. Stated in general terms, research indicates that becoming competent in
a complex subject matter domain, such as mathematics or statistics, “may be as
much a matter of acquiring the habits and dispositions of interpretation and sense
making as of acquiring any particular set of skills, strategies, or knowledge”
(Resnick, 1988, p. 58). This involves both cognitive development and “socialization
processes” into the culture and values of “doing mathematics” (enculturation).
Many researchers have been working on the design of teaching in order to “bring the
practice of knowing mathematics in school closer to what it means to know
mathematics within the discipline” (Lampert, 1990, p. 29). This chapter is intended
as a contribution to the understanding of these processes in the area of EDA.
Enculturation Processes in Statistics Education
A core idea used in this study is that of enculturation. Recent learning theories in
mathematics education (cf., Schoenfeld, 1992; Resnick, 1988) include the process of
enculturation. Briefly stated, this process refers to entering a community or a
practice and picking up its points of view. The beginning student learns to
participate in a certain cognitive and cultural practice, where the teacher has the
important role of a mentor and mediator, or the enculturator. This is especially the
case with regard to statistical thinking, with its own values and belief systems and its
habits of questioning, representing, concluding, and communicating. Thus, for
statistical enculturation to occur, specific thinking tools are to be developed
alongside collaborative and communicative processes taking place in the classroom.
Statistical Thinking
Bringing the practice of knowing statistics at school closer to what it means to
know statistics within the discipline requires a description of the latter. Based on
in-depth interviews with practicing statisticians and statistics students, Wild and
Pfannkuch (1999, and Chapter 2) provide a comprehensive description of the
processes involved in statistical thinking, from problem formulation to conclusions.
They suggest that a statistician operates (sometimes simultaneously) along four
dimensions: investigative cycles, types of thinking, interrogative cycles, and
dispositions.
Based on these perspectives, the following research questions were used to
structure the case studies and the analysis of data collected:
•  How do junior high school students begin to reason about data and make
   sense of the EDA perspective in the context of open-ended problem-solving
   situations, supported by computerized tools?
•  How do aspects of the learning environment promote students’ statistical
   reasoning about data analysis?
METHOD
This study employs a qualitative analysis method, to examine seventh-grade
students’ statistical reasoning about data in the context of two classroom
investigations. Descriptions of the setting, curriculum, and technology are followed
by a profile of the students, and then by methods of data collection and analysis.
The Setting
The study took place in three seventh-grade classes (13-year-old girls and boys)
in a progressive experimental school in Tel-Aviv. The classes were taught by skillful
and experienced teachers, who were aware of the spirit and goals of the curriculum
(described briefly later). They were part of the CompuMath curriculum development
and research team, which included several mathematics and statistics educators and
researchers from the Weizmann Institute of Science, Israel. The CompuMath Project
is a large and comprehensive mathematics curriculum for grades 7–9 (Hershkowitz,
Dreyfus, Ben-Zvi, Friedlander, Hadas, Resnick, Tabach, & Schwarz, 2002), which
is characterized by the teaching and learning of mathematics using open-ended
problem situations to be investigated by peer collaboration and classroom
discussions using computerized environments.
The Statistics Curriculum (SC)—the data component of the CompuMath
Project—was developed to introduce junior high school students (grade 7, age 13) to
statistical reasoning and the “art and culture” of EDA (described in more detail in
Ben-Zvi & Friedlander, 1997b). The design of the curriculum was based on the
creation of small scenarios in which students can experience some of the processes
involved in the experts’ practice of data-based enquiry. The SC was implemented in
schools and in teacher courses, and subsequently revised in several curriculum
development cycles.
The SC was designed on the basis of the theoretical perspectives on learning and
the expert view of statistical thinking just described. It stresses: (a) students’ active
participation in organization, description, interpretation, representation, and analysis
of data situations (on topics close to the students’ world such as sport records,
lengths of people’s names in different countries, labor conflicts, car brands), with a
considerable use of visual displays as analytical tools (in the spirit of Garfield, 1995,
and Shaughnessy, Garfield, & Greer, 1996); and (b) incorporation of technological
tools for simple use of various data representations and transformations of them (as
described in Biehler, 1993, 1997; Ben-Zvi, 2000). The scope of the curriculum is 30
periods spread over 2½ months, and it includes a student book (Ben-Zvi &
Friedlander, 1997a) and a teacher guide (Ben-Zvi & Ozruso, 2001).
Technology
During the experimental implementation of the curriculum a spreadsheet
package (Excel) was used. Although Excel is not the ideal tool for data analysis
(Ben-Zvi, 2000), the main reasons for choosing this software were:
•  Spreadsheets provide direct access that allows students to view and explore
   data in different forms, investigate different models that may fit the data,
   manipulate a line to fit a scatter plot, and so on.
•  Spreadsheets are flexible and dynamic, allowing students to experiment with
   and alter representations of data. For instance, they may change, delete, or
   add data entries in a table and consider the graphical effect of the change, or
   manipulate data points directly on the graph and observe the effects on a line
   of fit. Spreadsheets are also adaptable, providing control over the content and
   style of the output.
•  Spreadsheets are common, familiar, and recognized as a fundamental part of
   computer literacy (Hunt, 1995). They are used in many areas of everyday
   life, as well as in other domains of mathematics curricula, and are available
   in many school computer labs. Hence, learning statistics with a spreadsheet
   helps to reinforce the idea that statistics is connected to the real world.
Participants
This study focuses mainly on two students—A and D (in the first case), and on A
and D and four of their peers (in the second case). A and D were above-average
ability students, very verbal, experienced in working collaboratively in computer-assisted
environments, and willing to share their thoughts, attitudes, doubts, and
difficulties. They agreed to participate in this study, which took place within their
regular classroom periods and included being videotaped and interviewed (after
class) as well as furnishing their notebooks for analysis.
When they started to learn this curriculum, A and D had limited in-school
statistical experience. However, they had some informal ideas and positive
dispositions toward statistics, mostly through exposure to statistics jargon in the
media. In primary school, they had learned only about the mean and the uses of
some diagrams. Prior to, and in parallel with, the learning of the SC they studied
beginning algebra based on the use of spreadsheets to generalize numerical linear
patterns (Resnick & Tabach, 1999).
The students appeared to engage seriously with the curriculum, trying to
understand and reach agreement on each task. They were quite independent in their
work, and called the teacher only when technical or conceptual issues impeded their
progress. The fact that they were videotaped did not intimidate them. On the
contrary, they were pleased to speak out loud; address the camera explaining their
actions, intentions, and misunderstandings; and share what they believed were their
successes.
Data Collection
To study the effects of the new curriculum, student behavior was analyzed using
video recordings, classroom observations, interviews, and the assessment of
students’ notebooks and research projects. The two students—A and D—were
videotaped at almost all stages (20 hours of tapes), and their notebooks were also
collected.
Analysis
The analysis of the videotapes was based on interpretive microanalysis (see, for
example, Meira, 1991, pp. 62–63): a qualitative detailed analysis of the protocols,
taking into account verbal, gestural and symbolic actions within the situations in
which they occurred. The goal of such an analysis is to infer and trace the
development of cognitive structures and the sociocultural processes of
understanding and learning.
Two stages were used to validate the analysis, one within the CompuMath
researchers’ team and one with four researchers from the Weizmann Institute of
Science, who had no involvement with the data or the SC (triangulation in the sense
of Schoenfeld, 1994). In both stages the researchers presented, discussed, and
advanced or rejected hypotheses, interpretations, and inferences about the
students’ cognitive structures. Advancing or rejecting an interpretation required: (a)
providing as many pieces of evidence as possible (including past and/or future
episodes, and all sources of data as described earlier); and (b) attempts to produce
equally strong alternative interpretations based on the available evidence. In most
cases the two analyses were in full agreement, and points of doubt or rejection were
refuted or resolved by iterative analysis of the data.
Case Study 1: Constructing Global Views of Data
The first case study concentrates on the growth and change of the students’
conceptions as they entered and learned the culture of EDA and started to develop
their reasoning about data and data representations. This study focused on the shift
between local observations and global observations. In EDA, local understanding of
data involves focusing on an individual value (or a few of them) within a group of
data (a particular entry in a table of data, a single point in a graph). Global
understanding refers to the ability to search for, recognize, describe, and explain
general patterns in a set of data (change over time, trends) by naked-eye observation
of distributions and/or by means of statistical parameters or techniques. Looking
globally at a graph as a way to discern patterns and generalities is fundamental to
statistics, and it includes the production of explanations, comparisons, and
predictions based on the variability in the data. By attending to where a collection of
values is centered, how those values are distributed or how they change over time,
statistics deals with features not inherent to individual elements but to the aggregate
that they comprise.
Learning to look globally at data can be a complex process. Studies in
mathematics education show that students with a sound local understanding of
certain mathematical concepts struggle to develop global views (cf., Monk, 1988;
Bakker, Chapter 7). Konold, Pollatsek, and Well (1997) observed that high school
students—after a yearlong statistics course—still had a tendency to focus on
properties of individual cases rather than on propensities of data sets.
The interplay between local and global views of data is reflected in the tools
statistics experts use. Among such tools, which support data-based arguments,
explanations, and (possibly) forecasts, are time plots, which highlight data features
such as trends and outliers, center, rates of change, fluctuations, cycles, and gaps
(Moore & McCabe, 1993). For the purpose of reflection (or even dishonest
manipulation), trends can be highlighted or obscured by changing the scales. For
example, in Cartesian-like graphs the vertical axis can be “stretched” so that
segments connecting consecutive points appear steep, giving the visual impression
that the rate of change is large.
Experts propose standards in order to avoid such visual distortions (cf., Cleveland,
1994, pp. 66–67).
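The effect of “stretching” an axis can be made numeric: what a reader perceives as steepness depends on a segment's rise and run measured as fractions of the displayed axis ranges, not on the data alone. The sketch below assumes, for simplicity, a square plotting area, so apparent slope is just the ratio of those two fractions; the numbers are invented.

```python
def visual_slope(dy, dx, y_range, x_range):
    """Apparent steepness of a segment in a square plot area:
    rise and run taken as fractions of the displayed axis ranges."""
    return (dy / y_range) / (dx / x_range)

# The same change in the data (0.2 s over 4 years) looks five times
# steeper when the y-axis is compressed from a 10 s span to a 2 s span.
wide = visual_slope(0.2, 4, y_range=10.0, x_range=100)    # y-axis spans 10 s
narrow = visual_slope(0.2, 4, y_range=2.0, x_range=100)   # y-axis spans 2 s
print(wide, narrow)  # narrow is 5 times wide
```

This is the arithmetic behind the standards Cleveland and others propose: fixing how axis ranges relate to the data constrains the perceived rate of change.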
The Task
In the first activity of the SC, the Men’s 100 Meters Olympic Race, students
were asked to examine real data about the winning times in the men’s 100 meters
during the modern Olympic Games. Working in pairs, assisted by the spreadsheet,
they were expected to analyze the data in order to find trends and interesting
phenomena. This covariation problem involved tables and graphical
representations (time plots), formulating verbal statements, and translating
among those representations. In the second part of this activity, a problem is
presented to students in the following way:
Two sports journalists argue about the record times in the 100 meters. One of them
claims that there seems to be no limit to human ability to improve the record. The
other argues that sometime there will be a record, which will never be broken. To
support their positions, both journalists use graphs.
One task of this investigation asks students to design a representation, using a
computer, to support different statements, such as: (a) The times recorded in the
Olympic 100 meters improved considerably; and (b) Throughout the years, the
changes in the Olympic times for the 100 meters were insignificant.
REASONING ABOUT DATA ANALYSIS
Analysis: Toward an Expert Reasoning
Students started their introduction to EDA by learning to make sense of general
questions normally asked in data exploration. They often offered irrelevant answers,
revealed an implicit sense of discomfort with these answers, asked for help, and
used the teacher’s feedback to try other answers. They worked on EDA tasks with
partial understanding of the overall goal. By confronting the same issues with
different sets of data and in different investigational contexts, they overcame some
of their difficulties. The teacher’s role included reinforcing the legitimacy of an
observation as being of the right “kind” despite not being fully correct, or simply
refocusing attention on the question. These initial steps in an unknown field are
regarded as an aspect of the enculturation process (e.g., Schoenfeld, 1992; Resnick,
1988).
At the beginning stage, students also struggled with how to read and make sense
of local (pointwise) information in tables and in graphs. This stage involved learning
to see each row in a table (Table 1) with all its details as one whole case out of the
many shown, and focusing their attention on the entries that were important for the
curricular goal of this activity: the record time, and the year it occurred. This view of
each single row, with its two most relevant pieces of information, was reinforced
afterward when students displayed the data in a time plot (Figure 1), since the graph
(as opposed to the table) displays just these two variables. Also, this understanding
of pointwise information served later on as the basis for developing a global view, as
an answer to “how do records change over time?”
Table 1. Part of the table of the men’s 100 meters winning times in the 23 Olympiads from
1896 to 1996
Year   City        Athlete's name     Country        Time (sec.)
1896   Athens      Thomas Burke       USA            12.0
1900   Paris       Francis Jarvis     USA            10.8
1904   St. Louis   Archie Hahn        USA            11.0
1908   London      Reginald Walker    South Africa   10.8
1912   Stockholm   Ralph Craig        USA            10.8
1920   Antwerp     Charles Paddock    USA            10.8
1924   Paris       Harold Abrahams    UK             10.6
DANI BEN-ZVI
Figure 1. Time plot showing winning times for men’s 100 meters.
Instead of looking at the graph as a way to discern patterns in the data, the
students' responses focused first on the nature and language of the graph as a
representation (how it displays discrete data) rather than on its use as a tool to
display a generality, a trend.
When invited to use the line connecting the dots in the dot plot (Figure 1) as an
artifact to support a global view, they rejected it because it lacked any meaning in
light of the pointwise view they had just learned, and with which they felt
comfortable.
When A and D were asked to describe what they learned from the 100 meters
table (Table 1), they observed that “There isn’t anything constant here.” After the
teacher reinforced the legitimacy of their observation, they explained more clearly
what they meant by constancy in the following dialogue (the dialogues are translated
from Hebrew, therefore they may not sound as authentic as in the original):
D: Let's answer the first question: "What do you learn from this table?"
A: There are no constant differences between …
D: We learn from this table that there are no constant differences between the
   record times of … [looking for words]
A: The results of …
D: The record times of the runners in …
A: There are no constant differences between the runners in the different
   Olympiads …
The students’ attention focused on differences between adjacent pairs of data
entries, and they noticed that these differences are not constant. These comparisons
presumably stemmed from their previous knowledge and experiences with a
spreadsheet in algebra toward finding a formula. In other words, one of the factors
that moved them forward toward observation of patterns was their application of
previous knowledge. Thus, the general pattern the students observed and were able
to express was that the differences were not constant. Perhaps they implicitly began
to sense that data in this new area of EDA, as opposed to algebra, are disorganized
and cannot be captured by a single deterministic formula.
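A and D's comparison can be written out directly. The sketch below (an illustration, not part of the curriculum materials) computes the adjacent differences in the Table 1 winning times and checks the linearity criterion the students were implicitly applying:

```python
# The comparison A and D made, computed directly from the Table 1 times.
times = [12.0, 10.8, 11.0, 10.8, 10.8, 10.8, 10.6]  # sec, Olympiads 1896-1924

# Differences between adjacent Olympiads:
diffs = [round(b - a, 1) for a, b in zip(times, times[1:])]
print(diffs)  # [-1.2, 0.2, -0.2, 0.0, 0.0, -0.2]

# A linear formula y = a*x + b would require all differences to be equal;
# here they are not, so no single deterministic formula fits the data.
constant = len(set(diffs)) == 1
print(constant)  # False
```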
After the two students had analyzed the 100 meters data for a while, they worked
on the next question: to formulate a preliminary hypothesis regarding the trends in
the data. They seemed to be embarrassed by their ignorance—not knowing what
trends mean, and asked for the teacher’s help.
A:   What are trends? What does it mean?
T:   What is a trend? A trend is … What's the meaning of the word trend?
A:   Ah … Yes, among other things, and what is the meaning in the question.
T:   O.K. Let's see: We are supposed to look at what?
D:   At the table.
T:   At the table. More specifically—at what?
A:   At the records.
T:   At the records. O.K. And now, we are asked about what we see: Does it
     decrease all the time?
A&D: No.
T:   No. Does it increase all the time?
A&D: No.
T:   No. So, what does it do after all?
D:   It changes.
T:   It changes. Correct.
A:   It generally changes from Olympiad to Olympiad. Generally, not always.
T:   Sometimes it doesn't change at all. Very nice! Still, it usually changes. And,
     is there an overall direction?
D:   No!
T:   No overall direction?
A:   There is no overall declining direction, namely, improvement of records.
     But, sometimes there is deterioration …
T:   Hold on. The overall direction is? Trend and direction are the same.
A&D: Increase, Increase!
T:   The general trend is …
D:   Improvement in records.
T:   What is "improvement in records"?
A:   Decline in running times.
T:   Yes. Decline in running times. O.K. … But …
A:   Sometimes there are bumps, sort of steps …
T:   … But, this means that although we have deviations from the overall
     direction here and there, still the overall direction is this … Fine, write it
     down.
The students were unfamiliar with the term trends, and they were vague about
the question’s purpose and formulation. In response, the teacher gradually tried to
nudge the students’ reasoning toward global views of the data. Once they
understood the intention of the question, the students—who viewed the irregularity
as the most salient phenomenon in the data—were somehow bound by the saliency
of local values: They remained attached to local retrogressions, which they could not
overlook in favor of a general sense of direction/trend.
The teacher, who did not provide a direct answer, tried to help them in many
ways. First, she devolved the question (in the sense of Brousseau, 1997, pp. 33–35
and 229–235), and when this did not work, she rephrased the question in order to
refocus it: “We are supposed to look at what?” and “more specifically at what?” She
then hints via direct questions: “Does it increase all the time?” and “So, what does it
do after all?” In addition, she appropriated (in the sense of Moschkovich, 1989) the
students’ answers to push the conversation forward by using their words and
answers, for example: “It changes. Correct”; “increase”; “decrease.” At other times
she subtly transformed their language, such as changing bumps to deviations; or by
providing alternative language to rephrase the original question to: “Is there an
overall direction?”
After the interaction just presented, A and D wrote in their notebooks the
following hypothesis: “The overall direction is increase in the records, yet there
were occasionally lower (slower) results, than the ones achieved in previous
Olympiads.” At this stage, it seems that they understood (at least partially) the
meaning of trend, but still stressed (less prominently than before) those local
features that did not fit the pattern.
In the second part of the activity, the students were asked to delete an “outlying”
point (the record of 12 sec. in the first Olympiad, 1896) from the graph (Figure 1)
and describe the effect on its shape. The purpose of the curriculum was to lead
students to learn how to transform the graph in order to highlight trends. It was
found that focusing on an exceptional point, and on the effect of its deletion, directed
students' attention to a general view of the graph. This finding seems consistent with
Ainley (1995), who also describes how an outlier supported students’ construction
of global meanings for graphs.
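The effect of the deletion is visible in the numbers themselves. A sketch (an illustration using the Table 1 values, not from the curriculum) shows how removing the 1896 record collapses the vertical extent of the plot:

```python
# Effect of deleting the outlying 1896 record (12.0 sec) from the Table 1 data.
times = [12.0, 10.8, 11.0, 10.8, 10.8, 10.8, 10.6]  # sec, 1896-1924

spread_all = max(times) - min(times)  # dominated by the 1896 outlier
without_1896 = times[1:]
spread_rest = max(without_1896) - min(without_1896)

# The range drops from 1.4 sec to 0.4 sec, so the remaining variation
# (and hence the overall trend) fills much more of the redrawn graph.
print(round(spread_all, 1), round(spread_rest, 1))  # 1.4 0.4
```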
The following transcript describes the students’ comments on the effect of
changing the vertical scales of the original 100 meters graph from 0–12 (Figure 2) to
0–40 (Figure 3) as requested in the second part of the activity.
A: Now, the change is that the whole graph stayed the same in shape, but it went
   down.
D: The same in shape, but much, much lower, because the column [the y-axis]
   went up higher. Did you understand that? [D uses both hands to signal the
   down and up movements of the graph and the y-axis respectively.]
A: Because now the 12, which is the worst record, is lower. It used to be once the
   highest. Therefore, the graph started from very high. But now, it [the graph] is
   already very low.
[Two time plots of Olympic Time (sec.) against Year (1880–2000) appear here.]
Figure 2. The original 100 meters graph (y-axis 0–12).
Figure 3. The 100 meters graph after the change of the y-scales (y-axis 0–40).
The change of scales also focused the students’ attention on the graph as a
whole. They talked about the change in the overall relative position of the graph,
whereas they perceived the shape itself as “the same.” Their description included
global features of the graph (“The whole graph … went down”), attempts to make
sense of the change via the y-axis (“Because the column went up higher”), and
references to an individual salient point (“Because now the 12, which is the worst
record, is lower”). Student A wrote the following synthesis in his notebook: “The
graph remained the same in its shape, but moved downward, because before, 12—
the worst record—was the highest number on the y-axis, but now it is lower.”
However, the purpose of the rescaling was to enable the students to visualize the
graph as a whole in a different sense. In order to take sides in the journalists’ debate,
the transformation was aimed at visually supporting the position that there are no
significant changes in the records. Although the students’ focus was global, for them
the perceptually salient effect of the rescaling was on relative “location” of the
whole graph rather than on its trend.
When A and D were asked to design a graph to support the (opposite) statement:
“Over the years, the times recorded in the Olympic 100 meters improved
considerably,” they did not understand the task and requested the teacher’s help:
T: [Referring to the 0–40 graph displayed on the computer screen—see Figure 3.]
   How did you flatten the graph?
A: [Visibly surprised.] How did we flatten it?
T: Yes, you certainly notice that you have flattened it, don't you?
D: No. The graph was like that before. It was only higher up [on the screen].
The teacher and the students seemed to be at cross purposes. The teacher
assumed that the students had made sense of the task in the way she expected, and
that they understood the global visual effect of the scaling on the graph’s shape.
When she asked, “How did you flatten the graph?” she was reacting to what she
thought was their difficulty: how to perform a scale change in order to support the
claim. Thus, her hint consisted of reminding them of what they had already done
(scale change). However, the students neither understood her jargon (“flatten the
graph”) nor regarded what they had done as changing the graph’s shape (“The graph
was like that before”). Although this intervention is an interesting case of
miscommunication, it apparently had a catalytic effect, as reflected in the dialogue
that took place immediately afterward—after the teacher realized what might have
been their problem.
T: How would you show that there were very very big improvements?
A: [Referring to the 0–40 graph; see Figure 3.] We need to decrease it [the
   maximum value of the y-axis]. The opposite of … [what we have previously
   done].
D: No. To increase it [to raise the highest graph point, i.e., 12 sec.].
A: The graph will go further down.
D: No. It will go further up.
A: No. It will go further down.
D: What you mean by increasing it, I mean—decreasing.
A: Ahhh … Well, to decrease it … O.K., That's what I meant. Good, I understand.
D: As a matter of fact, we make the graph shape look different, although it is
   actually the same graph. It will look as if it supports a specific claim.
When the teacher rephrased her comment (“How would you show that there
were very very big improvements?”) the students started to make sense of her
remarks, although they were still attached to the up-down movement of the whole
graph. Student D began to discern that a change of scale might change the
perceptual impressions one may get from the graph. The teacher’s first intervention
(“How did you flatten the graph?”), although intended to help the students make
sense of the task, can be considered unfortunate. She did not grasp the nature of their
question, misjudged their position, and tried to help by reminding them of their
previous actions on scale changing. The students seemed comfortable with scale
changing, but their problem was that they viewed this tool as doing something
different from what the curriculum intended.
The miscommunication itself, and the teacher’s attempt to extricate herself from
it, contributed to their progress. At first, A and D were surprised by her description
of what they had done as flattening the graph. Then, they “appropriated” the
teacher’s point of view (in the sense of Moschkovich, 1989) and started directing
their attention to the shape of the graph rather than to its relative position on the
screen. They started to focus on scaling and rescaling in order to achieve the “most
convincing” design. Briefly stated, they transferred and elaborated, in iterative steps,
ideas of changing scales from one axis to the other until they finally arrived at a
satisfying graph (Figure 4) with no further intervention from the teacher. (See Ben-Zvi, 1999, for a detailed description of this rescaling process.) Students A and D
flexibly and interchangeably relied on pointwise observations and global
considerations (both in the table and in the graph) in order to fix the optimal
intervals on the axes so that the figure would look as they wished.
[A time plot of Olympic Time (sec.) against Year (1896–1996), with the y-axis
running from 9.8 to 12.0, appears here.]
Figure 4. Graph designed to support the statement that the 100 meters times improved
considerably.
In summary, at the beginning of this episode the students interpreted the effect
of changing scales as a movement of the graph downward rather than as an effect on
its shape. Following the teacher’s intervention, they started to consider how scaling
of both axes affects the shape of the graph. Moreover, they were able to develop
manipulations for these changes to occur in order to achieve the desired shape. In
the process, they began to move between local and global views of the data in two
representations.
It is interesting to notice the students’ persistent invocation of “differences”
between values (“This way we actually achieved a result that appears as if there are
enormous differences”). However, their focus here is on the way these differences
are “blown up” by the scaling effect, rather than on them not being constant, as was
the case earlier when differences were invoked. Their prior knowledge appears to
have been adapted to a new use and a new purpose. The
differences, which were used to drive the way the students made sense of patterns in
the data, were being successfully used here as a powerful tool to evaluate their
success in designing a graph to visually support a certain claim about a trend in the
data.
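The "flattening" and "blowing up" the students negotiated can be quantified as the share of the y-axis that the data occupy. A sketch (an illustration, not from the chapter), taking the y-axis ranges as 0–40 for Figure 3 and 9.8–12.0 for Figure 4:

```python
# How much of the y-axis the Table 1 winning times occupy under each scaling.

def occupied_fraction(data_min, data_max, axis_min, axis_max):
    """Share of the y-axis height taken up by the data."""
    return (data_max - data_min) / (axis_max - axis_min)

lo, hi = 10.6, 12.0  # extremes of the winning times in Table 1

flattened = occupied_fraction(lo, hi, 0.0, 40.0)  # Figure 3: y-axis 0-40
blown_up = occupied_fraction(lo, hi, 9.8, 12.0)   # Figure 4: y-axis 9.8-12.0

print(round(flattened, 3))  # 0.035 -> the record changes look negligible
print(round(blown_up, 3))   # 0.636 -> the same changes look considerable
```

The same differences between records are thus either compressed into a few percent of the plot or spread across most of it, which is exactly the rhetorical lever the two journalists' graphs rely on.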
Case Study 2: Students Taking a Stand
The second case study focused on the role of the SC learning environment in
supporting students’ reasoning about data analysis. The students in this study were
observed as they engaged in taking a stand in a debate on the basis of data analysis.
The purpose of the analysis was to advance the understanding of (a) how students
learn in such an environment, and (b) how we can become more aware of student
reasoning, in order to design “better” tasks. Better tasks are situations in which
students engage seriously, work and reflect, and advance their statistical reasoning
about data.
One SC activity was the Work dispute in a printing company. In this activity, the
workers are in dispute with the management, which has agreed to increase the
total salary amount by 10 percent. How this amount of money is to be divided
among the employees is a problem—and thereby hangs the dispute. The students
were given the salary list of the 100 employees, along with an instruction booklet to
guide them in their work. They also received information about the national average
and minimum salaries, Internet sites to look for data on salaries, and newspaper
articles about work disputes and strikes. In the first part of the activity, students
were required to take sides in the dispute and to clarify their arguments. Then, using
the computer, they described the distribution of salaries and used statistical
measures (e.g. median, mean, mode, and range) to support their position in the
dispute. The students learned the effects of grouping data and the different uses of
statistical measures in arguing their case. In the third part, the students suggested
alterations to the salary structure without exceeding the 10 percent limit. They
produced their proposal to solve the dispute, and designed representations to support
their position and refute opposing arguments. Finally the class met for a general
debate and voted for the winning proposal. The time spent on the full activity was
about seven class periods, or a total of six hours.
The task context was familiar to students and provided interesting, realistic,
and meaningful data. The data were altered so that they were more manageable and
provided points of departure for addressing some key statistical concepts. For
example, the various central tendency measures were different, allowing students to
choose a representative measure to argue their case. It was arranged that the mean
salary (5000 IS) was above the real national averages (4350 IS—all employees,
4500 IS—printers only).
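The chapter reports only summary figures (a mean of 5000 IS against national averages of 4350 and 4500 IS), not the salary list itself. The sketch below therefore uses a hypothetical skewed list, constructed only to share the activity's mean, to show how different measures of center can support different sides:

```python
# Hypothetical salary list (NOT the actual data from the activity), constructed
# only to have the activity's mean of 5000 IS with a skew toward low salaries.
from statistics import mean, median

salaries = [2500, 2500, 3000, 3000, 3500, 4000, 4500, 5000, 10000, 12000]

# Management can point to the mean (above the 4350 IS national average),
# while the workers can point to the median (well below it).
assert mean(salaries) == 5000
assert median(salaries) == 3750
print("mean:", mean(salaries), "median:", median(salaries))
```

With a salary distribution of this shape, choosing the mean or the median as the "representative" salary already amounts to taking a stand in the dispute.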
Students were expected to clarify their thoughts, learn to listen to each other, and
try to make sense of each other’s ideas. But, most importantly students were asked
to take sides in the conflict situation. Their actions (e.g. handling data, choosing
statistics, creating displays, and arguing) were all motivated, guided, and targeted by
the stand they chose. However, their actions sometimes caused them to change their
original stand.
The following transcript from a video recording of one of the experimental
classes illustrates the use of concepts, arguments, and statistical reasoning that the
task promoted. It is based on a group of students who chose to take the side of the
workers. After clarifying their arguments, they described the distribution of the
current salaries, guided by their position in the dispute. The student pairs prepared
various suggested alterations to the salary structure to favor workers (as opposed to
management), and then held a series of meetings with fellow student pairs (about 10
students in all), in which they discussed proposals, designed graphical
representations to support their position, and prepared themselves for the general
debate. This transcript is taken from the second “workers’ meeting.” It includes the
students A and D from the previous case study along with four other students
(referred to as S, N, M, and H).
D:        OK, we have this pie [chart] and we plan to use it [See Figure 5]. Everybody
          agrees?
Students: Yes, yes.
D:        Let's see what should we say here? Actually we see that … 60 percent of …
A:        60 percent of the workers are under the average wage [4500 IS]. Now, by
          adding 12 percent – there are far fewer [workers under the national average].
S:        OK, but I have a proposal, that brings almost everybody above the average
          wage. If we add 1000 shekel to the 49 workers, who are under the average …
N:        It's impossible. Can't you understand that?
S:        This [my proposal] will leave us with 1000 shekel, that can be divided among
          the other workers, who are over [the average].
A:        Then each of them will get exactly five shekel! …
M:        But we don't have any chance to win this way.
D:        What is the matter with you? We'll have a revolt in our own ranks. Do you
          want that to happen at the final debate?
S:        Anyway, this is my opinion! If there are no better proposals …
D:        Of course there are: a rise of 12 percent on each salary [excluding the
          managers] …
H:        OK. Show me by how much will your proposal reduce the 60 percent.
N:        I am printing now an amazing proposal—everybody will be above the
          [national] average: No worker will be under the average wage! This needs a
          considerable cut in the managers' salaries …
[A pie chart of the current salaries (IS), as grouped by the workers, appears here:
1000–4500: 66%; 4500–8000: 21%; 8000–11500: 7%; 11500–15000: 3%;
15000–18500: 3%.]
Figure 5. The "workers'" description of the current salary distribution.
In this exchange, three different proposals for the alteration of the salary
structure were presented. The first, offered by A and D, suggested an increase of 12
percent for all workers but the managers’ salaries remained unchanged. The second
proposal, originated by S, suggested an equal (1000 IS) increase for each of the 49
workers earning less than the national average (4350 IS), the small remainder to be
divided among the other workers. Again the managers’ salaries remained
unchanged. The third proposal, presented by N, suggested a considerable cut in
managers’ salaries, and an increase for all workers under the national average, to
bring them above the average.
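The arithmetic behind the proposals follows from figures given in the chapter: 100 employees, a mean salary of 5000 IS, and a 10 percent increase in the total salary amount. A sketch of the budget constraint, checked against S's own figures:

```python
# Budget constraint for the work-dispute proposals (figures from the chapter).
employees = 100
mean_salary = 5000                       # IS, as arranged in the activity
total_payroll = employees * mean_salary  # 500,000 IS
raise_pool = total_payroll // 10         # the agreed 10 percent: 50,000 IS

# S's proposal: 1000 IS to each of the 49 workers below the national average.
s_cost = 49 * 1000
remainder = raise_pool - s_cost

assert raise_pool == 50_000
assert remainder == 1_000  # matches S: "will leave us with 1000 shekel"
```

Every alteration to the salary structure, including N's cut in the managers' salaries, has to keep the net increase within this 50,000 IS pool, which is the constraint the students were negotiating.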
Central to students’ actions and motives is the stand to be taken by the workers.
For example, Figure 5 is grouped to emphasize the large proportion of salaries
below the printers’ national average. Moreover, the workers’ explanations for
choosing representative measures and graphical displays emerged from their stand in
the dispute. Taking a stand also made students check their methods, arguments, and
conclusions with extreme care. They felt it natural to face criticism and
counterarguments made by peers and teacher, and to answer them.
These observations suggest that students’ reasoning about data as well as their
interactions with data were strongly affected by the design of the problem situation,
which includes taking a stand. The students were able to:
• Deal with a complex situation and the relevant statistical concepts (averages,
  percentages, charts, etc.).
• Select among measures of center, in relation to looking at graphs, which is
  an important component of EDA reasoning.
• Use critical arguments to confront conflicting alternatives.
• Use statistical procedures and concepts with a purpose and within a context,
  to solve problems, relying heavily on visual representations and the computer.
• Demonstrate involvement, interest, enthusiasm, and motivation in their
  learning.
• Create their own products (proposals and their representations).
DISCUSSION
The two case studies focused on students’ reasoning about data analysis as they
started to develop views (and tools to support them) that are consistent with the use
of EDA. Sociocultural and cognitive perspectives will now be considered in a
detailed analysis of the case studies. The sociocultural perspective focuses on
learning (of a complex domain, such as EDA) as the adoption of the viewpoint of a
community of experts, in addition to learning skills and procedures. Thus, this study
looked at learning as an enculturation process with two central components: students
engaged in doing, investigating, discussing, and drawing conclusions; and teachers
engaged in providing role models, as representatives of the culture their students
are entering, through timely interventions. The cognitive perspective
focuses on the development and change in students’ conceptions and the evolution
of their reasoning. Learning is perceived as a series of interrelated actions by the
learner to transform information to knowledge—such as collecting, organizing, and
processing information—to link it to previous knowledge and provide
interpretations (Davis, Maher, & Noddings, 1990).
It is not easy to tease out the two perspectives for this analysis. Conceptions and
reasoning evolve within a purposeful context in a social setting. On the other hand,
developing an expert point of view, and interacting with peers or with a teacher,
implies undergoing mental actions within specific tasks related to complex ideas.
These actions over time are a central part of the meaningful experience within which
the culture of the field is learned and the reasoning is developed. These perspectives
contribute to the analysis of the data, which revealed the following factors in the
process of developing students’ reasoning about data in the EDA environment.
The Role of Previous Knowledge
One of the strongest visible pieces of knowledge A and D applied and repeatedly
referred to was the difference between single pairs of data, which came from their
practices in the algebra curriculum. This background knowledge played several
roles. On the one hand, it gave these students the differences lens, which
conditioned most of what they were able to conclude for quite a while. On the other
hand, looking at differences helped them to refocus their attention from “pure”
pointwise observations toward more global conclusions (that the differences are not
constant). Also, looking at differences helped the students, in implicit and subtle
ways, to start getting accustomed to a new domain in which data do not behave in
the deterministic way that the students were used to in algebra, in which regularities
are captured in a single exact formula.
A and D’s focus on the differences served more than one function in their
learning. It was invoked and applied not only when they were asked to look for
patterns in the data but also in a very fruitful way when they spontaneously
evaluated the results of rescaling the graph. There, they used the differences in order
to judge the extent to which the re-scaled graph matched their goal of designing a
graph to support a certain claim about trends.
Thus A and D’s previous knowledge not only conditioned what they saw—
sometimes limiting them—but also, on other occasions, empowered them.
Moreover, their previous knowledge served new emerging purposes, as it evolved in
the light of new contextual experiences. In conclusion, this analysis illustrates the
multifaceted and sometimes unexpected roles prior knowledge may play, sometimes
hindering progress and at other times advancing knowledge in interesting ways.
Moving from a Local-Pointwise View toward a Flexible Combination of Local and
Global Views
In the first case study, A and D persistently emphasized local points and adjacent
differences. Their views were related to their “history” (i.e., previous background
knowledge about regularities with linear relationships in algebra). The absence of a
precise regularity in a set of statistical data (understanding variability) was their first
difficulty. When they started to adopt the notion of trend (instead of the regular
algebraic pattern expected), they were still attentive to the prominence of “local
deviations.” These deviations kept them from dealing more freely with global views
of data. Later on, it was precisely the focus on certain pointwise observations (for
example, the place and deletion of one outlying point) that helped them to direct
their attention to the shape of the (remaining) graph as a whole. During the scaling
process, A and D looked at the graph as a whole; but rather than focusing on the
trends, they discussed its relative locations under different scales. Finally, when they
used the scaling and had to relate to the purpose of the question (support of claims in
the journalists’ debate), they seemed to begin to make better sense of trends.
It is interesting to note that the local pointwise view of data sometimes restrained
the students from seeing globally, but on other occasions it served as a basis upon
which the students started to see globally. In addition, in a certain context, even
looking globally indicated different meanings for the students than for an expert
(i.e., noting the position of the graph rather than noticing a trend).
Appropriation: A Learning Process That Promotes Understanding
The data show that most of the learning took place through dialogues between
the students themselves and in conversations with the teacher. Of special interest
were the teacher’s interventions, at the students’ request (additional examples of
such interventions are described in Ben-Zvi & Arcavi, 2001). These interventions,
though short and not necessarily directive, had catalytic effects. They can be
characterized in general as “negotiations of meanings” (in the sense of Yackel &
Cobb, 1996). More specifically, they are interesting instances of appropriation as a
nonsymmetrical, two-way process (in the sense of Moschkovich, 1989). This
process takes place, in the zone of proximal development (Vygotsky, 1978, p. 86),
when individuals (experts and novices, or teacher and students) engage in a joint
activity, each with their own understanding of the task. Students take actions that are
shaped by their understanding; the teacher “appropriates” those actions—into her
own framework—and provides feedback in the form of her understandings, views of
relevance, and pedagogical agenda. Through the teacher’s feedback, the students
start to review their actions and create new understandings for what they do.
In this study, the teacher appropriated students’ utterances with several
objectives: to legitimize their directions, to redirect their attention, to encourage
certain initiatives, and implicitly to discourage others (by not referring to certain
remarks). The students appropriated from the teacher a reinterpretation of the
meaning of what they do. For example, they appropriate from her answers to their
inquiries (e.g., what trend or interesting phenomena may mean), from her
unexpected reactions to their request for explanation (e.g., “How did you flatten the
graph?”), and from inferring purpose from the teacher’s answers to their questions
(e.g., “We are supposed to look at what?”).
Appropriation by the teacher (to support learning) or by the students (to change
the sense they make of what they do) seems to be a central mechanism of
enculturation. As shown in this study, this mechanism is especially salient when
students learn the dispositions that accompany using the subject matter (data
analysis) rather than its skills and procedures.
Curriculum Design to Support Reasoning about Data
The example described in the second case study illustrates how curriculum
design can take into account new trends in subject matter (EDA)—its needs, values,
and tools—as well as student reasoning. Staging and encouraging students to take
sides pushed them toward levels of reasoning and discussion rarely observed in the
traditional statistics classroom. They were involved in selecting
appropriate statistical measures, rather than just calculating them, and in choosing
and designing graphs to best display their views. They showed themselves able to
understand and judge the complexities of the situation—engaged in preparing a
proposal that in their view was acceptable, rational, and just—and were able to
defend it.
Furthermore, students realized that data representations could serve rhetorical
functions, similar to their function in the work of statisticians, who select data,
procedures, tools, and representations that support their perspective. Thus, the
development of students’ reasoning about data is extended beyond the learning of
statistical methods and concepts, to involve students in “doing” statistics in a
realistic context.
IMPLICATIONS
The learning processes described in this chapter took place in a carefully
designed environment. It is recommended that similar environments be created to
help students develop their reasoning about data analysis. The essential features of
such learning environments include:
• A curriculum built on the basis of EDA as a sequence of semi-structured (yet open) leading questions within the context of extended meaningful problem situations (Ben-Zvi & Arcavi, 1998)
• Timely and nondirective interventions by the teacher as representative of the discipline in the classroom (cf. Voigt, 1995)
• Computerized tools that enable students to handle complex actions (change of representations, scaling, deletions, restructuring of tables, etc.) without having to engage in too much technical work, leaving time and energy for conceptual discussions
In learning environments of this kind, students develop their reasoning about
data by meeting and working with, from the very beginning, ideas and dispositions
related to the culture of EDA. This includes making hypotheses, formulating
questions, handling samples and collecting data, summarizing data, recognizing
trends, identifying variability, and handling data representations. Skills, procedures
and strategies (e.g., reading graphs and tables, rescaling) are learned as integrated in
the context and in the service of the main ideas of EDA.
142
DANI BEN-ZVI
It can be expected that beginning students will have difficulties of the type
described when confronting the problem situations posed by the EDA curriculum.
However, what A and D experienced is an integral and inevitable component of their
meaningful learning process with long-lasting effects (cf. Ben-Zvi, 2002). These
results suggest that students should work in environments such as the one just
described, which allows for:
• Students’ prior knowledge to be engaged in interesting and surprising ways, possibly hindering progress in some instances but forming the basis for the construction of new knowledge in others
• Many questions to be raised, some of which will make little sense to the students or will be reinterpreted and answered in ways other than intended
• Students’ work to be based on partial understandings, which will grow and evolve
This study confirmed that even if students do not make more than partial sense
of the material with which they engage, appropriate teacher guidance, in-class
discussions, peer work and interactions, and more importantly, ongoing cycles of
experiences with realistic problem situations, will slowly support the building of
meanings and the development of statistical reasoning.
Multiple challenges exist in the assessment of outcomes of students’ work in
such a complex learning environment: the existence of multiple goals for students,
the interplay between the contextual (real-world) and the statistical, the role of the
computer-assisted environment, and the group versus the individual work (Gal &
Garfield, 1997). It is recommended that extended performance tasks be used to
assess students’ reasoning about data, instead of traditional tests that focus on
definitions and computation. Performance tasks should be similar to those given to
students during the learning activities (e.g., open-ended questions, “complete” data
investigations), allowing students to work in groups and use technological tools.
In EDA learning environments of the kind described in these case studies,
teachers cease to be the dispensers of a daily dose of prescribed curriculum and must
respond to a wide range of unpredictable events. They can play a significant role in
their interactions with students by encouraging them to employ critical reasoning
strategies and use data representations to search for patterns and convey ideas;
expanding and enriching the scope of their proposed work; and providing reflective
feedback on their performance. Thus our challenge is to assist statistics educators in
their important role as mentors and mediators, that is, as enculturators.
Given that EDA is a challenging topic in statistics education and is part of the
mathematics curriculum in many schools today, it is important that teaching efforts
be guided not only by systematic research on understanding the core ideas in data
analysis but also by how reasoning about data analysis develops. Without this
research and the implementation of results, statistics classes will continue to teach
graphing and data-collection skills that do not lead to the ability to reason about data
analysis.
Many research questions need to be addressed, including those pertaining to the
development of students’ understanding and reasoning (with the assistance of
technological tools), the student-teacher and student-student interactions within
open-ended data investigation tasks, the role of enculturation processes in learning,
and the impact of learning environments similar to those described here. The
refinement of these ideas, and the accumulation of examples and studies, will
contribute to the construction of an EDA learning and instruction theory.
REFERENCES
Ainley, J. (1995). Re-viewing graphing: Traditional and intuitive approaches. For the Learning of
Mathematics, 15(2), 10–16.
Australian Education Council (1991). A national statement on mathematics for Australian schools.
Carlton, Vic.: Author.
Australian Education Council (1994). Mathematics—A curriculum profile for Australian schools. Carlton,
Vic.: Curriculum Corporation.
Barbella, P., Kepner, J., & Scheaffer, R. L. (1994). Exploring measurement. Palo Alto, CA: Seymour
Publications.
Ben-Zvi, D. (1999). Constructing an understanding of data graphs. In O. Zaslavsky (Ed.), Proceedings of
the Twenty-Third Annual Conference of the International Group for the Psychology of Mathematics
Education (Vol. 2, pp. 97–104). Haifa, Israel: Technion–Israel Institute of Technology.
Ben-Zvi, D. (2000). Toward understanding of the role of technological tools in statistical learning.
Mathematical Thinking and Learning, 2(1&2), 127–155.
Ben-Zvi, D. (2002). Seventh grade students’ sense making of data and data representations. In B. Phillips
(Ed.), Proceedings of the Sixth International Conference on Teaching of Statistics (on CD-ROM).
Voorburg, The Netherlands: International Statistical Institute.
Ben-Zvi, D., & Arcavi, A. (2001). Junior high school students’ construction of global views of data and
data representations. Educational Studies in Mathematics, 45, 35–65.
Ben-Zvi, D., & Arcavi, A. (1998). Toward a characterization and understanding of students’ learning in
an interactive statistics environment. In L. Pereira-Mendoza (Ed.), Proceedings of the Fifth
International Conference on Teaching Statistics (Vol. 2, 647–653). Voorburg, The Netherlands:
International Statistical Institute.
Ben-Zvi, D., & Friedlander, A. (1997a). Statistical investigations with spreadsheets—Student’s workbook
(in Hebrew). Rehovot, Israel: Weizmann Institute of Science.
Ben-Zvi, D., & Friedlander, A. (1997b). Statistical thinking in a technological environment. In J. B.
Garfield & G. Burrill (Eds.), Research on the Role of Technology in Teaching and Learning Statistics
(pp. 45–55). Voorburg, The Netherlands: International Statistical Institute.
Ben-Zvi, D., & Ozruso, G. (2001). Statistical investigations with spreadsheets—Teacher’s guide (in
Hebrew). Rehovot, Israel: Weizmann Institute of Science.
Bereska, C., Bolster, C. H., Bolster, L. C., & Scheaffer, R. (1998). Exploring statistics in the elementary
grades, Book 1 (Grades K–6). New York: Seymour Publications.
Bereska, C., Bolster, C. H., Bolster, L. C., & Scheaffer, R. (1999). Exploring statistics in the elementary
grades, Book 2 (Grades 4–8). New York: Seymour Publications.
Biehler, R. (1990). Changing conceptions of statistics: A problem area for teacher education. In A.
Hawkins (Ed.), Proceedings of the International Statistical Institute Round Table Conference (pp. 20–
38). Voorburg, The Netherlands: International Statistical Institute.
Biehler, R. (1993). Software tools and mathematics education: The case of statistics. In C. Keitel & K.
Ruthven (Eds.), Learning from computers: Mathematics education and technology (pp. 68–100).
Berlin: Springer-Verlag.
Biehler, R. (1997). Software for learning and for doing statistics. International Statistical Review, 65(2),
167–189.
Biehler, R., & Steinbring, H. (1991). Explorations in statistics, stem-and-leaf, box plots: Concepts,
justifications, and experience in a teaching experiment (elaborated English version). Bielefeld,
Germany: Author.
Brousseau, G. (1997). Theory of didactical situations in mathematics (Edited and translated by N.
Balacheff, M. Cooper, R. Sutherland, & V. Warfield). Dordrecht, The Netherlands: Kluwer.
Cleveland, W. S. (1994). The elements of graphing data. Murray Hill, NJ: AT&T Bell Laboratories.
Cobb, P. (1999). Individual and collective mathematical learning: The case of statistical data analysis.
Mathematical Thinking and Learning, 1, 5–44.
Cobb, G. W., & Moore, D. S. (1997). Mathematics, statistics, and teaching. The American Mathematical
Monthly, 104(9), 801–823.
Davis, R. B., Maher, C. A., & Noddings, N. (Eds.) (1990). Constructivist views on the teaching and
learning of mathematics. Reston, VA: NCTM.
de Lange, J., Burrill, G., & Romberg, T. (1993). Learning and teaching mathematics in context—the
case: Data visualization. Madison, WI: National Center for Research in Mathematical Sciences
Education.
de Lange, J., & Verhage, H. (1992). Data visualization. Scotts Valley, CA: Wings for Learning.
Department for Education and Employment (1999). Mathematics: The national curriculum for England.
London: Author and Qualifications and Curriculum Authority.
Evensen, D. H, & Hmelo, C. E. (Eds.) (2000). Problem-based learning: A research perspective on
learning interactions. Mahwah, NJ: Erlbaum.
Finzer, B. (2003). Fathom: Dynamic Statistics Software for Deeper Understanding (Version 1.16).
Emeryville, CA: Key Curriculum Press. <http://www.keypress.com/fathom/>
Friel, S. N., Mokros, J. R., & Russell, S. (1992). Used numbers: Middles, means, and in-betweens. Palo
Alto, CA: Dale Seymour.
Gal, I., & Garfield, J. B. (1997). Curricular goals and assessment challenges in statistics education. In I.
Gal & J. B. Garfield (eds.), The assessment challenge in statistics education (pp. 1–13). Amsterdam,
Netherlands: IOS Press.
Garfield, J. (1995). How students learn statistics. International Statistical Review, 63(1), 25–34.
Graham, A. (1987). Statistical investigations in the secondary school. Cambridge, UK: Cambridge
University Press.
Greer, B., Yamin-Ali, M., Boyd, C., Boyle, V., & Fitzpatrick, M. (1995). Data handling (six student
books in the Oxford Mathematics series). Oxford, UK: Oxford University Press.
Hershkowitz, R., Dreyfus, T., Schwarz, B., Ben-Zvi, D., Friedlander, A., Hadas, N., Resnick, T., &
Tabach, M. (2002). Mathematics curriculum development for computerized environments: A
designer-researcher-teacher-learner activity. In L. D. English (Ed.), Handbook of international
research in mathematics education (pp. 657–694). London: Erlbaum.
Hunt, D. N. (1995). Teaching Statistical Concepts Using Spreadsheets. In the Proceedings of the 1995
Conference of the Association of Statistics Lecturers in Universities. UK: The Teaching Statistics
Trust.
Kader, G. D., & Perry, M. (1994). Learning statistics. Mathematics Teaching in the Middle School 1(2),
130–136.
Konold, C., & Higgins, T. L. (2003). Reasoning about data. In J. Kilpatrick, G. Martin, & D. Schifter
(Eds.), A research companion to principles and standards for school mathematics (pp. 193–215).
Reston, VA: NCTM.
Konold, C., Pollatsek, A., & Well, A. (1997). Students analyzing data: Research of critical barriers. In J.
Garfield & G. Burrill (Eds.), Research on the role of technology in teaching and learning statistics
(pp. 151–168). Voorburg, The Netherlands: International Statistical Institute.
Lampert, M. (1990). When the problem is not the question and the solution is not the answer:
Mathematical knowing and teaching. American Educational Research Journal, 27, 29–63.
Lappan, G., Fey, J. T., Fitzgerald, W. M., Friel, S. N., & Phillips, E. D. (1996). Connected mathematics
project. Palo Alto, CA: Seymour Publications.
Lovitt, C., & Lowe, I. (1993). Chance and data investigations (Vol. 1 & 2). Carlton, Vic., Australia:
Curriculum Corporation.
Meira, L. R. (1991). Explorations of mathematical sense-making: An activity-oriented view of children’s
use and design of material displays (unpublished doctoral dissertation). Berkeley: University of
California.
Ministry of Education (1992). Mathematics in the New Zealand curriculum. Wellington, NZ: Author.
Monk, G. S. (1988). Students’ understanding of functions in calculus courses. Humanistic Mathematics
Network Journal, 9, 21–27.
Moore, D. S. (1990). Uncertainty. In L. A. Steen (Ed.), On the shoulders of giants: New approaches to
numeracy (pp. 95–137). Washington, DC: National Academy Press.
Moore, D. S. (1992). Teaching statistics as a respectable subject. In F. Gordon & S. Gordon (Eds.), Statistics for
the 21st century (pp. 14–25). Washington, DC: Mathematical Association of America.
Moore, D. S. (1997). New pedagogy and new content: The case of statistics. International Statistical
Review, 65(2), 123–165.
Moore, D. S. (2000). The basic practice of statistics (second edition). New York: Freeman.
Moore, D. S., & McCabe, G. P. (1993). Introduction to the practice of statistics (2nd ed.). New York:
Freeman.
Moschkovich, J. D. (1989). Constructing a problem space through appropriation: A case study of guided
computer exploration of linear functions (an unpublished manuscript, available from the author).
Mosteller, F., & Tukey, J. W. (1977). Data analysis and regression. Reading, MA: Addison-Wesley.
National Council of Teachers of Mathematics (1989). Curriculum and evaluation standards for school
mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics (2000). Principles and standards for school mathematics.
Reston, VA: Author.
Perkins, D., & Unger, C. (1999). Teaching and learning for understanding. In C. M. Reigeluth (Ed.),
Instructional-design theories and models (pp. 91–114). Hillsdale, NJ: Erlbaum.
Resnick, L. (1988). Treating mathematics as an ill-structured discipline. In R. Charles & E. Silver (Eds.),
The teaching and assessing of mathematical problem solving (pp. 32–60). Reston, VA: National
Council of Teachers of Mathematics.
Resnick, T., & Tabach, M. (1999). Touring the Land of Oz—Algebra with computers for grade seven (in
Hebrew). Rehovot, Israel: Weizmann Institute of Science.
Rubin, A., & Mokros, J. (1998). Data: Kids, cats, and ads (Investigations in number, data, and space
Series). New York: Seymour Publications.
Schoenfeld, A. H. (1992). Learning to think mathematically: Problem solving, metacognition, and sense
making in mathematics. In D. Grouws (Ed.), Handbook of research on mathematics teaching and
learning (pp. 334–370). New York: Macmillan.
Schoenfeld, A. H. (1994). Some notes on the enterprise (research in collegiate mathematics education,
that is). Conference Board of the Mathematical Sciences Issues in Mathematics Education, 4, 1–19.
Shaughnessy, J. M., Garfield, J., & Greer, B. (1996). Data handling. In A. J. Bishop, K. Clements, C.
Keitel, J. Kilpatrick, & C. Laborde (eds.), International handbook of mathematics education (Vol. I,
pp. 205–237). Dordrecht, The Netherlands: Kluwer.
TERC (2002). Tabletop. Geneva, IL: Sunburst Technology.
<http://www.terc.edu/TEMPLATE/products/item.cfm?ProductID=39>
Tukey, J. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Velleman, P. (2003). Data Desk (Version 6.2). Ithaca, NY: Data Description Inc.
<http://www.datadesk.com/products/data_analysis/datadesk/>
Velleman, P., & Hoaglin, D. (1981). The ABC’s of EDA: Applications, basics, and computing of
exploratory data analysis. Boston, MA: Duxbury.
Voigt, J. (1995). Thematic patterns of interaction and sociomathematical norms. In P. Cobb & H.
Bauersfeld (Eds.), Emergence of mathematical meaning: Interaction in classroom cultures (pp. 163–
201). Hillsdale, NJ: Erlbaum.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. M. Cole, V.
John-Steiner, S. Scribner, & E. Souberman (Eds.). Cambridge, MA: Harvard University Press.
Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical
Review, 67(3), 223–265.
Yackel, E., & Cobb, P. (1996). Sociomathematical norms, argumentation, and autonomy in mathematics.
Journal for Research in Mathematics Education, 27(4), 458–477.
Yerushalmy, M., Chazan, D., & Gordon, M. (1990). Mathematical problem posing: Implications for
facilitating student inquiry in classrooms. Instructional Science, 19, 219–245.
Chapter 7
LEARNING TO REASON ABOUT
DISTRIBUTION
Arthur Bakker and Koeno P. E. Gravemeijer
Freudenthal Institute, Utrecht University, the Netherlands
OVERVIEW
The purpose of this chapter is to explore how informal reasoning about distribution
can be developed in a technological learning environment. The development of
reasoning about distribution in seventh-grade classes is described in three stages as
students reason about different representations. It is shown how specially designed
software tools, student-created graphs, and prediction tasks supported the learning
of different aspects of distribution. In this process, several students came to reason
about the shape of a distribution using the term bump along with statistical notions
such as outliers and sample size.
This type of research, referred to as “design research,” was inspired by that of
Cobb, Gravemeijer, McClain, and colleagues (see Chapter 16). After exploratory
interviews and a small field test, we conducted teaching experiments of 12 to 15
lessons in 4 seventh-grade classes in the Netherlands. The design research cycles
consisted of three main phases: design of instructional materials, classroom-based
teaching experiments, and retrospective analyses. For the retrospective analysis of
the data, we used a constant comparative method similar to the methods of Glaser
and Strauss (Strauss & Corbin, 1998) and Cobb and Whitenack (1996) to
continually generate and test conjectures about students’ learning processes.
DATA SET AS AN AGGREGATE
An essential characteristic of statistical data analysis is that it is mainly about
describing and predicting aggregate features of data sets. Students, however, tend to
conceive a data set as a collection of individual values instead of an aggregate that
has certain properties (Hancock, Kaput, & Goldsmith, 1992; Konold & Higgins,
2002; Ben-Zvi & Arcavi, 2001; Ben-Zvi, Chapter 6). An underlying problem is that
middle-grade students generally do not see “five feet” as a value of the variable
147
D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 147–168.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
148
ARTHUR BAKKER AND KOENO P. E. GRAVEMEIJER
“height,” but as a personal characteristic of, say, Katie. To move beyond this view,
students should learn to disconnect the measurement value from the object or person
measured and consider data against a background of possible measurement values.
They should furthermore develop a notion of distribution, since that is an organizing
conceptual structure with which they can conceive the aggregate instead of just the
individual values (Cobb, 1999; Petrosino, Lehrer, & Schauble, 2003).
These learning goals formed the motivation to explore the possibilities for
students in early secondary education with little or no prior statistical knowledge to
develop an informal understanding of distribution. Such understanding could then be
the basis for more formal statistics in higher grades. The main question in this study
is therefore: How can seventh-grade students learn to reason about distribution in an
informal way?
DISTRIBUTION
To answer this question, we first analyze the relation between data and
distribution. Distinguishing between data as individual values and distribution as a
conceptual entity, we examine aspects of both data sets and distributions such as
center, spread, density, and skewness (Table 1). Measures of center include mean,
median, and midrange. Spread can be quantified with, for instance, range, standard
deviation, and interquartile range. The aspects and measures in the table should not
be seen as excluding each other; outliers and extreme values, for instance, influence
skewness, density, spread, and even most measures of center.
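As an illustration (not part of the original chapter), the measures of center and spread named above can be computed from individual values, which is exactly the "upward" reading of Table 1. The following Python sketch uses an invented set of ten values, not data from the study; the quartile method is one of several conventions:

```python
import statistics

# Invented data set of ten battery life spans (hours), for illustration only.
data = [83, 97, 101, 105, 107, 108, 111, 119, 122, 127]

# Measures of center: mean, median, midrange
mean = statistics.mean(data)
median = statistics.median(data)
midrange = (min(data) + max(data)) / 2

# Measures of spread: range, standard deviation, interquartile range
data_range = max(data) - min(data)
stdev = statistics.stdev(data)  # sample standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1
```

Note that `statistics.quantiles` follows one particular quartile convention (its default "exclusive" method); textbooks and software packages differ on this detail.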
Table 1. Between data and distribution

distribution (conceptual entity)
  center:   mean, median, midrange, …
  spread:   range, standard deviation, interquartile range, …
  density:  (relative) frequency, majority, quartiles
  skewness: position of the majority of the data
data (individual values)
This structure can be read upward and downward. The upward perspective is
typical of novices in statistics: Students tend to see individual values, which they
can use to calculate, for instance, the mean, median, range, or quartiles. This does
not automatically imply that they see mean or median as a measure of center or as
representative of a group (Mokros & Russell, 1995; Konold & Pollatsek, Chapter 8).
In fact, students need a notion of distribution before they can sensibly choose
between such measures of center (Zawojewski & Shaughnessy, 2000). Therefore,
students need to develop the downward perspective as well: conceiving center,
spread, and skewness as characteristics of a distribution, and looking at data with a
notion of distribution as an organizing structure or a conceptual entity. Experts in
statistics can easily combine the upward and downward perspectives. We might say
that the upward perspective leads to a frequency distribution of a data set. In the
downward perspective, we typically use probability distributions such as the normal
distribution to model data.
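The two perspectives can be made concrete in a small sketch (our illustration, not part of the original chapter). With invented height data, the upward perspective computes a frequency distribution from the individual values, while the downward perspective fits a probability distribution, here a normal distribution, as a model of the data:

```python
import statistics

# Invented height data (cm), for illustration only.
heights = [148, 152, 155, 155, 158, 160, 160, 161, 163, 168]

# Upward perspective: a frequency distribution built from individual values.
freq = {}
for h in heights:
    freq[h] = freq.get(h, 0) + 1

# Downward perspective: a normal distribution fitted to the data as a model,
# summarizing the whole data set by its center and spread.
model = statistics.NormalDist.from_samples(heights)
```

The fitted `model` then lets one reason about the aggregate (e.g., its mean and standard deviation) rather than about any single measured person.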
The table shows that the concept of distribution has a complex structure, but this
concept is also part of a larger structure consisting of big ideas such as variation and
sampling (Reading & Shaughnessy, Chapter 9; Watson, Chapter 12). Without
variation, there is no distribution, and without sampling there are usually no data.
We therefore chose to deal informally and coherently with all these big ideas at the
same time with distribution in a central position. As Cobb (1999) notes, focusing on
distribution as a multifaceted end goal of instruction might bring more coherence in
the statistics curriculum. The question is how. Our answer is to focus on the
informal aspects of shape.
The shape of a distribution is influenced by various statistical aspects. A high
peak, for example, is caused by a high frequency of a certain class; long tails on
the left or right, with the hill off center, indicate a skewed distribution. This implies
that by reasoning with informal terms about the shape of a distribution, students may
already reason with aspects of that distribution. And indeed, students in this study
used informal words to describe density (crowded, empty, piled up, clumped, busy),
spread (spread out, close together), and shape (hill, bump). If students compare the
height distributions of two different grades, they might realize that the graphs have
the same shape but are shifted in location (Biehler, 2001). And they might see that
samples of different sizes still have similar shapes. We envisioned that reasoning
with shapes forms the basis for reasoning about distributions.
METHODOLOGY AND SUBJECTS
To answer the main question of how students can develop a notion of
distribution, we carried out developmental research, which is also called design
research (Freudenthal, 1991; Gravemeijer, 1994; Edelson, 2002; Cobb & McClain,
Chapter 16). Design research typically involves the design of instructional materials,
teaching experiments, and retrospective analyses. In line with the principles of
Realistic Mathematics Education (Freudenthal, 1991; Gravemeijer, 1994) and the
National Council of Teachers of Mathematics (NCTM) Standards (2000), we looked
for ways to guide students in being active learners dealing with increasingly
sophisticated means of support.
To assist students in exploring data and developing the concept of distribution,
we decided to use some specially designed Minitools (see Cobb, 1999). These web
applets were developed by reasoning backward from the intended end goal of
reasoning about distribution to possible starting points. One aspect of distribution,
shape, can be inferred from stacked dot plots. To understand what dots in a dot plot
stand for, students need to realize that a dot represents a value on some variable.
One way to help students develop this insight is to let them start with case-value
bars, which range from 0 to the corresponding value on the horizontal axis. We
presume that bars representing values are closer to students’ daily life reality than
dots on an axis, because they are used to bar graphs and because horizontal bars are
natural ways to symbolize certain variables such as the braking distance of cars, the
life span of batteries, or the wingspan of birds. For that reason, each case in Minitool
1 (Figure 1) is signified by a bar whose relative length corresponds to the value of
the case, and each case in Minitool 2 (Figure 2) is signified by a dot in a dot plot.
Figure 1. Minitool 1 (sorted by size and color).
To identify a baseline of what Dutch seventh-grade students already know about
statistics and how easily they would solve statistical problems using the two
Minitools, we interviewed 26 students about these issues. The students had
encountered no statistics before except the arithmetic mean and bar graphs. They
had almost no problems in reading off values from the Minitools, but they focused
on individual data values (Section 2). We then did a small field test and conducted
teaching experiments in 4 seventh-grade classes, which worked through a complete
sequence of 12 to 15 lessons of 50 minutes each. The experiments were carried out
during the school year 1999–2000, in a public school in a small town near Utrecht
(the Netherlands) that prepared about 800 students for university (vwo) or higher
vocational education (havo). At that time about 15% of the Dutch students went to
the vwo level, 20% to the havo level, about 40% to the mavo level (for middle
vocational education), and the remaining 25% to lower vocational education (in the
meantime the last two levels have been merged). These percentages indicate that the
learning abilities of the vwo and havo students of our teaching experiments were
above average.
Figure 2. Minitool 2 (split colors and with vertical value bars).
The collected data include audio recordings, student work, field notes, and final
tests in all classes, as well as videotapes and pretests in the last two experiments (see
Table 2). The pretests were meant to find out if students already knew what we
wanted them to learn (they did not).
An essential part of the data corpus was a set of mini-interviews that were held
during lessons. Mini-interviews varied from about 20 seconds to 4 minutes and were
meant to find out what concepts and graphs meant to the students. We realize that
this influenced their learning, because the mini-interviews often stimulated
reflection. In our view, however, the validity of the research was not in danger: Our
aim was to find out how students could learn to reason with distribution, not whether
teaching the sequence in other seventh-grade classes would lead to the same results.
For the retrospective analysis of the fourth teaching experiment, we read
the transcripts, watched the videotapes, and formulated conjectures on students’
learning based on the transcript and video episodes. The generated conjectures were
then tested against the other episodes and the rest of the collected data (student
work, field observations, and tests) in the next round of analysis (triangulation).
Then the whole generating and testing process was repeated. This method resembles
Glaser and Strauss’s constant comparative method (Strauss & Corbin, 1998; Cobb &
Whitenack, 1996). Important transcript fragments, including those in this chapter,
have been discussed with colleagues (peer examination).
Table 2. Overview of subjects, teaching experiments, data collection, number of lessons, and
levels of education

Subjects (grade 7) | Type of Experiment | Data Collection | No. of Lessons | Level
26 students (1999) | Exploratory interviews (15 minutes for two students) | audio | — | mavo, havo, vwo
Class A (25) | Exploratory field test | student work, final test, field notes, audio | 4 | havo
Class F (27) | First teaching experiment | student work, final test, field notes, audio | 12 | vwo
Class E (28) | Second teaching experiment | student work, final test, field notes, audio | 15 | vwo
Class C (23) (2000) | Third teaching experiment | idem plus pretest and video | 12 | havo
Class B (23) | Fourth teaching experiment | idem plus pretest and video | 12 | havo
12 classes (2000–2002) | Implementation | e-mail reports of two teachers, field notes from incidental visits | 144 | havo and vwo
Furthermore, we identified patterns of student answers that were similar across
all teaching experiments, and categorized the evolving learning trajectory into three
stages according to students’ reasoning with the representations used. The sections
describing stages 1 and 2 describe observations that were similar for all four
observed classes. In the first stage, students worked with graphs in which data were
represented by horizontal bars (Minitool 1, Figure 1). In the second stage, from
lesson 5 to 12, students mainly worked with dot plots (Minitool 2, Figure 2). In the
third stage students used both Minitools and came to reason with bumps; the
examples stem from the second teaching experiment. The students in this class had
good learning abilities (vwo) and had 15 lessons—three more than in the other
classes. The stages began to overlap once we started to stimulate
comparison of different graphs during the last two teaching experiments.
STAGE 1—DATA ARE REPRESENTED BY BARS
The aim of the first activities was to let students reason informally about different
aspects of distributions, such as majority, center, extreme values, spread-out-ness,
and consistency. In the second lesson, for example, students had to
prepare reports to Consumer Reports (a consumers’ journal) on the quality of two
battery brands. They were given a data set of 10 battery life spans of two brands in
Minitool 1; using different computer options, they could sort the data and split
them, for instance by brand. In the beginning they used the vertical value bar
(Figure 3) to read off values, but later sometimes to estimate the mean visually.
Figure 3. Estimating the mean of brand D with the movable vertical value bar (life span in
hours).
During this battery activity, students in all teaching experiments could already
reason about aspects of distributions. “Brand K has outliers, but you have more
chance for a good one,” was one answer. “Brand D is more reliable, since you know
that it will last more than 80 hours,” was another. This notion of reliability formed a
good basis for talking about spread. Our observations resemble those of Cobb
(1999) and Sfard (2000), who analyzed students’ spontaneous use of the notion of
“consistency.”
The activities with Minitool 1 afforded more than informal reasoning about
majority, outliers, chance, and reliability; they also supported the visual estimation
of the mean (Figures 3 and 4). After this strategy had spontaneously emerged in the
exploratory interviews, we incorporated instructional activities to evoke this strategy
in other classes as well (Bakker, 2003). Minitool 1 supported the strategy with the
movable vertical value bar. Students said that they cut off the longer bars, and gave
the bits to the shorter bars. Several students in different classes could explain that
this approach was legitimate: The total stays the same, and the mean is the total
154
ARTHUR BAKKER AND KOENO P. E. GRAVEMEIJER
divided by the number. When students said that brand D is better because its mean is
higher, they used the mean to say how good the brand is. In that case, the mean is
not just a calculation on a collection of data, but refers to a whole subset of one
brand. As we intended, they learned to use the mean as a representative value for a
data set and to reason about the brand instead of the individual data values.
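The students' explanation of why the compensation strategy is legitimate can be checked numerically. The sketch below is our own illustration with made-up life spans, not the chapter's data: cutting the long bars off at the mean and handing the surplus to the short bars balances exactly, because the total stays the same and the mean is the total divided by the number of values.

```python
# Numerical check of the visual compensation strategy (illustrative values,
# not the chapter's data set): the amount "cut off" above the mean equals the
# amount "missing" below it, so levelling the bars at the mean is legitimate.

life_spans = [56, 72, 80, 94, 103]        # hypothetical life spans in hours
mean = sum(life_spans) / len(life_spans)  # total divided by the number: 81.0

surplus = sum(v - mean for v in life_spans if v > mean)  # cut off the long bars
deficit = sum(mean - v for v in life_spans if v < mean)  # give to the short bars

print(mean, surplus, deficit)  # surplus == deficit, so the bars level out
```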
Figure 4. Scribblings on a transparency during class discussions after estimating means of
both brands. The mean of brand D is slightly higher than that of K.
To assess students’ understanding of distribution aspects and to establish a
tighter relationship between informal statistical notions and graphs, we decided to
“reverse” this battery task. In the last two teaching experiments, during the fourth
lesson, we asked students to invent their own data according to certain
characteristics such as “brand A is bad but reliable; brand B is good but unreliable;
brand C has about the same spread as brand A, but it is the worst of all brands.”
Many students produced a graph similar to the one in Figure 5 (in this case, the
variation of C is less than that of A). A sample response was:
Why is brand A better. Because it lives long. And it has little spread. Brand B is good
but unreliable. Because it has much spread. But it lives long. Brand C has little spread
but the life span is not very long.
Figure 5. Invented data set according to certain features: Brand A is bad but reliable; brand B
is good but unreliable; brand C has about the same spread as brand A, but it is the worst of all.
With hindsight, we have come to see this back-and-forth movement between
interpreting graphs and constructing graphs according to statistical notions as an
important heuristic for instructional design in data analysis, for a number of reasons:
•  Students can express ideas with graphs that they cannot express in words
   (Lemke, 2003). If students invent their own data and graphs, teachers and
   researchers can better assess what students actually understand.
•  If students think of characteristics such as "good but not reliable," the lack of
   data prevents them from focusing on individual data, because it is
   cognitively impossible to imagine many individual data points. With this
   reverse activity, we create the need for a conceptual unity that helps in
   imagining a collection of data with a certain property. The notion of
   distribution serves that purpose (Section 3).
•  In many schoolbooks, students mainly interpret ready-made graphs (Friel et
   al., 2001; Moritz, Chapter 10). And if students have to make graphs, the goal
   is too often just to learn how to produce a particular graph. De Lange,
   Burrill, Romberg, & van Reeuwijk (1993) and Meira (1995) strongly
   recommend letting students invent their own graphs. We may assume that
   students' own graphs are meaningful and functional for them.
•  The importance of the back-and-forth movement between data and graphs
   (or different graphs) is also indicated by the research on symbolizing.
   Steinbring (1997), for example, distinguishes reference systems and symbol
   systems. Students interpret a symbol system in the light of a better-known
   reference system. Reference systems are therefore relatively well known and
   symbol systems relatively unknown. In learning the relationship between a
   symbol system and a reference system, students must go back and forth
   between the two systems. A next step can then be that students use the
   symbol system they have just learned to reason with (Minitool 1, for
   example) as a reference system for a new symbol system (Minitool 2, for
   example), and so on.
From the examples of the first stage, it is clear that students informally reasoned
about different aspects of distribution from the very start. They argued about the
mean (how good the battery is), spread (reliability), chance for outliers or extreme
values, and where the majority is (skewness). Without the bar representation the
students would probably not have developed a compensating strategy for finding the
mean. Their reasoning, however, was bound to one representation and two contexts.
STAGE 2—DOTS REPLACE BARS
Our next aim was to let students reason about shapes of distributions in suitable
representations and in different contexts. Additionally, we strove to quantify
informal notions such as frequency and majority, and to prepare students for
using conventional aggregate plots such as histograms and box plots.
As mentioned in the previous section, Minitool 1 can be seen as a reference
system for the new symbol system of Minitool 2. When solving problems with
Minitool 1, the students reasoned with the endpoints of the bars. In Minitool 1,
students could hide the bars, which they sometimes preferred, because “it is better
organized.” The dot plot of Minitool 2 can be obtained by hiding the bars of
Minitool 1 and imaginatively dropping the endpoints on the horizontal axis or on the
other dots that prevent them from dropping further down (cf. Wilkinson, 1999).
Note that the dots are stacked and do not move sideways to fill up white areas in the
graph (Figure 6). The advantages of this dot plot representation are that it is easy to
interpret, it comes closer to conventional representations of distributions than
Minitool 1, and students can organize data in ways that come close to histogram and
box plot, for instance.
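The stacking described above can be sketched in a few lines of code. The function below is our own illustration of the idea, not the Minitools' actual implementation: each value drops onto the axis, or onto the dots with the same value already below it.

```python
from collections import Counter

def stack_dots(values):
    """Return (value, height) pairs for a stacked dot plot: each dot sits on
    the axis (height 1) or on top of the dots with the same value already
    there; dots do not move sideways to fill white areas in the graph."""
    counts = Counter()
    dots = []
    for v in sorted(values):
        counts[v] += 1
        dots.append((v, counts[v]))
    return dots

print(stack_dots([80, 84, 84, 84, 91]))
# [(80, 1), (84, 1), (84, 2), (84, 3), (91, 1)]
```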
Minitool 2 has more options to organize data than Minitool 1. Apart from sorting
by size and by subgroup (color), students can also group data into their own groups,
two equal groups (for the median), four equal groups (for a box plot, Figure 7a),
equal interval width (for a histogram, Figure 7b), and fixed group size (Figure 6b).
This last option turned out to be useful for stimulating reasoning about density.
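These grouping options can be mimicked with a few lines of standard code. The sketch below is a hypothetical re-implementation of ours, not the Minitool source, applied to made-up waist sizes: "two equal groups" yields a median cut, "four equal groups" the quartile cuts of a box plot, and "fixed interval width" histogram-style bins.

```python
# Our own sketch of Minitool 2's grouping options, with made-up waist sizes.

def equal_groups(values, k):
    """Cut points that split the sorted data into k groups of (nearly) equal
    size: k=2 marks the median region, k=4 the quartile cuts of a box plot."""
    data = sorted(values)
    n = len(data)
    return [data[(i * n) // k] for i in range(1, k)]

def fixed_intervals(values, width):
    """Counts per interval of fixed width, as in a histogram."""
    counts = {}
    for v in values:
        lo = (v // width) * width
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))

waists = [30, 31, 32, 32, 33, 34, 34, 35, 36, 38, 40, 44]  # made-up, in inches
print(equal_groups(waists, 2))     # [34]
print(equal_groups(waists, 4))     # [32, 34, 38]
print(fixed_intervals(waists, 5))  # {30: 7, 35: 3, 40: 2}
```

The "fixed group size" option works the other way around: it fixes the number of dots per group and lets the interval widths vary, so narrow groups signal high density.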
A particular statistical problem that students solved with Minitool 2 was the one
on jeans sizes. Students had to report to a factory the percentage of each size that
should be made, based on a data set of the waist measurements (in inches) of 200
men. This activity, typically done during the ninth lesson, was meant to distract
students’ attention away from the mean and toward the whole distribution.
Furthermore, it could be an opportunity to let students reason about absolute and
relative frequencies.
We expected that students would reason about several aspects of distribution
when comparing different grouping options. The option of fixed group size (Figures
6b and 6c) typically evoked remarks such as “with the thin ones [the narrow bins]
you know that there are many dots together.” We interpret such expressions as
informal reasoning about density, which we see as a key aspect of distribution.
Many students used the four equal groups option to support their conclusion that
“you have to make a lot of jeans in sizes 34–36, and less of 44–46.” Generally, a
skeptical question was needed to provoke more exact answers: “If the factory hired
you for $1,000, do you think the factory would be satisfied with your answer?” Most
students ended up with the fixed interval option and a table with percentages, that is,
relative frequencies.
Figure 6. (a) Minitool 2 with jeans data set (waist size in inches, n = 200). (b) Fixed group
size with 20 data points per group. (c) Minitool 2 with "hide data" function.
Figure 7. (a) Four equal group option with and without data. Box plot overlay was added after
these seventh-grade teaching experiments. (b) Fixed interval width option with and without
data. Histogram overlay was added after these seventh-grade teaching experiments.
An instructional idea that emerged during the last teaching experiment was that
of “growing samples.” Discussing and predicting what would happen if we added
more data appeared to lead to reasoning about several aspects of distribution in a
coherent way. For the background to this activity, we have to go back to a problem
from the beginning of the instructional unit:
In a certain hot air balloon basket, eight adults are allowed [in addition to the driver].
Assume you are going to take a ride with a group of seventh-graders. How many
seventh-graders could safely go into that balloon basket if you only consider weight?
This question was meant to let students think about variation of weight,
sampling, and representativeness of the average. A common solution in all classes
was that students estimated an average or a typical weight for both adults and
children. Some used the ratio of those numbers to estimate the number of children
allowed, but most students calculated the total weight allowed and divided that by
the average student weight. The student answers varied from 10 to 16.
This activity formed the basis for a class discussion on the reliability of the
estimated weights, during which we asked for a method of finding more reliable
numbers. A student suggested weighing two boys and two girls. The outcome of the
discussion was that the students decided to collect weight data from the whole class.
(In the second teaching experiment, they also collected height data.)
In the next lesson, we first showed the sample of four weight values in Minitool 2
(Figure 8a) and asked what students expected if we added the rest of the data.
Students thought that the mean would be more precise. Because we did not want to
focus on the mean, we asked about the shape and the range. Some students then
conjectured that the range would be larger, and others thought the graph would grow
higher. After showing the data for the whole class (Figure 8b), we asked what would
happen if we added the data for two more classes (Figure 8c). In this way, extreme
values, spread, and shape became topics of discussion. The graphs that students
made to predict the shape if sample size were doubled tended to be smoother than
the graphs students had seen in Minitool 2 (Figure 8d). In our interpretation,
students started to see a pattern in the data—or in Konold and Pollatsek’s words, a
“signal in the noise” (Chapter 8). We concluded that stimulating reasoning about
distribution by “growing samples” is another useful heuristic for instructional design
in statistics education.
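A "growing samples" discussion can also be mimicked with simulated data. The sketch below is our own illustration; the normal model and its parameters are assumptions, not the classes' data. Because the samples are nested, the minimum can only drop and the maximum can only rise as the sample grows, while the overall shape settles down.

```python
import random

# Simulated seventh-grader weights in kg; the normal model (mean 43, sd 6)
# is our assumption for illustration only.
random.seed(0)
population = [round(random.gauss(43, 6)) for _ in range(600)]

# Nested "growing samples": four students, one class, three classes.
for n in (4, 27, 81):
    sample = population[:n]
    print(n, min(sample), max(sample), max(sample) - min(sample))
```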
A conjecture about students’ evolving notion of distribution that was confirmed
in the retrospective analyses was that students tend to divide unimodal distributions
into three groups of low, “average,” and high values. We saw this conceptual
grouping into three groups for the first time in the second teaching experiment when
we asked what kind of graph students expected when they collected height data.
Daniel did three trials (Figure 9). During his second trial, he said: “You have smaller
ones, taller ones, and about average.” After the third trial he commented: “There are
more around the average.” Especially in the third trial, we clearly see his conceptual
organization into three groups, which is a step away from focusing on individual
data points.
One step further is when students think of small, average, tall, and “in between.”
When in the final test students had to sketch their class when ordered according to
height, Christa drew Figure 10 and wrote: “There are 3 smaller ones, about 10
average, 3 to 4 taller, and of course in between.”
The “average” group, the majority in the middle, seems to be more meaningful
to students than the single value of the mean. Konold and colleagues (2002) call
these ranges in the middle of distributions modal clumps. Our research supports their
view that these modal clumps may be suitable starting points for informal reasoning
about center, spread, and skewness. When growing samples, students might even
learn to see such aspects of distribution as stable features of variable processes.
Figure 8. Growing samples (weight data in kg): (a) Four students; (b) one class; (c) three
classes; (d) a student’s smoother prediction graph of larger sample.
Figure 9. Three prediction trials of height data; the second and third show three groups.
Figure 10. Class ordered by height. Christa’s explanation: “There are 3 smaller ones, about 10
average, 3 to 4 taller, and of course in between.”
STAGE 3—SYMBOLIZING DATA AS A “BUMP”
Though students in the first two teaching experiments started to reason with
majorities and modal clumps in the second stage, they did not explicitly reason with
shape. We had hoped that they would reason with “hills,” as was the case in the
teaching experiment of Cobb, Gravemeijer, and McClain (Cobb, 1999), but they did
not. A possible reason is that their teaching experiment lasted 34 lessons, whereas
ours lasted only 12 or 15 lessons. In the second teaching experiment, we decided to
try something else. In line with the reasons to let students invent their own data
(Section 5), we asked students to invent their own graphs of their own data. As a
follow-up of the balloon activity mentioned earlier, the students had to make a graph
for the balloon rider, which she could use in deciding how many students she could
safely take on board.
The students of the second teaching experiment drew various graphs. The
teacher focused the discussion on two graphs, namely, Michiel’s and Elleke’s
(Figure 11).
Figure 11. Michiel’s graph (left) and Elleke’s graph.
The shorter bars represent students’ weights; the lightest bars signify girls’
weights. Though all students used the same data set, Michiel’s graph on a
transparency does not exactly match the values in Elleke’s graph on paper. Michiel’s
graph is more like a rough sketch.
Michiel’s graph is especially interesting, since it offered the opportunity to talk
about shape. Michiel explained how he got the dots as follows. (Please note that a
translation of ungrammatical spoken Dutch into written English does not sound very
authentic.)
Michiel: Look, you have roughly, averagely speaking, how many students had that
weight and there I have put a dot. And then I have left [y-axis] the number
of students. There is one student who weighs about 35 [kg], and there is
one who weighs 36, and two who weigh 38 roughly.
And so on: the dot at 48, for example, signifies about four students with weights
around 48. After some other graphs had been discussed, including that of Elleke, the
teacher asked the following question.
Teacher: What can you easily see in this graph [by Michiel]?
Laila: Well, that the average, that most students in the class, uhm, well, are
between 39 and, well, 48.
Teacher: Yes, here you can see at once which weight most students in this class
roughly have, what is about the biggest group. Just because you see this
bump here. We lost the bump in Elleke's graph.
It was the teacher who used the term bump for the first time. Although she had
tried to talk about shapes earlier, this was the first time the students picked it up. As
Laila’s answer indicates, Michiel’s graph helped her to see the majority of the
data—between 39 and 48 kg. This “average” or group of “most students” is an
instance of what Konold and colleagues (2002) call a modal clump. Teachers and
curriculum designers can use students’ informal reasoning with clumps as
preparation for using the average as a representative value for the whole group, for
example.
Here, the teacher used the term bump to draw students’ attention to the shape of
the data. By saying that “we lost the bump in Elleke’s graph,” she invited the
students to think about an explanation for this observation. Nadia reacted as follows.
Nadia: The difference between … they stand from small to tall, so the bump, that
is where the things, where the bars [from Elleke's graph] are closest to one
another.
Teacher: What do you mean, where the bars are closest?
Nadia: The difference, the endpoints [of the bars], do not differ so much with the
next one.
Eva added to Nadia’s remarks:
Eva: If you look well, then you see that almost in the middle, there it is straight
almost and uh, yeah that [teacher points at the horizontal part in Elleke's
graph].
Teacher: And that is what you [Nadia] also said, uh, they are close together and
here they are bunched up, as far as […] weight is concerned.
Eva: And that is also that bump.
These episodes demonstrate that, for the students, the bump was not merely a
visual characteristic of a certain graph. It signified a relatively large number of data
points with about the same value—both in a hill-type graph and in a value-bar
graph. For the students, the term bump signified a range where there was a relatively
high density of data points. The bump even became a tool for reasoning, as the next
episode shows, when students revisited the battery task as one of the final tasks.
Laila: But then you see the bump here, let's say [Figure 3].
Ilona: This is the bump [pointing at the straight vertical part of the lower 10
bars].
Researcher: Where is that bump? Is it where you put that red line [the vertical value
bar]?
Laila: Yes, we used that value bar for it […] to indicate it, indicate the bump.
If you look at green [the upper ten], then you see that it lies further, the
bump. So we think that green is better, because the bump is further.
The examples show that some students started to reason about density and shape
in the way intended. However, they still focused on the majority, the modal clump,
instead of the whole distribution. This seemed to change in the 13th lesson of the
second teaching experiment.
In that lesson, we discovered that asking students to predict and reason without
available data was helpful in fostering a more global view of data. A first example
of such a prediction question is what a graph of the weights of eighth-graders would
look like, as opposed to one of seventh-graders. We hoped that students would shift
the whole shape instead of just the individual dots or the majority.
Teacher: What would a graph of the weights of eighth-graders look like?
Luuk: I think about the same, but another size, other numbers.
Guyonne: The bump would be more to the right.
Teacher: What would it mean for the box plots?
Michiel: Also moves to the right. That bump in the middle is in fact just the box
plot, which moves more to the right.
It could well be that Luuk reasoned with individual numbers, but he thought that
the global shape would look the same. Instead of talking about individual data
points, Guyonne talked about a bump, in singular, shifted to the right. Michiel
related to the box plot as well, though he just referred to the box of the box plot.
Another prediction question also led to reasoning about the whole shape, this
time in relation to other statistical notions such as outliers and sample size. Note that
students used the term outliers for extreme values, not for values that are
questionable.
Researcher: If you would measure all seventh-graders in the city instead of just
your class, how would the graph change, or wouldn't it change?
Elleke: Then there would come a little more to the left and a little more to the
right. Then the bump would become a little wider, I think. [She
explained this using the term outliers.]
Researcher: Is there anybody who does not agree?
Michiel: Yes, if there are more children, than the average, so the most, that also
becomes more. So the bump stays just the same.
Albertine: I think that the number of children becomes more and that the bump
stays the same.
In this episode, Elleke relates shape to outliers; she thinks that the bump grows
wider if the sample grows. Michiel argues that the group in the middle also grows
higher, which for him implies that the bump keeps the same shape. Albertine’s
answer is interesting in that she seems to think of relative frequency: for her the
shape of the distribution seems to be independent of the sample size. Had she
thought of absolute frequency, she would have expected the bump to become much higher.
Apparently, the notion of a bump helped these students to reason about the shape of
the distribution in hypothetical situations. In this way, they overcame the problem of
seeing only individual data points and developed the notion of a bump, which served
as a conceptual unity.
There are several reasons why predictions about shape in such hypothetical
situations can help to foster understanding of shape or distribution. First, if students
predict a graph without having data, they have to reason more globally with a
property in their mind. Konold and Higgins (2002) write that with the individuals as
the foci, it’s difficult to see the forest for the trees. Our conclusion is that we should
ask questions about the forest, or predict properties of other forests—which we
consider another heuristic for statistics education. This heuristic relates to the
cognitive limitations mentioned in Section 5: If there are no available data and
students have to predict something on the basis of some conceptual characteristic, it
is impossible to imagine many individual data points.
A second reason has to do with the smoothness of graphs. Cobb, McClain, and
Gravemeijer (2003) assume that students can more easily reason about hills if the
hills are smooth enough. We found evidence that the graphs students predict tend to
be smoother than the graphs of real data, and we conjecture that reasoning with such
smoother graphs helps students to see the shape of a distribution through the
variation or, in other words, the signal through the noise (Konold & Pollatsek,
Chapter 8). If they do so, they can model data with a notion of distribution, which is
the downward perspective we aimed for (Section 3).
A last example illustrates how several students came to reason about
distributions. These two girls were not disturbed by the fact that distributions did not
look like hills in Minitool 1. The question they dealt with was whether the
distributions of the battery brands looked normal or skewed, where normal was
informally defined as “symmetrical, with the median in the middle and the majority
close to the median.” The interesting point is that they used the term hill to indicate
the majority (see Figure 3), although it looked straight in the case-value bar graph.
This indicates that the hill was not a visual tool; it had become a conceptual tool in
reasoning about distributions.
Albertine: Oh, that one [battery brand D in Figure 3] is normal […].
Nadia: That hill.
Albertine: And skewed if like here [battery brand K] the hill [the straight part] is
here.
DISCUSSION
The central question of this chapter was how seventh-grade students could learn
to reason about distributions in informal ways. In three stages, we showed how
certain instructional activities, supported by computer tool use and the invention of
graphs, stimulated students to reason about aspects of distributions. After a summary
of the results we discuss limitations of this study and implications for future
research.
When solving statistical problems with Minitool 1, students used informal words
such as majority, outliers, reliability, and spread out. The examples show that
students reasoned about aspects of distribution from the very start of the experiment.
The students invented data sets in Minitool 1 that matched certain characteristics of
battery brands such as “good but not reliable.” We argued that letting students
invent their own data sets could stimulate them to think of a data set as a whole
instead of individual data points (heuristic 1). The bar representation of Minitool 1
stimulated a visual compensation strategy of finding the mean, whereas many
students found it easier to see the spread of the data in Minitool 2.
When working with Minitool 2, students developed qualitative notions of more
advanced aspects of distribution such as frequency, classes, spread, quartiles,
median, and density. The dot plot representation in combination with the options to
structure data into two equal groups, four equal groups, fixed group size, and fixed
interval width supported the development of an understanding of the median, box
plot, density, and histogram respectively. Like Konold and colleagues (2002), we
expect that modal clumps are useful to help students reason with center and other
distribution aspects. Growing samples is a promising instructional activity to let
students reason with stable features of variable processes (heuristic 2). The big ideas
of sampling and distribution can thus be developed coherently, but how this could
be done is a topic of future research.
In the third stage, students started to reason with bumps in relation to statistical
notions such as majority, outliers, and sample size in hypothetical situations and in
relation to different graphs. We argued that predictions about the shape and location
of distributions in hypothetical situations are useful to foster a more global view and
to let students see the signal in the noise (heuristic 3).
IMPLICATIONS
The results of this research study suggest that it is important to provide
opportunities for students to contribute their own ideas to the learning process,
which requires much discussion and interaction during class. We believe that formal
measures such as median and quartiles should be postponed until intuitive notions
about distribution have first been developed. We also encourage teachers to allow
students to use less than precise statistical definitions as students develop their
reasoning, and then make a transition to more specific definitions as students are
able to comprehend these details. We are convinced that teachers should try to learn
about how students are reasoning about distribution by listening and observing as
well as by gathering assessment data. A type of assessment that we found useful
asked students to create a graph representing statistical information. One such task
that was very effective asked students to make graphs compatible with a short
story containing both informal and statistical notions related to running practice.
There were no restrictions on the type of graph students could use. We had
deliberately incorporated characteristics in the story that ranged from easy (the
fastest runner needed 28 minutes) to difficult (the spread of the running times at the
end was much smaller than in the beginning but the range was still pretty big). This
is the item we used:
A seventh-grade class is going to train for running 5 km. To track their improvement
they want to make three graphs. One before training starts, one halfway through, and
one after ten training sessions. Draw the graphs that belong to the following story:
•  Before training started some students were slow and some were already very
   fast. The fastest ran the 5 km in 28 minutes. The spread between the other
   students was large. Most of them were on the slow side.
•  Halfway through, the majority of the students ran faster, but the fastest had
   improved his time only a little bit, as had the slowest.
•  After the training sessions had finished, the spread of the running times was
   much smaller than in the beginning, but the range was still pretty big. The
   majority of the students had improved their times by about 5 minutes. There
   were still a few slow ones, but most of the students had a time that was
   closer to the fastest runner than in the beginning.
We found that students were able to represent many elements in their graphs and
we learned more about their thinking and reasoning by examining their
constructions.
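One way to check that the story's constraints can all be satisfied at once is to invent data that meet them, much as the students invented battery data earlier. The three lists below are our own illustration, not student work, and the times are made up.

```python
from statistics import median

# Invented running times (minutes for 5 km) matching the story's constraints.
before  = [28, 36, 38, 39, 40, 41, 42, 43, 44, 46]  # fastest 28, most slow, large spread
halfway = [27, 33, 34, 35, 36, 37, 38, 39, 40, 45]  # majority faster, extremes barely changed
after   = [26, 32, 33, 34, 35, 35, 36, 37, 38, 44]  # tighter middle, range still big

# The majority improved by about 5 minutes: compare medians.
print(median(before), median(after))  # 40.5 35.0

# The range is still pretty big even though the middle of the data is tighter.
print(max(after) - min(after))        # 18
```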
Although we conclude that it is at least possible for seventh-graders to develop
the kind of reasoning about distribution that is shown in this chapter, it should be
stressed that the students in these experiments had above-average learning abilities
and had been stimulated to reflect during mini-interviews. Other students probably
need more time or need to be older before they can reason about distribution in a
similar way.
Another limitation of this study is that the examples of the third stage were to a
certain extent unique for the second teaching experiment. What would have
happened if Michiel had not made his “bump” graph? This research does not
completely answer that question (there was some reasoning with bumps in the third
and fourth teaching experiment), but it shows what the important issues are and
which heuristics might be useful for instructional activities.
In addition, we noticed that making prediction graphs without having data is not
a statistical practice that automatically emerges from doing an instructional
sequence such as the one described here. We concluded this from observations
during the two subsequent school years, when two novice teachers used the
materials in 12 other seventh-grade classes. When we asked prediction questions,
the students seemed confused because they were not used to such questions. An
implication for teaching is that establishing certain socio-mathematical norms and
certain practices (Cobb & McClain, Chapter 16) is as important as suitable
computer tools, carefully planned instructional activities, and skills of the teacher to
orchestrate class discussions.
These teachers also reported that some of the statistical problems we had used or
designed were too difficult and not close enough to the students’ world of
experience. The teachers also needed much more time than we used in the first year,
and they found it difficult to orchestrate the class discussions. We acknowledge that
the activities continually need to be adjusted to local contingencies, that the mini-interviews probably had a learning effect, and that the teachers needed more
guidance for teaching such a new topic. Hence, another question for future research
is what kind of guidance and skills teachers need to teach these topics successfully.
NOTE
We thank the teachers Mieke Abels, Maarten Jasper, and Mirjam Jansens for joining this
project. The research was funded by the Netherlands Organization for Scientific Research
under number 575-36-003B. The opinions expressed in this chapter do not necessarily reflect
the views of the Organization.
REFERENCES
Bakker, A. (2003). The early history of average values and implications for education. Journal of
Statistics Education, 11(1). Online: http://www.amstat.org/publications/jse/v11n1/bakker.html
Ben-Zvi, D., & Arcavi, A. (2001). Junior high school students’ construction of global views of data and
data representations. Educational Studies in Mathematics, 45(1), 35–65.
Biehler, R. (2001). Developing and assessing students’ reasoning in comparing statistical distributions in
computer-supported statistics courses. In C. Reading (Ed.), Proceedings of the Second International
Research Forum on Statistical Reasoning, Thinking, and Literacy (SRTL-2). Armidale, Australia:
University of New England.
Cobb, P. (1999). Individual and collective mathematical development: The case of statistical data
analysis. Mathematical Thinking and Learning, 1(1), 5–43.
Cobb, P., McClain, K., & Gravemeijer, K. P. E. (2003). Learning about statistical covariation. Cognition
and Instruction 21(1), 1–78.
Cobb, P., & Whitenack, J. W. (1996). A method for conducting longitudinal analyses of classroom
videorecordings and transcripts. Educational Studies in Mathematics, 30(3), 213–228.
de Lange, J., Burrill, G., Romberg, T., & van Reeuwijk, M. (1993). Learning and testing mathematics in
context: The case—data visualization. Madison, WI: University of Wisconsin, National Center for
Research in Mathematical Sciences Education.
Edelson, D. C. (2002). Design research: What we learn when we engage in design. Journal of the
Learning Sciences, 11(1), 105–121.
Freudenthal, H. (1991). Revisiting mathematics education: China lectures. Dordrecht, The Netherlands:
Kluwer Academic Publishers.
Friel, S. N., Curcio, F. R., & Bright, G. W. (2001). Making sense of graphs: Critical factors influencing
comprehension and instructional implications. Journal for Research in Mathematics Education, 32(2),
124–158.
Gravemeijer, K. P. E. (1994). Developing realistic mathematics education. Utrecht, The Netherlands: CD
Bèta Press.
Hancock, C., Kaput, J. J., & Goldsmith, L. T. (1992). Authentic enquiry with data: Critical barriers to
classroom implementation. Educational Psychologist, 27(3), 337–364.
Konold, C., & Higgins, T. (2002). Highlights of related research. In S. J. Russell, D. Schifter, & V.
Bastable (Eds.), Developing mathematical ideas: Working with data (pp. 165–201). Parsippany, NJ:
Seymour.
Konold, C., Robinson, A., Khalil, K., Pollatsek, A., Well, A. D., Wing, R., et al. (2002). Students’ use
of modal clumps to summarize data. In B. Phillips (Ed.), Developing a Statistically Literate Society:
Proceedings of the International Conference on Teaching Statistics [CD-ROM], Cape Town, South
Africa, July 7–12, 2002.
Lemke, J. L. (2003). Mathematics in the middle: Measure, picture, gesture, sign, and word. In M.
Anderson, A. Sáenz-Ludlow, S. Zellweger, & V. V. Cifarelli (Eds.), Educational perspectives on
mathematics as semiosis: From thinking to interpreting to knowing (pp. 215–234). Ottawa, Ontario:
Legas Publishing.
Meira, L. (1995). Microevolution of mathematical representations in children’s activity. Cognition and
Instruction, 13(2), 269–313.
Mokros, J., & Russell, S. J. (1995). Children’s concepts of average and representativeness. Journal for
Research in Mathematics Education, 26(1), 20–39.
National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics.
Reston, VA: NCTM.
Petrosino, A. J., Lehrer, R., & Schauble, L. (2003). Structuring error and experimental variation as
distribution in the fourth grade. Mathematical Thinking and Learning, 5(2&3), 131–156.
Sfard, A. (2000). Steering (dis)course between metaphors and rigor: Using focal analysis to investigate an
emergence of mathematical objects. Journal for Research in Mathematics Education, 31(3), 296–327.
Steinbring, H. (1997). Epistemological investigation of classroom interaction in elementary mathematics
teaching. Educational Studies in Mathematics, 32(1), 49–92.
Strauss, A., & Corbin, J. (1998). Basics of qualitative research: Techniques and procedures for
developing grounded theory. Thousand Oaks, CA: Sage.
Wilkinson, L. (1999). Dot plots. American Statistician, 53(3), 276–281.
Zawojewski, J. S., & Shaughnessy, J. M. (2000). Mean and median: Are they really so easy?
Mathematics Teaching in the Middle School, 5(7), 436–440.
Chapter 8
CONCEPTUALIZING AN AVERAGE
AS A STABLE FEATURE OF A NOISY PROCESS¹
Clifford Konold and Alexander Pollatsek
University of Massachusetts, Amherst, USA
INTRODUCTION
Until recently, the study of statistics in the United States was confined to the
university years. Following recommendations made by the National Council of
Teachers of Mathematics (NCTM, 1989; 2000), and building on the groundbreaking Quantitative Literacy series (see Scheaffer, 1991), statistics and data
analysis are now featured prominently in most mathematics curricula and are also
appearing in the K–12 science standards and curricula (Feldman, Konold, &
Coulter, 2000; National Research Council, 1996). Concurrently, university-level
introductory statistics courses are changing (e.g., Cobb, 1993; Gordon & Gordon,
1992; Smith, 1998) in ways that pry them loose from the formulaic approach copied
with little variation in most statistics textbooks published since the 1950s.1 At all
levels, there is a new commitment to involve students in the analysis of real data to
answer practical questions. Formal inference, at the introductory levels, is taking a
less prominent place as greater emphasis is given to exploratory approaches (à la
Tukey, 1977) to reveal structure in data. This approach often capitalizes on the
power of visual displays and new graphic-intensive computer software (Biehler,
1989; Cleveland, 1993; Konold, 2002).
Despite all the criticisms that we could offer of the traditional introductory
statistics course, it at least has a clear objective: to teach ideas central to statistical
¹ This article originally appeared as “Data Analysis as the Search for Signals in Noisy
Processes,” in the Journal for Research in Mathematics Education, 33 (4), 259–289,
copyright 2002, and is reproduced here with the permission of the National Council of
Teachers of Mathematics. All rights reserved. The writing of this article was supported by
National Science Foundation (NSF) grants REC-9725228 and ESI-9818946. Opinions
expressed are those of the authors and not necessarily those of NSF.
D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 169–199.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
inference, including the Law of Large Numbers and the Central Limit Theorem. For
the students now learning more exploratory forms of data analysis, the objective is
less clear. There are various proposals about which core ideas we should target in
early instruction in data analysis. Wild and Pfannkuch (1999), for example, view
variation as the core idea of statistical reasoning and propose various subconstructs
that are critical to learning to reason about data. Recently designed and tested
materials for 12- to 14-year-olds aim at developing the idea of a distribution (Cobb,
1999; Cobb, McClain, & Gravemeijer, 2003). According to the supporting research,
this idea entails viewing data as “entities that are distributed within a space of
possible values,” in which various statistical representations—be they types of
graphical displays or numerical summaries—are viewed as different ways of
structuring or describing distributions (see Cobb, 1999, pp. 10–11). Others have
argued the centrality of the idea of data as an aggregate—an emergent entity (i.e.,
distribution) that has characteristics not visible in any of the individual elements in
the aggregate (Konold & Higgins, 2003; Mokros & Russell, 1995).
In this article, we build on these ideas of variation, distribution, and aggregate to
offer our own proposal for the core idea that we believe should guide statistics and
data analysis instruction, beginning perhaps as early as age 8. In short, that idea
involves coming to see statistics as the study of noisy processes—processes that
have a signature, or signal, which we can detect if we look at sufficient output.
It might seem obvious that a major purpose of computing statistics such as the
mean or median is to represent such a “signal” in the “noise” of individual data
points. However, this idea is virtually absent from our curricula and standards
documents. Neither NCTM’s Principles and Standards for School Mathematics
(2000) nor the American Association for the Advancement of Science (AAAS),
Science for All Americans (1989), explicitly describes an average as anything like a
signal. Our search through several middle school and high school mathematics
curricula has not uncovered a single reference to this idea. Nor does it appear in
earlier research investigating students’ ideas about averages and their properties
(Mokros & Russell, 1995; Pollatsek, Lima, & Well, 1981; Strauss & Bichler, 1988).
The idea is evident, however, in a few recent studies. In their investigation of
statistical reasoning among practicing nurses, Noss, Pozzi, and Hoyles (1999) refer
briefly to this interpretation; one nurse the authors interviewed characterized a
person’s average blood pressure as “what the normal range was sort of settling down
to be.” The idea of signal and noise is also evident in the work of Biehler (1994),
Wild and Pfannkuch (1999), and Wilensky (1997).
OVERVIEW
We begin by describing how statisticians tend to use and think about averages as
central tendencies. We then contrast this interpretation with various other
interpretations of averages that we frequently encounter in curriculum materials.
Too frequently, curricula portray averages as little more than summaries of groups
of values.2 Although this approach offers students some rationale for summarizing
group data (for example, to see what is “typical”), we will argue that it provides
little conceptual basis for using such statistical indices to characterize a set of data,
that is, to represent the whole set. To support this claim, we review research that has
demonstrated that although most students know how to compute various averages
such as medians and means, few use averages to represent groups when those
averages would be particularly helpful—to make a comparison between two groups.
We recommend beginning early in instruction to help students develop the idea of
central tendency (or data as a combination of signal and noise). To explore the
conceptual underpinnings of the notion of central tendency, we briefly review its
historical development and then examine three types of statistical processes. For
each process, we evaluate the conceptual difficulty of regarding data from that
process as a combination of signal and noise. Finally, we outline some possible
directions for research on student thinking and learning.
In this article, we focus our discussion on averages, with an emphasis on means
(using the term average to refer to measures of center collectively, including the
mean, median, and mode). By focusing on averages, we risk being misunderstood
by those who have recently argued that instruction and public discourse have been
overemphasizing measures of center at the expense of variability (e.g., Shaughnessy,
Watson, Moritz, & Reading, 1999; also see Gould, 1996). A somewhat related but
more general critique comes from proponents of Tukey’s (1977) exploratory data
analysis (EDA) who advocate that, rather than structure our curricula around a
traditional view of inferential statistics, we should instruct young students in more
fluid and less theory-laden views of analysis (e.g., Biehler, 1989; 1994).
Those concerned that measures of center have been overemphasized as well as
proponents of EDA may misread us as suggesting that instruction should aim at
teaching students to draw conclusions by inspecting a limited number of simple
summaries such as means. In fact, we agree wholeheartedly with Shaughnessy et al.
(1999) and with EDA proponents that we should be teaching students to attend to
general distributional features such as shape and spread, and to look at distributions
in numerous ways for insights about the data. We do not view the decision to focus
our analysis here on measures of center as being at odds with their concerns. Our
decision is partly pragmatism and partly principle.
On the pragmatic side, we wanted to simplify our exposition. Almost all
statistical measures capture group properties, and they share an important property
with good measures of center: They stabilize as we collect more data. These
measures include those of spread, such as the standard deviation, interquartile range,
percentiles, and measures of skewness. But switching among these different
measures would needlessly complicate our exposition.
The deeper reason for focusing our discussion on measures of center is that we
believe such measures do have a special status, particularly for comparing two sets
of data. Here, some proponents of teaching EDA may well disagree with us. Biehler
(1994), for example, maintained that the distribution should remain the primary
focus of analysis and that we should regard an average, such as the mean, as just one
of many of its properties. We will argue that the central idea should be that of
searching for a signal and that the idea of distribution comes into better focus when
it is viewed as the “distribution around” a signal. Furthermore, we claim that the
most basic questions in analyzing data involve looking at group differences to
determine whether some factor has produced a difference in the two groups.
Typically, the most straightforward and compelling way to answer these questions is
to compare averages. We believe that much of statistical reasoning will elude
students until they understand when a comparison of two averages makes sense and,
as a corollary, when such a comparison is misleading. If they do not understand this,
students’ explorations of data (i.e., “data snooping”) will almost certainly lack
direction and meaning.
SIGNALS IN NOISY PROCESSES
A statistician sees group features such as the mean and median as indicators of
stable properties of a variable system—properties that become evident only in the
aggregate. This stability can be thought of as the certainty in situations involving
uncertainty, the signal in noisy processes, or, the descriptor we prefer, central
tendency. Claiming that modern-day statisticians seldom use the term central
tendency, Moore (1990, p. 107) suggests that we abandon the phrase and speak
instead of measures of “center” or “location.” But we use the phrase here to
emphasize conceptual aspects of averages that we fear are often lost, especially to
students, when we talk about averages as if they were simply locations in
distributions.
By central tendency we refer to a stable value that (a) represents the signal in a
variable process and (b) is better approximated as the number of observations
grows.3 The obvious examples of statistics used as indicators of central tendency are
averages such as the mean and median. Processes with central tendencies have two
components: (a) a stable component, which is summarized by the mean, for
example; and (b) a variable component, such as the deviations of individual scores
around an average, which is often summarized by the standard deviation.
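A small simulation can make these two components concrete. The sketch below is ours; the signal value of 6.1 and the normal error model are arbitrary assumptions for illustration, not data from this chapter.

```python
import random

random.seed(2)

# A noisy process: a stable component (a "true" signal of 6.1) plus a
# variable component (random error). Both values are illustrative
# assumptions, not data from the chapter.
def observe():
    return 6.1 + random.gauss(0, 0.3)

observations = [observe() for _ in range(10_000)]

# Individual outputs vary unpredictably, but the running mean settles
# down toward the signal as the number of observations grows.
for n in (10, 100, 1_000, 10_000):
    running_mean = sum(observations[:n]) / n
    print(n, round(running_mean, 3))
```

Running this shows the second defining property of a central tendency: the estimate of the stable component improves as the number of observations grows, even though no single observation becomes any less noisy.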
It is important to emphasize that measures of center are not the only way to
characterize stable components of noisy processes. Both the shape of a frequency
distribution and global measures of variability, for example, also stabilize as we
collect more data; they, too, give us information about the process. We might refer
to this more general class of characteristics as signatures of a process. We should
point out, however, that all the characteristics that we might look at, including the
shape and variability of a distribution, are close kin to averages. That is, when we
look at the shape of a particular distribution, we do not ordinarily want to know
precisely how the frequency of values changes over the range of the variable.
Rather, we tame the distribution’s “bumpiness.” We might do this informally by
visualizing a smoother underlying curve or formally by computing a best-fit curve.
In either case, we attempt to see what remains when we smooth out the variability.
In a similar manner, when we employ measures such as the standard deviation or
interquartile range, we strive to characterize the average spread of the data in the
sample.
Implicit in our description of central tendency is the idea that even as one speaks
of some stable component, one acknowledges the fundamental variability inherent in
that process and thus its probabilistic nature. Because of this, we claim that the
notion of an average understood as a central tendency is inseparable from the notion
of spread. That average and variability are inseparable concepts is clear from the
fact that most people would consider talking about the average of a set of identical
values to be odd. In addition, it is hard to think about why a particular measure of
center makes sense without thinking about its relation to the values in the
distribution (e.g., the mean as the balance point around which the sum of the
deviation scores is zero, or the median as the point where the number of values
above equals the number of values below).
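The two relations just mentioned can be checked in a few lines of code; the data set below is an arbitrary made-up example, not one from the chapter.

```python
# A quick check of the two properties mentioned above, using an
# arbitrary made-up data set.
data = [2, 3, 5, 8, 12]

mean = sum(data) / len(data)          # 6.0
deviations = [x - mean for x in data]
print(sum(deviations))                # 0.0: the mean balances the deviations

data_sorted = sorted(data)
median = data_sorted[len(data) // 2]  # 5 (middle of an odd-sized list)
below = sum(1 for x in data if x < median)
above = sum(1 for x in data if x > median)
print(below, above)                   # 2 2: equal counts on each side
```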
Not all averages are central tendencies as we have defined them above. We
could compute the mean weight of an adult lion, a Mazda car, and a peanut, but no
clear process would be measured here that we could regard as having a central
tendency. One might think that the mean weight of all the lions in a particular zoo
would be a central tendency. But without knowing more about how the lions got
there or their ages, it is questionable whether this mean would necessarily tell us
anything about a process with a central tendency. Quetelet described this distinction
in terms of true means of distributions that follow the law of errors versus arithmetic
means that can be calculated for any assortment of values, such as our hodgepodge
above (see Porter, 1986, p. 107).
Populations versus Processes
In the preceding description, we spoke of processes rather than populations. We
contrast these two ways of thinking about samples or batches of data, as shown in
Figure 1. When we think of a sample as a subset of a population (see the left
graphic), we see the sample as a piece allowing us to guess at the whole: The
average and shape of the sample allow us perhaps to estimate the average and shape
of the population. If we wanted to estimate the percentage of the U.S. population
favoring gun control, we would imagine there being a population percentage of
some unknown value, and our goal would be to estimate that percentage from a
well-chosen sample. Thinking in these terms, we tend to view the population as
static and to push to the background questions about why the population might be
the way it is or how it might be changing.
From the process perspective (as depicted in the right graphic of Figure 1), we
think of a population or a sample as resulting from an ongoing, dynamic process, a
process in which the value of each observation is determined by a large number of
causes, some of which we may know and others of which we may not. This view
moves to the foreground questions about why a process operates as it does and what
factors may affect it. In our gun control example, we might imagine people’s
opinions on the issue as being in a state of flux, subject to numerous and complex
influences. We sample from that process to gauge the net effect of those influences
at a point in time, or perhaps to determine whether that process may have changed
over some time period.
For many of the reasons discussed by Frick (1998), we have come to prefer
thinking of samples (and populations, when they exist) as outputs of processes.4 One
reason for this preference is that a process view better covers the range of statistical
situations in which we are interested, many of which have no real population (e.g.,
weighing an object repeatedly). Another reason for preferring the process view is
that when we begin thinking, for example, about how to draw samples, or why two
samples might differ, we typically focus on factors that play a role in producing the
data. That is, we think about the causal processes underlying the phenomena we are
studying. Biehler (1994) offered a similar analysis of the advantages of viewing data
as being produced by a probabilistic mechanism—a mechanism that could be altered
to produce predictable changes in the resultant distribution. Finally, viewing data as
output from a process highlights the reason that we are willing to view a collection
of individual values as in some sense “the same” and thus to reason about them as a
unity: We consider them as having been generated by the same process.
Figure 1. Data viewed as a sample of a population (left) versus data viewed as output of a
noisy process (right).
This notion of process is, of course, inherent in the statistician’s conception of a
population, and we expect that most experts move between the process and
population perspectives with little difficulty or awareness.5 However, for students
new to the study of statistics, the choice of perspective could be critical. To illustrate
more fully what we mean by reasoning about processes and their central tendencies,
we discuss recent results of the National Assessment of Educational Progress
(NAEP).
NAEP Results as Signals of Noisy Processes
NAEP is an assessment of student capabilities in Grades 4, 8, and 12, conducted
every 4 years in the United States. On the 1998 assessment, eighth graders averaged
264 on the reading component.6 What most people want to know, of course, is how
this compares to the results from previous assessments. In this case, the mean had
increased 4 points since the 1994 assessment. The 12th graders had also gained 4
points on average since 1994, and the fourth graders, 3 points. Donahue, Voelkl,
Campbell, and Mazzeo (1999) interpreted these differences as evidence that
children’s reading scores were improving.
Reports such as this are now so commonplace that we seldom question the logic
of this reasoning. But what is the rationale in this case for comparing group means
and for taking the apparently small difference between those means seriously? We
will argue that to answer these questions from a statistical perspective requires a
well-formed idea of a central tendency.
Interpreted as a central tendency, the mean of 264 is a measure of a complex
process that determines how well U.S. children read at a given point in time. An
obvious component of this process is the reading instruction that children receive in
school. Another component of the process is the behavior of adults in the home:
their personal reading habits, the time they spend reading to their children, and the
kind and quantity of reading material they have in the home. A third component
consists of factors operating outside the home and school, including determinants of
public health and development, such as nutrition levels and the availability and use
of prenatal care; genetic factors; and the value placed on literacy and education by
local communities and the society at large.
Using a statistical perspective, we often find it useful to regard all these
influences together (along with many others that we may be unaware of) as a global
process that turns out readers of different capabilities. In the sense that we cannot
know how these various factors work together in practice to produce results, the
global process is a probabilistic one, unpredictable at the micro level. However,
even though readers produced by this process vary unpredictably in their
performance, we can regard the entire process at any given time as having a certain
stable capability to produce competent readers. The average performance of a large
sample of readers produced by this process is one way to gauge the power of that
process (or its propensity) to produce a literate citizenry. As Mme. de Staël
explained in 1820, “events which depend on a multitude of diverse combinations
have a periodic recurrence, a fixed proportion, when the observations result from a
large number of chances” (as quoted in Hacking, 1990, p. 41). And because of the
convergence property of central tendencies, the larger the data set, the better the
estimate we expect our sample average to be of the stable component of the process.
Given the huge sample size in the reading example (about 11,000 eighth graders)
and assuming proper care in composing the sample, we expect that the sample mean
of 264 is very close to this propensity. Assuming that the 1994 mean is of equal
quality, we can be fairly certain that the difference between these two means reflects
a real change in the underlying process that affects reading scores. Note that the
important inference here does not concern a sampling issue in the narrow sense of
randomly sampling from a fixed known population. That is, assuming no changes in
the system, we would expect next year’s mean to come out virtually the same even
though the population of eighth graders would consist of different individuals.
Focusing on the process rather than the population helps make the real intent of our
question clear.
The mean is not necessarily the best single number to serve as an index of such a
change. The median is also a good index, and changes in the 25th percentile, the
percent above some minimal value, the standard deviation, or the interquartile range
could also be valid indicators of changes in the underlying educational process. As
long as a process remains stable, we expect the mean, or any of these other statistical
indices obtained from that process, to remain relatively unchanged from sample to
sample. Conversely, when a statistic from a large sample changes appreciably, we
assume that the process has changed in some way. Furthermore, these expectations
are crucial in our attempts to evaluate efforts to alter processes. In the case of
reading, we might introduce new curricula, run an advertising campaign
encouraging parents to read to their children, expand the school free lunch program
in disadvantaged areas, and upgrade local libraries. If we do one or more of these
things and the mean reading scores of an appropriate sample of children increases,
we have grounds for concluding that we have improved the process for producing
readers. Again, we emphasize that though we have specified the mean in this
example, we might be as happy using the median or some other measure of center.
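The expectation that statistics from a stable process vary little from sample to sample, while a genuine change in the process shows through, can be sketched in a small simulation. The baseline mean of 260, the standard deviation of 35, and the normal error model are our own illustrative assumptions; only the 4-point shift and the sample size of about 11,000 come from the reading example.

```python
import random

random.seed(7)

# Illustrative sketch, not NAEP data: scores drawn from a process with a
# given mean, an assumed SD of 35, and roughly the sample size from the
# reading example (about 11,000 eighth graders).
def sample_mean(process_mean, n=11_000, sd=35):
    return sum(random.gauss(process_mean, sd) for _ in range(n)) / n

# Two samples from the SAME stable process: the means barely differ.
m1 = sample_mean(260)
m2 = sample_mean(260)

# A sample drawn after the process shifts up by 4 points: the difference
# clearly exceeds sample-to-sample noise.
m3 = sample_mean(264)
print(round(m1, 1), round(m2, 1), round(m3, 1))
```

With samples this large, the sampling variability of the mean is only a fraction of a point, so a 4-point difference is far too large to attribute to noise alone.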
The above example, however, indicates a way in which a measure of center is
often special. That is, the practical issue in which we are usually interested is
whether, overall, things are getting better or worse, a question most naturally
phrased in terms of a change of center. It is much harder to think of examples where
we merely want to increase or decrease the variability or change the shape of the
distribution. We could imagine an intervention that tried only to narrow the gap
between good and poor readers, in which case we would compare measures of
spread, such as the standard deviation. Although there are questions that are
naturally phrased in terms of changes in variability or distribution shape, such
questions are typically second-order concerns. That is, we usually look at whether
variability or shape has changed to determine whether we need to qualify our
conclusion about comparing measures of center. Even in situations where we might
be interested in reducing variability, such as in income, we are certainly also
interested in whether this comes at the expense of lowering the average.
DIFFERENT INTERPRETATIONS OF AVERAGES
We have argued that statisticians view averages as central tendencies, or signals
in variable data. But this is not the only way to think about them. In Table 1, we list
this interpretation along with several others, including viewing averages as data
reducers, fair shares, and typical values. We consider an interpretation to be the goal
that a person has in mind when he or she computes or uses an average. It is the
answer that a person might give to the question, “Why did you compute the average
of those values?” Some of these interpretations are described in Strauss and Bichler
(1988) as “properties” of the mean. Mokros and Russell (1995) described other
interpretations as “approaches” that they observed elementary and middle school
students using.7 In Table 1, we also provide an illustrative problem context for each
interpretation. Of course, any problem could be interpreted from a variety of
perspectives. But we chose these particular examples because their wording seemed
to suggest a particular interpretation.
Table 1. Examples of contexts for various interpretations of average

Data reduction:
Ruth brought 5 pieces of candy, Yael brought 10 pieces, Nadav brought 20, and
Ami brought 25. Can you tell me in one number how many pieces of candy each
child brought? (From Strauss & Bichler, 1988)

Fair share:
Ruth brought 5 pieces of candy, Yael brought 10 pieces, Nadav brought 20, and
Ami brought 25. The children who brought many gave some to those who brought
few until everyone had the same number of candies. How many candies did each
child end up with? (Adapted from Strauss & Bichler, 1988)

Typical value:
The numbers of comments made by eight students during a class period were 0, 5,
2, 22, 3, 2, 1, and 2. What was the typical number of comments made that day?
(Adapted from Konold & Garfield, 1992)

Signal in noise:
A small object was weighed on the same scale separately by nine students in a
science class. The weights (in grams) recorded by each student were 6.2, 6.0, 6.0,
15.3, 6.1, 6.3, 6.2, 6.15, 6.2. What would you give as the best estimate of the
actual weight of this object? (Adapted from Konold & Garfield, 1992)
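For the weighing context in the last row of Table 1, a quick computation (our illustration) shows why the median is the more natural estimate of the signal here: the aberrant reading of 15.3 pulls the mean well away from the cluster of measurements.

```python
# The nine recorded weights from the "signal in noise" context in Table 1.
weights = [6.2, 6.0, 6.0, 15.3, 6.1, 6.3, 6.2, 6.15, 6.2]

mean = sum(weights) / len(weights)
median = sorted(weights)[len(weights) // 2]  # middle of 9 sorted values

# The outlier 15.3 (plausibly a recording error) drags the mean well above
# the cluster of readings near 6.1-6.2, while the median stays inside it.
print(round(mean, 2))  # 7.16
print(median)          # 6.2
```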
Data Reduction
According to this view, averaging is a way to boil down a set of numbers into
one value. The data need to be reduced because of their complexity—in particular,
due to the difficulty of holding the individual values in memory. Freund and Wilson
(1997) draw on this interpretation to introduce averages in their text: “Although
distributions provide useful descriptions of data, they still contain too much detail
for some purposes” (p. 15). They characterize numerical summaries as ways to
further simplify data, warning that “this condensation or data reduction may be
accompanied by a loss of information, such as information on the shape of the
distribution” (p. 16). One of the high school students interviewed by Konold,
Pollatsek, Well, and Gagnon (1997) used this as a rationale for why she would look
at a mean or median to describe the number of hours worked by students at her
school:
We could look at the mean of the hours they worked, or the median. … It would go
through a lot to see what every, each person works. I mean, that’s kind of a lot, but
you could look at the mean. … You could just go through every one … [but] you’re
not going to remember all that.
Fair Share
The computation for the mean is often first encountered in elementary school in
the context of fair-share problems, with no reference to the result being a mean or
average. Quantities distributed unevenly among several individuals are collected and
then redistributed evenly among the individuals. The word average, in fact, derives
from the Arabic awariyah, which translates as “goods damaged in shipping.”
According to Schwartzman (1994), the Italians and French appropriated this term to
refer to the financial loss resulting from damaged goods. Later, it came to specify
the portion of the loss borne by each of the many people who invested in the ship.
Strauss and Bichler (1988) provided 11 problems as examples of tasks that they used
in their research, and we would regard all but three of them as involving the idea of
fair share. We can view many commonly encountered rates, such as yearly
educational expenditure per student, as based on the fair-share idea, since we tend to
think most naturally about these rates as distributing some total quantity equally
over some number of units. In such cases, we do not ordinarily think of the
computed value in relation to each individual value; nor do we worry, when
computing or interpreting this fair share, about how the component values are
distributed or whether there are outliers.
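The fair-share idea can be sketched in a few lines of Python. The expenditure figures below are invented for illustration; the point is only that the equal share obtained by pooling and redistributing coincides with the ordinary mean, while saying nothing about how the original values were distributed.

```python
# A minimal sketch of the fair-share interpretation: the mean of unevenly
# distributed quantities equals the equal share each unit would receive
# after pooling and redistributing. All figures are hypothetical.

def fair_share(quantities):
    """Pool all quantities and divide them evenly."""
    return sum(quantities) / len(quantities)

# Hypothetical yearly educational expenditures (dollars) in five districts
spending = [8200, 9100, 7800, 10400, 8500]
share = fair_share(spending)
print(share)  # 8800.0, identical to the ordinary mean
```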
Typical Value
Average as a typical score is one of the more frequently encountered
interpretations in current precollege curricula. What appears to make values typical
for students are their position (located centrally in a distribution of values) and/or
their frequency (being the most frequent or even the majority value). Younger
students favor the mode for summarizing a distribution, presumably because it can
often satisfy both of these criteria (Konold & Higgins, 2003). Mokros and Russell
(1995) speculated that those students they interviewed who used only modes to
summarize data may have interpreted typical as literally meaning the most
frequently occurring value. Researchers have also observed students using as an
average a range of values in the center of a distribution (Cobb, 1999; Konold,
Robinson, Khalil, Pollatsek, Well, Wing, & Mayr, 2002; Mokros & Russell, 1995;
Noss, Pozzi, & Hoyles, 1999; Watson & Moritz, 1999). These “center clumps” are
located in the heart of the distribution and often include a majority of the
CONCEPTUALIZING AN AVERAGE
179
observations. In this respect, these clumps may serve as something akin to a mode
for some students.
Signal in Noise
According to this perspective, each observation is an estimate of an unknown but
specific value. A prototypical example is repeatedly weighing an object to determine
its actual weight. Each observation is viewed as deviating from the actual weight by
a measurement error, which is viewed as “random.” The average of these scores is
interpreted as a close approximation to the actual weight.
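The signal-in-noise view can be made concrete with the weighing data from the task quoted earlier. The sketch below simply compares three candidate estimates of the actual weight; which to prefer is exactly the judgment the task probes.

```python
import statistics

# The nine recorded weights (grams) from the Konold & Garfield task above.
# Under the signal-in-noise view, each is the true weight plus error.
weights = [6.2, 6.0, 6.0, 15.3, 6.1, 6.3, 6.2, 6.15, 6.2]

raw_mean = statistics.mean(weights)   # pulled upward by the aberrant 15.3
median = statistics.median(weights)   # resistant to that outlier
trimmed = statistics.mean(w for w in weights if w < 10)  # mean after dropping it

print(round(raw_mean, 3))  # 7.161
print(median)              # 6.2
print(round(trimmed, 3))   # 6.144
```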
Formal Properties of Averages
Many school tasks involving averages seem unrelated to any of the particular
interpretations we describe above. For example, finding the average of a set of
numbers out of context seems intended only to develop or test students’
computational abilities. Other school tasks explore formal properties of averages,
which we also would not view as directly related to particular interpretations. Such
tasks include those meant to demonstrate or assess the idea that (a) the mean of a set
of numbers is simply related to the sum of those numbers, (b) the mean is a balance
point and the median a partition that divides the cases into two equal-sized groups,8
(c) the mean and median lie somewhere within the range of the set of scores, and (d)
the mean or median need not correspond to the value of an actual observation. In
their longitudinal study of the development of young students’ understandings of
average, Watson and Moritz (2000) focused in particular on these relations, asking
students, for example, how the mean number of children per family could possibly
be 2.3 rather than a whole number. We consider most of the properties enumerated
by Strauss and Bichler (1988, p. 66) to be formal relations of this sort. We are not
arguing that these are unimportant or trivial ideas, but rather that they are usually not
tied to particular interpretations of averages.
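The formal properties (a) through (d) can be checked mechanically on any data set, which underscores that they hold regardless of interpretation. The data below are arbitrary, not drawn from the studies cited.

```python
import statistics

# Checking the formal properties (a)-(d) on an arbitrary illustrative set.
data = [1, 2, 2, 3, 7]
mean = statistics.mean(data)      # (a) simply related to the sum: 15 / 5
median = statistics.median(data)

# (b) the mean balances the deviations; the median partitions the cases
assert sum(x - mean for x in data) == 0
assert sum(x < median for x in data) <= len(data) // 2

# (c) both lie within the range of the scores
assert min(data) <= mean <= max(data)
assert min(data) <= median <= max(data)

# (d) an average need not match any actual observation (mean of [2, 3] is 2.5)
assert statistics.mean([2, 3]) not in [2, 3]
```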
Applying Interpretations to the Problem of Group Comparison
In the NAEP example, we explored the notion of central tendency and showed
how it provides a basis for using averages—means, in that case—to compare
groups. Because the mean is a very stable estimator in large samples, we can use it
to track changes in a process even though the output from that process is variable
and unpredictable in the short run.
What bases do the other interpretations of average provide for evaluating the two
NAEP results by comparing means? Consider the data reduction interpretation: Data
are distilled to a single value, presumably because of our inability to consider all the
values together. We argue that nothing in this interpretation suggests that any new
information emerges from this process; indeed, a considerable loss of information
seems to be the price paid for reducing complexity. By this logic, it would seem that
as a data set grows larger, any single-value summary becomes less representative of
the group as increasingly more information is lost in the reduction process.
The typical-value interpretation is nearer to the central tendency interpretation
since it may involve the idea that the value, in some sense, represents much of the
data in the group. However, as with the data reduction interpretation, it is not clear
why one ideally would like to have typical values from large samples rather than
from small ones. Indeed, it would seem as reasonable to regard a typical score as
becoming less (rather than more) representative of a group as that group became
larger and acquired more deviant values.
The fair-share interpretation may provide some basis for using means to
compare groups. One could think of the mean in the 1998 NAEP data as the reading
score that all students sampled that year would have if reading ability were divided
evenly among all the students sampled. Based on this reasoning, one might
reasonably conclude that the 1998 group had a higher reading score than the 1994
group. Cortina, Saldanha, and Thompson (1999) explored the use of this notion by
seventh- and eighth-grade students and concluded that these students could use the
idea of fair share to derive and compare means of unequal groups. However, we
would guess that many students would regard such reasoning skeptically unless it
were physically possible to reallocate quantities in the real-world situation. If, for
example, we were thinking about the number of boxes of cookies sold by different
scout troops (as in the study by Cortina et al.), redistributing the cookie boxes
evenly makes some sense. In contrast, if we were reasoning about mean weight,
height, or IQ of a number of individuals, we would have to think of these pounds,
inches, or IQ points being shared metaphorically.9
Furthermore, we are skeptical about whether the fair-share interpretation is a
statistical notion at all. It seems to ignore, in a sense, the original distribution of
values and to attend only to the total accumulation of some amount in a group.
Consider, for example, the value we would compute to decide how the different
numbers of candies brought by various children to a party could be equally
redistributed among the children (see Table 1). In this context, the particulars about
how the candies were originally distributed seem irrelevant. That is, the number that
constitutes a fair share is not viewed as a representation or summary of the original
distribution but rather as the answer to the question of how to divide the candies
equitably.
In conclusion, whereas some of the interpretations may be useful to summarize a
group of data, it is quite another thing to take a statistic seriously enough to use it
to represent the entire group, as one must do when using averages to compare
groups. We claim that viewing an average as a central tendency provides a strong
conceptual basis for, among other things, using averages to compare two groups,
whereas various other interpretations of average, such as data reducers and typical
values, do not.
We acknowledge that our analysis of these alternative interpretations has been
cursory and that it should thus be regarded skeptically. However, our primary
purpose is to highlight some of the questions that should be asked in exploring
different approaches to introducing students to averages. Furthermore, there is good
evidence that whatever interpretations students do have of averages, those
interpretations usually do not support using averages to compare one group to
another. Many studies have demonstrated that even those who know how to
compute and use averages in some situations do not tend to use them to compare
groups.
Students’ Tendency Not to Use Averages to Compare Groups
Gal, Rothschild, and Wagner (1990) interviewed students of ages 8, 11, and 14
about their understanding of how means were computed and what they were useful
for. They also gave the students nine pairs of distributions in graphic form and asked
them to decide whether the groups were different or not. Only half of the 11- and
14-year-olds who knew how to compute the mean of a single group (and, also, to
some extent, how to interpret it) went on to use means to compare two groups.
Hancock, Kaput, and Goldsmith (1992) and, more recently, Watson and Moritz
(1999) have reported similar findings.
This difficulty is not limited to the use of means. Bright and Friel (1998)
questioned 13-year-old students about a stem-and-leaf plot that showed the heights
of 28 students who did not play basketball. They then showed them a stem-and-leaf
plot that included these data along with the heights of 23 basketball players. This
latter plot is shown in Figure 2. Heights of basketball players were indicated in bold
type, as they are here. Students had learned how to read this type of display and had
no difficulty reading values from it. Asked about the “typical height” in the single
distribution of the non–basketball players, the students responded by specifying
middle clumps (e.g., 150–160 cm), a reasonable group summary. Yet, shown the
plot with both distributions, they could not generalize this method or find another
way to determine “How much taller are the basketball players than the students who
did not play basketball?”
We found similar difficulties when we interviewed four high school seniors
(ages 17–18) who had just completed a yearlong course in probability and statistics
(Biehler, 1997; Konold et al., 1997). During the course, the students had frequently
used medians (primarily in the context of box plot displays) as well as means to
make group comparisons. However, during a postcourse interview in which they
were free to use whatever methods of comparison seemed appropriate, they seldom
used medians or means for this purpose. Instead, they tended to compare the number
of cases in each group that had the same value on the dependent variable. For
example, to decide if males were taller than females, they might inspect the sample
for all individuals who were 6 feet tall and argue that males were taller because there
were more males than females of that height. In making these comparisons, students
typically did not attend to the overall number of individuals in the two groups (in
this case, to the overall number of males vs. females). Other researchers, including
Cobb (1999) and Watson and Moritz (1999), have reported students using this same
“slicing” technique over a range of different problems to compare two groups.
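The weakness of the slicing strategy is easy to exhibit with invented numbers. In the sketch below (hypothetical heights in inches, with deliberately unequal group sizes), a slice at a single height favors the larger group simply because it is larger, whereas the difference in means does not.

```python
import statistics

# Hypothetical height data (inches); the male sample is twice as large.
males   = [68, 69, 70, 70, 71, 72, 72, 72, 73, 74]
females = [63, 64, 65, 66, 72]

# The "slicing" comparison: count cases at one value of the variable.
slice_m = sum(h == 72 for h in males)    # 3 males at 72 inches
slice_f = sum(h == 72 for h in females)  # 1 female at 72 inches
# "More males at six feet" ignores the overall group sizes entirely.

# The group-level comparison uses averages of the whole distributions.
mean_m = statistics.mean(males)
mean_f = statistics.mean(females)
print(round(mean_m - mean_f, 1))  # 5.1
```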
Note. The row headed by 13 (the stem) contains four cases (leaves)—three students of 138 centimeters
and a fourth student of 139 centimeters.
Figure 2. Stem-and-leaf plot of heights of students and basketball players (boldface) from
“Helping Students Interpret Data,” by G. Bright and S. N. Friel, in Reflections on Statistics:
Learning, Teaching, and Assessment in Grades K–12 (p. 81), edited by S. P. Lajoie, 1998,
Mahwah, NJ: Lawrence Erlbaum Associates. Copyright 1998 by Lawrence Erlbaum
Associates.
In short, even though instruction in statistics usually focuses on averages, many
students do not use those measures of central tendency when they would be
particularly helpful—to make comparisons between groups composed of variable
elements. We suggest that this pattern is symptomatic of students’ failure to interpret
an average of a data set as saying something about the entire distribution of values.
To address this problem instructionally, we believe that we should be encouraging
students early in statistics instruction to think of averages as central tendencies or
signals in noisy processes. We acknowledge that this is a complex idea and one that
is particularly difficult to apply to the type of processes that we often have students
investigating. We explore these conceptual difficulties below.
THREE TYPES OF PROCESSES AND THEIR CONCEPTUAL CHALLENGES
Hints about the cognitive complexity of central tendency are found in the
historical account of its development. It was Tycho Brahe in the late 1500s who
introduced the use of means as central tendencies to astronomy (Plackett, 1970). He
used them to address a problem that had long troubled astronomers: What to take as
the position of a star, given that the observed coordinates at a particular time tended
to vary from observation to observation. When early astronomers began computing
means of observations, they were very cautious, if not suspicious, about whether and
when it made sense to average observations. In fact, before the mid-eighteenth
century, they would never combine their own observations with those obtained from
another astronomer. They were fearful that if they combined data that had anything
but very small errors, the process of averaging would multiply rather than reduce the
effect of those errors (Stigler, 1986, p. 4). Taking the mean of multiple observations
became the standard solution only after it had been determined that the mean tended
to stabilize on a particular value as the number of observations increased.
It was another hundred years before Quetelet began applying measures of central
tendency to social and human phenomena (Quetelet, 1842). The idea of applying
means to such situations was inspired partly by the surprising observation that
national rates of birth, marriage, and suicides—events that at one level were subject
to human choice—remained relatively stable from year to year. Some, including
Arbuthnot and De Moivre, had taken these stable rates as evidence of supernatural
design. Quetelet explained them by seeing collections of individual behaviors or
events as analogous to repeated observations. Thus, he regarded observing the
weights of 1,000 different men—weights that varied from man to man—as
analogous to weighing the same man 1,000 times, with the observed weight varying
from trial to trial. The legitimacy of such an analogy, of course, has been a heated
controversy in statistics. Even at the time, Quetelet’s ideas brought stiff rebukes
from thinkers such as Auguste Comte, who thought it ludicrous to believe that we
could rise above our ignorance of values of individual cases simply by averaging
many of them (Stigler, 1986, p. 194). To Comte, statistics applied to social
phenomena was computational mysticism.
We think that the way these early thinkers reacted to different applications of the
mean is not merely a historical accident but instead says something about the “deep
structure” of these different applications. To explore the challenges of learning to
think about data as signal and noise, we examine the metaphor in the context of
three types of statistical processes: repeated measures, measuring individuals, and
dichotomous events.
Repeated Measures
Consider weighing a gold nugget 100 times on a pan balance, a prototypical
example of repeated measurement. It almost goes without saying that the purpose of
weighing the nugget is to determine its weight. But how does one deal with the fact
that the observed weight varies from trial to trial? We assume that statisticians and
nonstatisticians alike would regard these fluctuations as resulting from errors in the
measurement process. But given this variation, how should we use the 100
measurements to arrive at the object’s weight? Should all the measurements be
used? Perhaps not, if they are not all equally accurate. A novice might attempt to
deal with this question by trying to separate the 100 measurements into two classes:
those that are truly accurate versus those that are not. The problem then becomes
how to tell which observations are truly accurate, because the actual weight is not
known.
One aspect of this situation that makes using a mean of the observations
particularly compelling is that, conceptually, we can separate the signal from the
noise. Because we regard an object as having some unknown but precise weight, it
is not a conceptual leap to associate the mean of several weighings with this actual
weight, while attributing the trial-by-trial variations to a distinctly different thing:
chance error produced by inaccuracies of the measurement instrument and by the
process of reading values from it. Indeed, we can also regard each individual
weighing as having two components—a fixed component determined by the actual
weight of the nugget and a variable component attributable to the imperfect
measurement process.
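The two-component view can be simulated directly: each weighing is a fixed part (the nugget's actual weight, stipulated here) plus a random error part, and averaging works because the errors largely cancel. All numbers below are invented for illustration.

```python
import random
import statistics

random.seed(1)
ACTUAL_WEIGHT = 57.3  # grams; the fixed component, hypothetical

# Each observation = fixed component + variable error component.
errors = [random.gauss(0, 0.4) for _ in range(100)]
weighings = [ACTUAL_WEIGHT + e for e in errors]

# The mean weighing is the actual weight plus the mean error, and the
# mean error is small because positive and negative errors cancel.
print(round(statistics.mean(errors), 3))
print(round(statistics.mean(weighings), 3))  # close to 57.3
```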
The relative clarity of this example hinges on our perception that the weight of
the nugget is a real property of the nugget. A few philosophers might regard it
(possibly along with the nugget itself) as a convenient fiction. But to most of us, the
weight is something real that the mean weight is approximating closely and that
individual weighings are approximating somewhat less closely. Another reason that
the idea of central tendency is compelling in repeated measurement situations is that
we can easily relate the mean to the individual observations as well. To help clarify
why this is so, we will make some of our assumptions explicit.
We have been assuming that the person doing the weighing is careful and that
the scale is unbiased and reasonably accurate. Given these assumptions, we expect
that the variability of the weighings would be small and that the frequency
histogram of observations would be single-peaked and approximately symmetric. If
instead we knew that the person had placed the nugget on different parts of the
balance pan, read the dial from different angles, or made errors in transcribing the
observations, we would be reluctant to treat the mean of these numbers as a central
tendency of the process. We would also be hesitant to accept the mean as a central
tendency if the standard deviation was extremely large or if the histogram of weights
was bimodal. In the ideal case, most observations would be close to the mean or
median and the distribution would peak at the average, a fact that would be more
apparent with a larger data set because the histogram would be smoother. In this
case, we could easily interpret the sample average as a good approximation to a
signal or a central tendency and view the variability around it as the result of random
error.
These assumptions about the procedure and the resulting data may be critical to
accepting the mean of the weighings as a central tendency, but they are not the only
things making that interpretation compelling. As indicated earlier, we maintain that
the key reason the mean observation in this example is relatively easy to accept as a
central tendency is that we can view it as representing a property of the object while
viewing the variability as a property of a distinctly independent measurement
process. That interpretation is much harder to hold when—rather than repeatedly
measuring an attribute of a single object—we measure an attribute of many different
objects, taking one measurement for each object and averaging the measurements.
Measuring Individuals
Consider taking the height of 100 randomly chosen adult men in the United
States. Is the mean or median of these observations a central tendency? If so, what
does it represent? Many statisticians view the mean in this case as something like
the actual or true height of males in the United States (or in some subgroup). But
what could a statement like that mean?
For several reasons, an average in this situation is harder to view as a central
tendency than the average in the repeated measurement example. First, the gold
nugget and its mass are both perceivable. We can see and heft the nugget. In
contrast, the population of men and their average height are not things we can
perceive as directly. Second, it is clear why we might want to know the weight of
the nugget. But why would we want to know the average height of a population of
men? Third, the average height may not remain fixed over time, because of factors
such as demographic changes or changes in diet. Finally, and perhaps most
important, we cannot easily compartmentalize the height measurements into signal
and noise. It seems like a conceptual leap to regard each individual height as partly
true height, somehow determined from the average of the population, and partly
random error determined from some independent source other than measurement
error.
For all of these reasons, it is hard to think about the average height of the group
of men as a central tendency. We speculate, however, that it is somewhat easier to
regard differences between the averages of two groups of individual measurements
as central tendencies. Suppose, for example, we wanted to compare the average
height of U.S. men to the average height of (a) U.S. women or (b) men from
Ecuador. We might interpret the difference between averages as saying something in
the first case about the influence of genetics on height and in the second, about the
effects of nutrition on height. When making these comparisons, we can regard the
difference in averages as an indicator of the “actual effect” of gender or of nutrition,
things that are easier to imagine wanting to know about even if they are difficult to
observe directly.10
Some support for this speculation comes from Stigler (1999), who claims that
Quetelet created his famous notion of the “average man” not as a tool to describe
single distributions, but as a method for comparing them: “With Quetelet, the
essential idea was that of comparison—the entire point was that there were different
average men for different groups, whether categorized by age or nationality, and it
was for the study of the nature and magnitude of those differences that he had
introduced the idea” (p. 61). Although we concede that the notion of a “true” or
“actual” value is still a bit strained in these comparison cases, we believe that one
needs some approximation to the idea of true value to make meaningful
comparisons between two groups whose individual elements vary. To see why, let
us look more closely at the comparison of men versus women.
Suppose we compute a mean or median height for a group of U.S. men and
another for a group of U.S. women. Note that the act of constructing the hypothesis
that gender partly determines height requires us to conceive of height as a process
influenced by various factors. Furthermore, we cannot see how comparing the two
groups is meaningful unless we have (a) an implicit model that gender may have a
real genetic effect on height that is represented by the difference between the
average for men and the average for women, and (b) a notion that other factors have
influences on height that we will regard as random error when focusing on the
influences of gender on height.11 Thus, we claim that the concept of an average as
approximating a signal, or true value, comes more clearly into focus when we are
considering the influence of a particular variable on something (in this case, gender
on height). Such a comparison scheme provides a conceptual lever for thinking
about signal (gender influences) and noise (other influences). We return to this point
later.
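The implicit model described here can be sketched as a simulation: each height is a group "true value" (a stipulated gender effect) plus random variation from all other influences, and the difference of sample means then estimates that effect. The true values and spread below are invented, not population figures.

```python
import random
import statistics

random.seed(3)
TRUE_MEN, TRUE_WOMEN = 176.0, 163.0  # cm; hypothetical group true values

# Height = group true value + random variation from other influences.
men   = [TRUE_MEN   + random.gauss(0, 7) for _ in range(500)]
women = [TRUE_WOMEN + random.gauss(0, 7) for _ in range(500)]

effect = statistics.mean(men) - statistics.mean(women)
print(round(effect, 1))  # close to the stipulated 13.0 cm difference
```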
Discrete Events
Another measure that is often used as an index of central tendency is the rate of
occurrence of some event. As a prototypical example, consider the rate of
contracting polio for children inoculated with the Salk vaccine. Even though
individual children either get the disease or do not, the rate tells us something about
the ability of inoculated children, as a group, to fight the disease.
How can we view a rate (or probability) as a measure of central tendency? First,
a probability can be formally viewed as a mean through what some would regard as
a bit of trickery. If we code the event “polio” as a 1, and the event “no polio” as a 0,
then the probability of getting polio is merely the mean of these Boolean values.
Producing a formal average, however, does not automatically give us a measure of
central tendency. We need to be able to interpret this average as a signal related to
the causes of polio. Compare the distribution of values in the dichotomous case to
the ideal case of the weighing example. In the dichotomous case, the mean is not a
value that can actually occur in a single trial. Rather than being located at either of
the peaks in the distribution, the mean is located in the valley between, typically
quite far from the observed values. Thus, it is nearly impossible to think about the
rate or probability as the true-value component of any single observation and the
occurrence or nonoccurrence of an individual case of polio as the sum of a true
value and a random error component. We suspect this is largely why the idea of a
central tendency in dichotomous situations is the least tangible of all.
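The 0/1 coding described above can be verified in one line: the mean of the coded outcomes is exactly the rate of the event. The outcomes below are invented.

```python
import statistics

# 1 = polio, 0 = no polio; hypothetical outcomes for ten children.
outcomes = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

rate = statistics.mean(outcomes)
print(rate)  # 0.2, the same value as sum(outcomes) / len(outcomes)
```

Note that 0.2 never occurs on any single trial, which is precisely the difficulty the text identifies.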
It might help in reasoning about this situation to conceive of some process about
which the rate or probability informs us. In the disease example, the conception is
fairly similar to the earlier height example: A multitude of factors influence the
propensity of individuals to get polio—level of public health, prior development of
antibodies, incidence rate of polio, age—all leading to a rate of getting the disease in
some population. So even though individuals either get polio or do not, the
propensity of a certain group of people to get polio is a probability between 0 and 1.
That value is a general indicator of the confluence of polio-related factors present in
that group.
As with our height example, although an absolute rate may have some meaning,
we think it is much easier to conceptualize the meaning of a signal when we are
comparing two rates. In the polio example, this might involve comparing the rate in
an inoculated group to the rate in a placebo control group. Here, as with the height
example, most people would consider the difference in rates (or the ratio of the
rates) to be a valid measure of the efficacy of the vaccine or as a reasonable way to
compare the efficacy of two different vaccines.
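The two rate comparisons mentioned here, a difference and a ratio, can be computed from counts. The counts below are invented and only loosely patterned on a vaccine-versus-placebo trial; they are not the actual Salk data.

```python
# Hypothetical trial counts: cases of disease and group sizes.
cases_vaccine, n_vaccine = 30, 200_000
cases_placebo, n_placebo = 110, 200_000

rate_v = cases_vaccine / n_vaccine
rate_p = cases_placebo / n_placebo

difference = rate_p - rate_v   # absolute difference in rates
ratio = rate_p / rate_v        # relative rate of placebo vs. vaccine

print(round(difference, 5))  # 0.0004
print(round(ratio, 2))       # 3.67
```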
The Role of Noise in Perceiving a Collection as a Group
We have argued that the idea of central tendency, or data as signal and noise, is
more easily applied to some types of processes than to others. But other factors, to
which we have alluded, may affect the difficulty of applying this idea. Consider the
case of comparing the heights of men and women. We would expect that the shape
and the relative spread of the distributions would affect how easy it is to conceive of
each distribution as a coherent group and, consequently, to be able to interpret each
group’s average as an indicator of a relatively stable group characteristic.
Indeed, perhaps the most critical factor in perceiving a collection of individual
measurements as a group is the nature of the variability within a group and how it
relates to the differences between groups. In general, we expect that these individual
measurements are easier to view as belonging to a group (and thus as having a
central tendency) when the variability among them is relatively small. To explain
what we mean by relatively small, we find the idea of natural kinds helpful.
According to Rosch and Mervis (1975), people often mentally represent real-world
concepts as prototypes and judge particular instances as “good” or “bad” depending
on how closely those instances match the category prototype. For example, a
prototypical bird for most North Americans is a medium-sized songbird, something
like a robin. The closer an instance is to the category prototype, the less time it takes
to identify that instance as a member of the category. North Americans can
categorize a picture of a starling as a bird faster than they can a picture of an ostrich.
In this theory of natural kinds, prototypes function much as averages do:
Instances of the category are single observations that can be some distance from the
average (or prototype). In fact, some competing theories of natural kinds (e.g.,
Medin & Schaffer, 1978) claim there is no actual instance that functions as a
prototype, but that the effective prototype is simply a mean (in some
multidimensional feature space) of all the instances in memory. What makes some
categories, such as birds, natural kinds is that there is little variability across features
within the category relative to the variability of those features between various
animal categories. So, even though there are some non-prototypical instances of
birds, such as penguins and ostriches, the distributions of features of birds overlap
little with those of other natural kinds such as mammals, so that the groups cohere.
This research suggests that it might be easier to accept, for example, the mean
heights of the men and women as representing group properties if there were no
overlap in heights of the men and women, or if at least the overlap were small
relative to the spread of the distributions.12
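One crude way to quantify "overlap relative to spread" is to divide the gap between two group means by the spread of the combined observations; large values indicate groups that barely overlap and so cohere as separate kinds. The measure and the wingspan data below are illustrative inventions, not a standard statistic from the chapter.

```python
import statistics

def separation(a, b):
    """Gap between group means, relative to a crude combined spread."""
    gap = abs(statistics.mean(a) - statistics.mean(b))
    spread = statistics.pstdev(a + b)
    return gap / spread

# Hypothetical wingspans (cm) for two "natural kinds" of birds.
songbird_wingspan = [20, 22, 23, 25, 24]
raptor_wingspan   = [90, 100, 110, 95, 105]
print(round(separation(songbird_wingspan, raptor_wingspan), 2))  # well above 1
```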
Applying Central Tendency to Nonstandard Cases
In the foregoing examples, we focused on relatively ideal cases. We tacitly
assumed that our histograms of people’s heights, for example, were single-peaked,
approximately symmetric, and, configured as two histograms, had approximately
equal spread. In such cases, most experts would accept some average as a
meaningful measure of central tendency. Is the idea of central tendency applicable
only to these ideal cases, or is it more generalizable than that? In this section, we
consider several nonstandard examples to make the case that we can and do apply
the idea of central tendency to less ideal situations, in which there is some doubt
about whether a single measure of center is adequate to describe the data. We argue
that statistical reasoning in these situations still rests to a large extent either on the
conception of an average as a central tendency or on its cousin, a single measure that
describes the variability of a group of observations.
Distributions with Outliers
Consider cases where there are outliers that we decide should be removed from
the data set. In the case of weighing, suppose a typical observation differs from the
mean weight by something like 1 mg. If one of our observations was 5 mg away
from the mean, most people might think it sensible to omit that value in calculating
the mean. Two ideas seem implicit in this thinking: (a) that “true” measurement
error is associated with weighing on that scale and (b) that some different process
can sometimes generate observations with unusually high measurement error. Only
with such an implicit model can we consider, let alone decide, that an extremely
deviant observation must have been due to nonrandom error (e.g., misrecording the
observation or having a finger on the pan). Similarly, if we had one or two height
observations that were 60 cm from the mean, we might disregard them in certain
analyses as resulting from a process different from the process producing the rest of
the data (e.g., from a mutation or birth defect). Here again, this makes sense only if
we have some implicit model of a typical (male or female) height from which
individual observations differ by something like “random genetic and/or
environmental variation.” We can then regard extremely tall or short people as not
fitting this model—as resulting from a somewhat different process and therefore
calling for a different explanation. For these same reasons, Biehler (1994, p. 32)
suggested that “symmetrical unimodal distributions are something distinctive,” and
deviations from them require additional modeling.
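The screening logic in the weighing example can be sketched in a few lines of Python. The cutoff of three typical errors and the weighings themselves are our own illustrative choices, not a rule taken from the text:

```python
import statistics

def screened_mean(observations, typical_error, cutoff=3.0):
    """Mean after dropping values farther than cutoff * typical_error
    from the overall mean (an illustrative screening rule)."""
    center = statistics.mean(observations)
    kept = [x for x in observations
            if abs(x - center) <= cutoff * typical_error]
    return statistics.mean(kept)

# Hypothetical weighings (mg): typical error of about 1 mg, plus one
# 5 mg deviation (a finger on the pan, perhaps)
weighings = [100.2, 99.8, 100.5, 99.6, 100.1, 105.0]
print(round(statistics.mean(weighings), 2))     # 100.87, pulled up by the outlier
print(round(screened_mean(weighings, 1.0), 2))  # 100.04, outlier screened out
```

The point of the sketch is the implicit model it encodes: only because we expect deviations of about 1 mg can we treat a 5 mg deviation as the product of a different process.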
Distributions with Unusual Shape
Continuing with the example of men’s heights, consider the case perhaps
furthest from the ideal, where the histogram of men’s heights is bimodal. We would
be reluctant in this case to interpret any average as a central tendency of men’s
heights. Why? With a bimodal histogram, we would be doubtful that the men we
were looking at comprised a simple process, or “natural kind.” Rather, we would
CONCEPTUALIZING AN AVERAGE
189
suspect that our batch of men consisted of two distinct groups and that we could not
make any useful statements unless we uncovered some underlying variable that
distinguished the two. A similar but somewhat less severe problem would result if
the histogram was unimodal but the variability in the group seemed enormous (e.g.,
if men’s heights from an unknown country varied from 60 cm to 900 cm with a
mean of 450 cm). Given the huge variability in this case, we would question
whether the data came from a coherent process and whether it made sense, therefore,
to use an average to represent it. Of course, people’s intuitions about whether
variability is enormous may differ and are likely to depend on the model they have
of typical variability (or indeed whether they have any conceptual model for
thinking about sources of variability).
Comparing Groups with Skewed or Differently Shaped Distributions
When comparing two histograms, say of men’s and women’s heights, we run
into difficulties when the histograms are of different shape. Imagine, for example,
that the men’s heights were positively skewed and the women’s heights negatively
skewed. Because there is clearly something different about the variability in each
group, we would be reluctant to compare the two groups using their averages. That
is, unless we could generate a model of why the groups’ histograms differed in
shape and, as a result, conclude that the different shapes were just two versions of
random error, we would probably be wary of viewing the difference between the
two averages as representing something like the “gender effect on height.”
Consider the comparison of differences in income from one decade to another,
where both histograms are highly skewed with long tails out to the right. If the
histograms have the same variance and the same shape, we claim it is reasonable to
accept the shift in central tendency as an estimate of the actual change in income for
the group, even though we might have misgivings about using the average for either
group as the best measure of actual income. That is, even though the variability in
each group may not match our ideal view of “noise,” we can at least convince
ourselves that it is the same noise process in both groups. Of course, even though
one histogram is a horizontal translation of the other, it does not necessarily mean
that income has improved the same amount for each individual (or each type of
individual), give or take random error. Indeed, a finer analysis could indicate that
certain groups have become better off while other groups have not changed or have
even become worse off. It is worth noting, however, that many such arguments
about why looking at the differences between group averages is inappropriate or
misleading rely on the perception that the groups are, in some sense, not “natural
kinds” (e.g., that the processes determining incomes of poor people are different
from those determining incomes of rich people). Nonetheless, these arguments are
usually most compelling when we can identify natural subgroups in the larger group
and can show that the changes in the averages in these subgroups differ from each
other (e.g., the rich got richer and the poor got poorer, or different things happened
to Blacks and Whites).
Another classic difficulty involves comparing two averages when the
distributions differ in spread. For example, what if Country A not only has a higher
mean income than Country B but also has a higher standard deviation? This would
call for more serious modeling of the variability. A special case that would make it
conceptually easier to compare the averages of the two groups would be the
situation in which the difference in standard deviations was commensurate with the
difference in means (ideally, the ratio of standard deviations would be equal to the
ratio of the means). In that case, we could view the effect as multiplicative rather
than additive, since Country A’s typical income would be equal to Country B’s
multiplied by a factor that represents the effect(s) that distinguish A from B. And it
would be reasonable to assume that the same multiplicative factor also applied to the
noise process.
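This special case is easy to check numerically. In the following Python sketch (with invented incomes), multiplying every income by a single factor scales the mean and the standard deviation together, so the ratio of standard deviations equals the ratio of means; taking logs turns the multiplicative effect into an additive shift:

```python
import math
import statistics

# Hypothetical incomes for Country B; Country A differs by one
# multiplicative factor.
incomes_b = [12.0, 15.0, 18.0, 22.0, 40.0]  # skewed, long tail to the right
factor = 1.5
incomes_a = [x * factor for x in incomes_b]

# Scaling every value by the same factor scales the mean and the
# standard deviation alike.
print(round(statistics.mean(incomes_a) / statistics.mean(incomes_b), 6))    # 1.5
print(round(statistics.stdev(incomes_a) / statistics.stdev(incomes_b), 6))  # 1.5

# On a log scale the multiplicative effect is an additive shift of log(factor).
shift = (statistics.mean([math.log(x) for x in incomes_a])
         - statistics.mean([math.log(x) for x in incomes_b]))
print(round(shift, 4), round(math.log(factor), 4))
```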
Summary of Analyses of Nonstandard Cases
As we have implied in our argument above, we do not necessarily see these
nonstandard cases as problems for the type of framework that we are advocating.
Indeed, we think that the idea of central tendency of a process allows us to (a)
decide to eliminate an outlier or break data into suitable subsets, (b) come up with a
conceptual model that explains why the groups are asymmetric or differ in spread or
shape, or (c) decide that there is little we can sensibly conclude about the differences
between the two sets of data.
Let us summarize by asking what we could conclude about the difference in
men’s and women’s heights from the distributions we described earlier that were
skewed in opposite directions. We assert that we could conclude nothing without
some conceptual model. If we were trying to make a statement about genetic gender
differences, for example, we would have to be convinced that everything else was
random and that, for instance, we could not explain the mean height difference as
resulting from gender differences in diet. In other words, there is virtually nothing
about analyzing data that is model-free. Some may regard this as a radical proposal,
but we claim that a mean or median has little heuristic value (and is likely to have
little meaning or heuristic value for the student) unless we can conceive of the data
coming from some coherent process that an average helps to elucidate.
IMPLICATIONS FOR STATISTICS EDUCATION
The idea of noisy processes, and the signals that we can detect in them, is at the
core of statistical reasoning. Yet, current curricula do not introduce students to this
idea, instruments meant to assess student reasoning about data do not include items
targeting it, and statistics education researchers have not given it much attention. If
our argument is valid, then critical changes are called for in education research, the
formulation of education objectives, curriculum materials, teacher education, and
assessment. These are tightly interrelated components of educational reform. If we
fail to advance our efforts on all these fronts, we run the risk of continuing to lose
the small ground gained on any one of them.
Accordingly, we describe here what we see as essential components of a signal-versus-noise perspective and offer suggestions about how we might help students (and future teachers) develop these ideas. We do not aim our speculations at curriculum designers or teachers in the hope that they will implement them.
Instead, we intend them for researchers and others who are considering what big
ideas should guide our standards and curriculum objectives, for those designing and
running teacher institutes, and for those developing assessment frameworks and
instruments.
Using Repeated Measures
According to our analysis, processes involving repeated measures are easier than
other types of statistical processes to view as part signal and part noise. This
suggests that to establish the signal-versus-noise interpretation of various statistical
measures, we initially involve students in investigations of repeated measures.
Current curricula make little use of repeated measures. Perhaps this is because
many of the prototypical situations, such as our weighing example, can be somewhat
boring and seemingly pointless unless they are introduced in meaningful ways.
There are many suitable and potentially interesting contexts.13 In the later grades,
these include a number of high-stakes scientific and political issues. For informed
public policy, we need good estimates of the thickness of the ozone layer, of
dissolved oxygen in rivers, of concentrations of atmospheric CO2. Statistical control
of manufacturing processes provides another context in which it is relatively clear
why we need to track a process by looking at its outputs. Of course, time-series
analyses are complex, and we need more research to help determine the kinds of
questions regarding them that introductory students can fruitfully explore.
Lehrer, Schauble, and their colleagues have employed some interesting repeated
measure contexts with younger students. For example, students in a second-grade
class designed cars to race down a track (Lehrer, Schauble, Carpenter, & Penner,
2000). During trial runs, students became unhappy about a decision to base a claim
about a car’s speed on a single trial. Frequently, something would happen to impede
a car—for example, it would run up against the track’s railing. The agreed-on
remedy was to race each car five times. Not surprisingly, the students could not
agree later on how to get a single measure of speed from the five trials. However,
their proposal of multiple trials was, by itself, suggestive of some notion of signal (a
car’s actual top speed on that track) and noise (its observed times resulting from
unpredictable events).
This classroom episode suggests an important distinction. That is, a student
might perceive data as comprising signal and noise and yet not necessarily view a
statistical measure such as an average as an acceptable indicator of signal. We
would expect that with processes involving repeated measures, students would tend
to think of each measurement as a combination of signal and noise, particularly if
sources of measurement error were easy to identify, as in measuring length with a
ruler. But these same students might not be likely to think of an average of repeated
measures as indicative of signal (any more than the early astronomers were). Thus,
the instructional challenge is how to help students interpret measures such as
averages as indicators of central tendency. Taking a clue from the historical
development of the concept, it would seem fruitful to have students explore the
relative stability of various indicators in different samples.
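One such exploration can be sketched in Python. The population of repeated weighings and the choice of mean versus median as the competing indicators are our own illustrative assumptions:

```python
import random
import statistics

random.seed(0)

def sample_to_sample_sd(indicator, population, n, trials=1000):
    """How much an indicator (mean, median, ...) wobbles across repeated
    samples of size n drawn from the same population."""
    values = [indicator(random.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(values)

# Hypothetical population of repeated weighings of one object (true weight 100)
population = [random.gauss(100.0, 1.0) for _ in range(10000)]

sd_of_means = sample_to_sample_sd(statistics.mean, population, n=20)
sd_of_medians = sample_to_sample_sd(statistics.median, population, n=20)
print(round(sd_of_means, 3), round(sd_of_medians, 3))
# For mound-shaped data like these, the mean varies less from sample to
# sample -- it is the more stable indicator.
```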
Explorations of Stability
The idea of stability is closely related to the idea of signal. If the weight of an
object is not changing from trial to trial, it seems reasonable to expect that a good
indicator of its weight should also not vary much from sample to sample. Recall that
it was observing the stability from year to year of such things as birth and death
rates that led Quetelet to begin regarding these rates as indicators of prevailing and
relatively stable societal conditions, and to make the analogy to means of repeated
measures. Similar investigations by students could set the stage for interpreting
averages as indicators of signal.
A method frequently used to demonstrate stability is to draw multiple samples
from a known population and evaluate particular features, such as the mean, across
these replications. However, we expect that these demonstrations are often
conducted prematurely—before students have understood why one is interested in
the mean. Furthermore, in real sampling situations we never do these repeated
samplings, which leaves many students confused about what we can possibly learn
from this hypothetical exercise. The following three alternative methods of
exploring stability appear promising on the basis of their use in classrooms with
students as young as 8 years old.
Comparing Different Measures
In this approach, students compare the relative accuracy of different
measurement methods. Lehrer, Schauble, Strom, and Pligge (2001) used this
approach with third and fifth graders, who measured weights and volumes as part of
a study of densities of different materials. The students explored several different
ways to measure each attribute. They did this by using each method repeatedly to
measure the same object. The students came to favor those methods that produced
less variability in these repeated measures. Having established what measurement
technique they would use, students then considered various proposals of what to use
as, for example, the volume of a particular object. The problem, of course, was that
even with the same measurement method, repeated measuring gave the students a
range of values. They ultimately decided to discard outliers and compute the means
of the remaining observations as their “best guess” of the weights and volumes of
these objects.
Observing Growing Samples
Another way of exploring stability is to have students observe a distribution as
the sample gets larger. We tested this approach recently in a seventh-grade
mathematics class. Students had conducted an in-class survey to explore whether
boys and girls were paid similar allowances. While comparing the two distributions,
one student expressed reservations about drawing conclusions, arguing that she had
no idea what the distributions might look like if they collected more data. Her
classmates agreed.
To help the class explore this issue, we constructed an artificial pond filled with
two kinds (colors) of paper fish. According to our cover story, a farmer wanted to
determine whether a new type of genetically engineered fish grew longer, as
claimed, than the normal fish he had been using. Students “captured” fish from the
pond, reading off fish type and length (which was written on the fish). On an
overhead display, we constructed separate stacked dot plots for each type of fish as
students read off their data. After about 15 fish had been sampled, we asked students
what the data showed so far. Students observed that the data for the normal fish
were clustering at 21–24 cm, whereas the data for the genetically engineered fish
were clustering at 25–27 cm. Then we asked them what they thought would happen
as we continued to sample more fish, reminding them of their earlier reservations
with the allowance data. Some said that the stacks would become higher and the
range would get bigger, without mentioning what would happen to such features as
the general shape or the location of the center clump. However, other students did
anticipate that the center clusters would “grow up” but would nevertheless maintain
their approximate locations along the horizontal axis. The latter, of course, is what
they observed as they continued to add more fish to the sample distributions. After
the sampling, we showed them both population distributions along with their sample
data, calling their attention to the fact that the centers of their sample distributions
were quite good predictors of the centers of the population distributions—that these
stable features of the samples were signals.
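The fish-pond activity is straightforward to mimic in simulation. In this sketch the population means and spreads are invented; the point it illustrates is that as the samples grow, the stacks get taller while the center clumps keep their locations:

```python
import random
import statistics

random.seed(1)

def catch(mean_length, n):
    """Hypothetical fish lengths (cm) with some individual variation."""
    return [random.gauss(mean_length, 2.0) for _ in range(n)]

# Normal fish cluster near 22.5 cm, engineered fish near 26.0 cm.
# Larger samples change the height of the stacks, not where they sit.
for n in (15, 60, 240):
    normal = catch(22.5, n)
    engineered = catch(26.0, n)
    print(n, round(statistics.mean(normal), 1),
          round(statistics.mean(engineered), 1))
```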
Simulating Processes
A third way to explore stability is to investigate why many noisy processes tend
to produce mound-shaped distributions. Wilensky (1997) described a series of
interviews that he conducted with graduate students who were exploring this
question through computer simulations. We conducted a similar investigation with
fifth-grade students in an after-school program on data analysis. In analyzing a data
set on cats (from Rubin, Mokros, & Friel, 1996), students noticed that many
frequency distributions, like tail length and body weight, were mound shaped. As
part of exploring why this might be, students developed a list of factors that might
cause a cat’s tail to be longer or shorter. Their list included diet, being in an
accident, and length of father’s and mother’s tails. Using this list, we constructed a
spinner to determine the value of each factor for a particular cat’s tail. One student
might spin +2 inches for diet, +3 inches for mother’s contribution, –2 inches for an
accident, and so on. (Of course, each student wanted his or her cat to have the
longest tail.) Before they began spinning, students predicted that if they built 30 cat
tails in this way, they would get about equal numbers of cats with short, medium,
and long tails. After several trials they noticed they were tending to get medium
tails, which they explained by pointing out that you would have to be “real lucky” to
get a big number every spin, or “real unlucky” to get a small number every spin. As
this was our last session with these students, we could not explore what they might
have generalized from this experience; but we believe that understanding why such
processes produce normal-shaped distributions is a critical part of coming to trust
how process signals rise up through the noise.
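The spinner process lends itself to simulation. In the Python sketch below, the factor names come from the students' list, but the spinner values and the base tail length are invented; the mound shape emerges because extreme tails require an extreme spin on every factor:

```python
import random
from collections import Counter

random.seed(2)

# One spin per factor (diet, accident, mother's and father's contributions);
# the spinner values in inches are hypothetical.
SPINNER = [-2, -1, 0, 1, 2, 3]

def build_tail(base=10, factors=4):
    """A cat's tail length: a base length plus one spin for each factor."""
    return base + sum(random.choice(SPINNER) for _ in range(factors))

tails = [build_tail() for _ in range(1000)]
counts = Counter(tails)
for length in sorted(counts):
    print(f"{length:3d} {'*' * (counts[length] // 10)}")
# The middle lengths pile up: a cat must be "real lucky" (or "real
# unlucky") to draw an extreme value on every spin.
```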
Group Comparison
We have speculated that it is often easier to regard the difference between two
averages as a central tendency than it is to think of a single average that way. This
suggests, perhaps somewhat counterintuitively, that rather than beginning
instruction by having students explore single distributions of individual values, we
instead might fruitfully start with questions involving group comparison. Some
support for the benefit of having even young students grapple with comparison
problems comes from accounts from teachers of data analysis in the elementary
grades (Konold & Higgins, 2003). Similarly, all the problems in the middle-school
materials developed by Cobb, McClain, and Gravemeijer involve group comparison
(Cobb, 1999; Cobb, McClain, & Gravemeijer, 2003). As Watson and Moritz (1999)
pointed out, some of the benefits of comparison contexts are undoubtedly related to
their being more interesting and allowing students to see more clearly why the
question matters and why averages might be useful. But in addition, we expect that
in a comparison situation, students can more easily view averages of the individual
groups as summary measures of processes and can readily perceive the difference
between those measures as some signal rising through the din of variability.
Conducting Experiments
Many educators have touted the benefits of students’ collecting their own data
(e.g., Cobb, 1993). Among the expected advantages are increased student interest
and the rich source of information that students can draw on as they later analyze
and reason about the data. There may be additional benefits to having students
design and run simple, controlled experiments. One benefit derives from the fact
that experimental setups involve group comparison. In addition, we speculate that
data from experiments are easier than observational data to view as coming from a
process. As experimenters, students take an active role in the process—for example,
by fertilizing one group of plants and comparing their growth to that of an
unfertilized group of plants. Even quite young students can understand the
importance in such cases of treating both groups of plants the same in all other
respects (Lehrer, Carpenter, Schauble, & Putz, 2000; Warren, Ballenger,
Ogonowski, Rosebery, & Hudicourt-Barnes, 2001). They then observe firsthand that
not every plant in the fertilized group responds the same and that the effect of the
fertilizer becomes evident, if at all, only when comparing the two groups. With
observational data, students must reason backwards from observed differences to
possible explanations for those differences, and their tendency in explaining the data
is to offer different causal accounts for each individual value. With the experimental
setup, students first see the process and then the data resulting from it, a difference
in perspective that may help them focus on the class of causes that apply uniformly
at the group, as opposed to the individual, level.
CONCLUSIONS
We fear that some readers will hear in our analysis and recommendations a call
to abandon the teaching of noninferential exploratory methods of data analysis and
to eschew data from other than well-defined samples. In fact, we believe that we
should begin teaching informal methods of data analysis in the spirit of EDA to
students at a young age. Moreover, we are not recommending that the teaching of
data analysis should be grounded in, or necessarily headed toward, the technical
question of drawing formal inferences from carefully constructed samples.
We agree with Tukey (1977) that we should not, as a rule, approach data with
the knee-jerk desire to model them mathematically. Rather, our objective should be
more general—to learn from them. For this purpose, being able to display data
flexibly and in various ways can lead to interesting insights and hypotheses, some of
which we may then choose to model more formally (Cleveland, 1993). It is this
sensible approach to the general enterprise—not only to how but also to why we
collect and explore data—that we believe is most important to convey to students in
early introductions to statistics.
It is important that we keep in mind, however, that most of us who regularly use
exploratory methods of data analysis have strong backgrounds in inferential
methods. When we approach data exploration with fewer assumptions, we often set
aside, for the moment, much of the power of the mathematical models of statistics.
But to play data detective, we have a host of tools and experiences to draw on, many
of which stem from our knowledge of the mathematical models of statistics. As
Cleveland (1993) observed, “Tools matter” (p. 1). The tools that he was referring to
were methods of displaying data. We would add that underlying the skillful use of
such graphical tools is the skillful use of conceptual ones, which matter even more.
Our references to the pioneering work of Quetelet were meant to point out that
the early users of means did not regard them simply as ways to describe centers of
distributions, which is how some today (misleadingly) characterize them. Recent
histories of the development of statistics (Hacking, 1990; Porter, 1986; Stigler,
1986) portray the early innovators of statistics as struggling from the beginning with
issues of interpretation. In this regard, Quetelet’s idea of the “average man” was a
way to take the interpretation of a mean as a “true value” of repeated measures and
bootstrap it to a new domain—measurements of individuals—for which the mean
did not initially make much intuitive sense. We believe that learning to reason about
data requires students to grapple with the same sorts of interpretation issues; in the
process, they need to develop conceptual (not necessarily mathematical) models of
data that can guide their explorations. The idea of data as signal and noise,
physically embodied in the workings of the Galton Board (see Biehler, 1994), is
perhaps the most fundamental conceptual model for reasoning statistically. Future
research should help us learn how the idea develops and how we can foster that
development in our students.
NOTES
1. As George Cobb (1993) remarked, “If one could superimpose maps of the routes taken by
all elementary books, the resulting picture would look much like a time-lapse night
photograph of car taillights all moving along the same busy highway” (p. 53).
2. David Krantz (personal communication, December 13, 2001) shared with us his response
to the question, “Do we really need the mean in descriptive stats?” which had appeared
on a data analysis listserv. “I’m not very clear on what is meant by ‘descriptive statistics.’
To be honest, I don’t think there is any such thing, except as a textbook heading to refer
to the things that are introduced prior to consideration of sampling distributions. Any
description must have a purpose if it is to be useful—it is supposed to convey something
real. The line between ‘mere description’ and suggesting some sort of inference is very
fuzzy.”
3. Many use the term central tendency as a synonym for average or center. When referring
to central tendency in this article, we have in mind the particular definition specified here.
4. Adopting this perspective, we will generally refer to processes rather than to populations,
to signals or central tendencies of processes rather than to population parameters, and to
estimates of signals rather than to sample statistics. We use the term process to refer both
to processes that remain relatively stable over time as well as to stochastic processes,
which can change quickly over time.
5. However, Frick (1998) argues that the difference between processes and populations is
more than terminology, claiming that the tension between theoretical descriptions of
random sampling and what we actually do in practice could be resolved if we thought
explicitly of sampling from processes rather than from populations.
6. The maximum score on the reading component was 500, and the standard deviation was
50.
7. See Bakker (2001) for a review of the historical origins of various types of averages and a
discussion of parallels between these ideas and the development of student thinking.
8. There are good grounds for considering the idea of mean as balance point as an
interpretation. This interpretation figures centrally in mechanics, where the mean is a
measure of center of mass. But in the statistics texts that we examined, the idea of mean
as balance point seemed to be used solely as a way to visualize the location of the mean
in a distribution of values and not as an interpretation as we have defined it.
9. We have to be careful using this logic. For example, mean income would be a different,
and probably better, indicator of the power of the economic system to take care of its
citizens if the wealth were in fact distributed equally.
10. Of course, both differences may reflect both nature and nurture.
11. It is possible that genetic differences may also (or instead) be reflected by differences in
variability in the groups. Thinking about such differences, however, also requires
thinking about some sort of measure (e.g., the standard deviation or the interquartile
range) as a signal reflecting the typical variability in a group.
12. However, we should note that in the Bright and Friel (1998) study cited earlier, the two
distributions were non-overlapping, yet students did not use averages to compare them.
13. For several good examples of activities written around such processes, see Erickson
(2000).
REFERENCES
American Association for the Advancement of Science (AAAS). (1989). Science for all Americans.
Washington, D.C.: American Association for the Advancement of Science (AAAS).
Bakker, A. (2001). Historical and didactical phenomenology of the average values. In P. Radelet-de
Grave (Ed.), Proceedings of the Conference on History and Epistemology in Mathematical Education
(Vol. 1, pp. 91–106). Louvain-la-Neuve and Leuven, Belgium: Catholic Universities of Louvain-la-Neuve and Leuven.
Biehler, R. (1989). Educational perspectives on exploratory data analysis. In R. Morris (Ed.), Studies in
mathematics education (Vol. 7, pp. 185–201). Paris: UNESCO.
Biehler, R. (1994). Probabilistic thinking, statistical reasoning, and the search for causes—Do we need a
probabilistic revolution after we have taught data analysis? In J. Garfield (Ed.), Research papers from
ICOTS 4 (pp. 20–37). Minneapolis: University of Minnesota.
Biehler, R. (1997). Students’ difficulties in practicing computer-supported data analysis: Some
hypothetical generalizations from results of two exploratory studies. In J. B. Garfield & G. Burrill
(Eds.), Research on the role of technology in teaching and learning statistics: Proceedings of the
1996 IASE Round Table Conference (pp. 169–190). Voorburg, The Netherlands: International
Statistical Institute.
Bright, G. W., & Friel, S. N. (1998). Helping students interpret data. In S. P. Lajoie (Ed.), Reflections on
statistics: Learning, teaching, and assessment in grades K–12 (pp. 63–88). Mahwah, NJ: Erlbaum.
Cleveland, W. S. (1993). Visualizing data. Summit, NJ: Hobart Press.
Cobb, G. (1993). Reconsidering statistics education: A National Science Foundation conference
[Electronic version]. Journal of Statistics Education, 1(1), Article 02.
Cobb, P. (1999). Individual and collective mathematical development: The case of statistical data
analysis. Mathematical Thinking and Learning, 1(1), 5–43.
Cobb, P., McClain, K., & Gravemeijer, K. (2003). Learning about statistical covariation. Cognition and
Instruction, 21, 1–78.
Cortina, J., Saldanha, L., & Thompson, P. (1999). Multiplicative conceptions of the arithmetic mean. In
F. Hitt & M. Santos (Eds.), Proceedings of the 21st Meeting of the North American Chapter of the
International Group for the Psychology of Mathematics Education (Vol. 2, pp. 466–472). Cuernavaca,
Mexico: Centro de Investigación y de Estudios Avanzados.
Donahue, P. L., Voelkl, K. E., Campbell, J. R., & Mazzeo, J. (1999). NAEP 1998 reading report card for the nation and the states (Document No. NCES 1999-500). Washington, DC: National Center for Education Statistics, U.S. Department of Education. Available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=1999500
Erickson, T. (2000). Data in depth: Exploring mathematics with Fathom. Emeryville, CA: Key
Curriculum Press.
Feldman, A., Konold, C., & Coulter, R. (2000). Network science, a decade later: The Internet and
classroom learning. Mahwah, NJ: Erlbaum.
Freund, R. J., & Wilson, W. J. (1997). Statistical methods. Boston: Academic Press.
Frick, R. W. (1998). Interpreting statistical testing: Process and propensity, not population and random
sampling. Behavior Research Methods, Instruments, & Computers, 30(3), 527–535.
Gal, I., Rothschild, K., & Wagner, D. A. (1990). Statistical concepts and statistical reasoning in school
children: Convergence or divergence. Paper presented at the annual meeting of the American
Educational Research Association, Boston, MA.
Gordon, F. S., & Gordon, S. P. (1992). Statistics for the twenty-first century (MAA Notes, no. 26).
Washington, D.C.: Mathematical Association of America.
Gould, S. J. (1996). Full house. New York: Harmony Books.
Hacking, I. (1990). The taming of chance. Cambridge, UK: Cambridge University Press.
Hancock, C., Kaput, J. J., & Goldsmith, L. T. (1992). Authentic inquiry with data: Critical barriers to
classroom implementation. Educational Psychologist, 27(3), 337–364.
Konold, C. (2002). Teaching concepts rather than conventions. New England Journal of Mathematics,
34(2), 69–81.
Konold, C., & Garfield, J. (1992). Statistical reasoning assessment: Intuitive thinking. Unpublished
Manuscript. Amherst: University of Massachusetts.
Konold, C., & Higgins, T. (2003). Reasoning about data. In J. Kilpatrick, W. G. Martin, & D. E. Schifter
(Eds.), A research companion to principles and standards for school mathematics (pp. 193–215).
Reston, VA: National Council of Teachers of Mathematics (NCTM).
Konold, C., Pollatsek, A., Well, A., & Gagnon, A. (1997). Students analyzing data: Research of critical
barriers. In J. B. Garfield & G. Burrill (Eds.), Research on the role of technology in teaching and
learning statistics: Proceedings of the 1996 IASE Round Table Conference (pp. 151–167). Voorburg,
The Netherlands: International Statistical Institute.
Konold, C., Robinson, A., Khalil, K., Pollatsek, A., Well, A., Wing, R., & Mayr, S. (2002). Students’ use
of modal clumps to summarize data. Paper presented at the Sixth International Conference on
Teaching Statistics, Cape Town, South Africa.
Lehrer, R., Carpenter, S., Schauble, L., & Putz, A. (2000). Designing classrooms that support inquiry. In
J. Minstrell & E. V. Zee (Eds.), Inquiring into inquiry learning and teaching in science (pp. 80–99).
Washington, DC: AAAS.
Lehrer, R., Schauble, L., Carpenter, S., & Penner, D. (2000). The inter-related development of
inscriptions and conceptual understanding. In P. Cobb, E. Yackel, & K. McClain (Eds.), Symbolizing
and communicating in mathematics classrooms: Perspectives on discourse, tools, and instructional
design (pp. 325–360). Mahwah, NJ: Erlbaum.
Lehrer, R., Schauble, L., Strom, D., & Pligge, M. (2001). Similarity of form and substance: Modeling
material kind. In D. Klahr & S. Carver (Eds.), Cognition and instruction: 25 years of progress. (pp.
39–74). Mahwah, NJ: Erlbaum.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review,
85, 207–238.
Mokros, J., & Russell, S. J. (1995). Children’s concepts of average and representativeness. Journal for
Research in Mathematics Education, 26, 20–39.
Moore, D. S. (1990). Uncertainty. In L. A. Steen, (Ed.), On the shoulders of giants: New approaches to
numeracy (pp. 95–137). Washington, DC: National Academy Press.
National Council of Teachers of Mathematics (NCTM). (1989). Curriculum and evaluation standards for
school mathematics. Reston, VA: NCTM.
National Council of Teachers of Mathematics (NCTM). (2000). Principles and standards for school
mathematics. Reston, VA: NCTM.
National Research Council. (1996). National science education standards. Washington, DC: National
Academy Press.
Noss, R., Pozzi, S., & Hoyles, C. (1999). Touching epistemologies: Meanings of average and variation in
nursing practice. Educational Studies in Mathematics, 40, 25–51.
Plackett, R. L. (1970). The principle of the arithmetic mean. In E. S. Pearson and M. G. Kendall (Eds.),
Studies in the history of statistics and probability (pp. 121–126). London: Charles Griffin.
Pollatsek, A., Lima, S., & Well, A. (1981). Concept or computation: Students’ misconceptions of the
mean. Educational Studies in Mathematics, 12, 191–204.
Porter, T. M. (1986). The rise of statistical thinking, 1820–1900. Princeton, NJ: Princeton University
Press.
Quetelet, M. A. (1842). A treatise on man and the development of his faculties. Edinburgh, Scotland:
William and Robert Chambers.
Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories.
Cognitive Psychology, 8, 382–439.
Rubin, A., Mokros, J., & Friel S. (1996). Data: Kids, cats, and ads. Investigations in number, data, and
space. Palo Alto, CA: Seymour.
Scheaffer, R. (1991). The ASA-NCTM Quantitative Literacy Program: An overview. In D. Vere-Jones
(Ed.), Proceedings of the Third International Conference on Teaching Statistics (pp. 45–49).
Voorburg, The Netherlands: International Statistical Institute Publications.
Schwartzman, S. (1994). The words of mathematics: An etymological dictionary of math terms used in
English. Washington, DC: Mathematical Association of America.
Shaughnessy, J. M., Watson, J., Moritz, J., & Reading, C. (1999). School mathematics students’
acknowledgment of statistical variation. Paper presented at the 77th annual meeting of the National
Council of Teachers of Mathematics, San Francisco.
Smith, G. (1998). Learning statistics by doing statistics [Electronic version]. Journal of Statistics
Education, 6(3), Article 04.
Stigler, S. M. (1986). The history of statistics: The measurement of uncertainty before 1900. Cambridge,
MA: Harvard University Press.
Stigler, S. M. (1999). Statistics on the table: The history of statistical concepts and methods. Cambridge,
MA: Harvard University Press.
Strauss, S., & Bichler, E. (1988). The development of children’s concepts of the arithmetic average.
Journal for Research in Mathematics Education, 19, 64–80.
Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Warren, B., Ballenger, C., Ogonowski, M., Rosebery, A., & Hudicourt-Barnes, J. (2001). Rethinking
diversity in learning science: The logic of everyday languages. Journal of Research in Science
Teaching, 38, 1–24.
Watson, J. M., & Moritz, J. B. (1999). The beginning of statistical inference: Comparing two data sets.
Educational Studies in Mathematics, 37, 145–168.
Watson, J. M., & Moritz, J. B. (2000). The longitudinal development of understanding of average.
Mathematical Thinking and Learning, 2(1), 9–48.
Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical
Review, 67(3), 223–265.
Wilensky, U. (1997). What is normal anyway? Therapy for epistemological anxiety. Educational Studies
in Mathematics, 33, 171–202.
Chapter 9
REASONING ABOUT VARIATION
Chris Reading1 and J. Michael Shaughnessy2
University of New England, Australia 1, and Portland State University, USA2
OVERVIEW
“Variation is the reason why people have had to develop sophisticated statistical
methods to filter out any messages in data from the surrounding noise” (Wild &
Pfannkuch, 1999, p. 236). Both variation, as a concept, and reasoning, as a process,
are central to the study of statistics and as such warrant attention from both
researchers and educators. This discussion of some recent research attempts to
highlight the importance of reasoning about variation. Evolving models of cognitive
development in statistical reasoning have been discussed earlier in this book
(Chapter 5). The focus in this chapter is on some specific aspects of reasoning about
variation.
After discussing the nature of variation and its role in the study of statistics, we
will introduce some relevant aspects of statistics education. The purpose of the
chapter is twofold: first, a review of recent literature concerned, directly or
indirectly, with variation; and second, the details of one recent study that
investigates reasoning about variation in a sampling situation for students aged 9 to
18. In conclusion, implications from this research for both curriculum development
and teaching practice are outlined.
NATURE OF VARIATION
Perusal of recent research literature suggests that the terms variation and
variability are at times used interchangeably. Although some researchers do hold
this view, a closer investigation of the terms reveals a distinction. A survey of various
dictionaries shows that variation is a noun describing the act of varying or a changing
condition, while variability is a noun form of the adjective variable, meaning
that something is apt or liable to vary or change (see, for example, Pearsall
& Trumble, 2001, p. 1598). In the world of educators and researchers these two
terms have come to have more specific usages.
In this chapter these two terms will not be treated as interchangeable, although
some of the referenced research uses them interchangeably. The term variability will
be taken to mean the observable characteristic of an entity, and the term
variation to mean the describing or measuring of that characteristic. Consequently,
the following discourse, relating to “reasoning about variation,” will deal with the
cognitive processes involved in describing the observed phenomena in situations
that exhibit variability, or the propensity for change. Moore (1997) points out that
both variability itself and the measuring and modeling of that variability are important. It
is this measuring and modeling of variation that will be the focus of this
chapter.
Patterns and relationships between variables in data indicate variability. The
search for the source of such variability may result in explanations being found for
the variability, or it may result in the need to estimate the extent of unexplained, or
random, variation. Wild and Pfannkuch (1999, pp. 240–242) discuss the modeling of
variation and the importance of considering both explained and unexplained
variation when exploring data. They point out that while many will agree with those
who view all variation as “caused,” those who believe in “uncaused” variation should
consider the possibility that unexplained variation may be due to sources as yet
undiscovered. This leads one to question the notion of unexplained, or random,
variation. If the concept of random variation is puzzling even to statisticians and
researchers, how much more puzzling must it be to those just embarking on their
data handling careers?
Possible confusion over the nature of variation may well influence the approach
taken to data handling and to a description of variability. How do people react to
variation in data? There appear to be three broad categories of reaction: those who
ignore variation as if it does not exist; those who investigate existing patterns of
variation and work to fit in with them; and those who try to change the pattern of
variation to something more desirable (Wild & Pfannkuch, 1999, p. 236). The latter
is possible only if manipulable causes of variation can be isolated. Isolation and
modeling of variation allows prediction, explanation and control, as well as
questioning of why variation occurs, which leads to a search for causes. In fact, students
who are presented with numbers that vary will often seek a “real” explanation for
why they are not the same without being too concerned about actually describing the
variation. This is especially so when students have some contextual knowledge
about a situation. Even after some instruction, the randomness ideas are still much
weaker in students than the impulse to postulate causes (Wild & Pfannkuch, 1999, p.
238).
THE ROLE OF VARIATION IN STATISTICS
Why is reasoning about variation so important? Variation, or variability, is
featured in the American Statistical Association definitions of statistical thinking, and
so any serious discussion on statistical thinking must examine the role of variation.
Meletiou (2002) cites many references that discuss the importance of variation and
the role of variation in statistical reasoning. Two of these, Moore (1997) and Wild
and Pfannkuch (1999), are critical to appreciating the role of variation in statistics.
Moore emphasized the omnipresence of variability, and the importance of
measuring and modeling variability, while Wild and Pfannkuch put variation at the
heart of their model of statistical thinking when consideration of variation emerged
from interviews with statisticians and students, as one of the five types of
fundamental statistical thinking. There are four aspects of variation to consider:
noticing and acknowledging, measuring and modeling (for the purposes of
prediction, explanation or control), explaining and dealing with, and developing
investigative strategies in relation to variation (Wild & Pfannkuch, 1999, pp. 226–
227). We, the authors, also suggest two important aspects of variation—describing
and representing—that need to be considered. Much of the uncertainty that must be
dealt with when thinking statistically stems from omnipresent variation; together,
these six aspects of variation form an important foundation for statistical
thinking.
RELEVANT ASPECTS OF STATISTICS EDUCATION
What is missing in the study of statistics? In both curriculum design and
statistics education research, variation has not been given the attention
warranted by the general acknowledgment of its importance to
statistics. Two of the principal statistical concepts in the teaching and learning of
statistics, or data handling as it appears in curricula, are measures of central
tendency and measures of dispersion. The latter is often referred to as variability or
spread. Whenever statistics are discussed, there is an overemphasis on the
measurement of central tendency and a lack of attention to the measurement of
variability. Research shows that students have a conceptual gap in their
understanding of variability (Shaughnessy, 1997, p. 3) that needs to be addressed both in
the area of curriculum design and in statistics education research.
Since statistics is a recent addition to the mainstream school mathematics
curriculum (at least in the United States, for example, National Council of Teachers
of Mathematics [NCTM], 1989, 2000), one might suspect some gaps in student
learning of statistical concepts. There is ample evidence from the 1996 National
Assessment of Educational Progress (NAEP) data in the United States that students
have weak conceptions of measures of central tendency, and even weaker
conceptions of the role and importance of variation and spread in statistical thinking
(Zawojewski & Shaughnessy, 2000). Students’ current lack of understanding of the
nature of variability in data and chance may be partly due to the lack of emphasis on
variation in our traditional school mathematics curriculum and textbooks. It may
also be partly due to teachers’ inexperience in teaching statistical concepts.
In the United States, for example, most school mathematics textbooks do not
encourage students to identify potential sources of variation in data sets. Neither do
they provide opportunities for students to visualize variability, to investigate
ways of measuring variability, or to consider what such measures actually mean. Teachers and
students may know the procedure for computing standard deviation; but they may be
unable to explain what it means, or why or when it is a good measure for expected
variation (Green, 1993). Exceptions to this trend of a lack of exploration of sources
of variation can be found in some of the Quantitative Literacy materials and in the
Data Driven Mathematics series, both written by teams of classroom teachers and
statistics educators (Landwehr & Watkins, 1985, 1995; Scheaffer et al., 1999).
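The gap between computing and interpreting a standard deviation can be made concrete with a small illustration (ours, not drawn from the materials cited above): two hypothetical data sets share a mean of 5, yet their standard deviations tell very different stories about the variation to expect.

```python
import statistics

# Two hypothetical data sets with the same mean (5) but different spreads.
tight = [4, 5, 5, 5, 6]
loose = [1, 3, 5, 7, 9]

for name, data in (("tight", tight), ("loose", loose)):
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # root mean squared deviation from the mean
    print(f"{name}: mean = {mean}, standard deviation = {sd:.2f}")

# tight: mean = 5, standard deviation = 0.63  (values hug the mean)
# loose: mean = 5, standard deviation = 2.83  (same center, far more spread)
```

A measure of center alone would summarize both sets identically; it is the standard deviation that distinguishes them.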
The variety of models for centers that have been researched and used in teaching
students (Russell & Mokros, 1996) is not matched by a correspondingly rich array
of models for students’ conceptions of spread or variability. Shaughnessy (1997)
speculates on the reasons for this absence of research about variation. One reason
may be that research often mirrors the emphases in curricular materials, which, to
date, have lacked a variation focus.
Another reason may be that statisticians have traditionally been very enamored
with standard deviation as the measure of spread or variability; teachers and
curriculum developers may tend to avoid dealing with spread because they feel they
would have to introduce standard deviation, which is computationally complex and
perhaps difficult to motivate in school mathematics. Still another reason may be that
centers, or averages, are often used for prediction and comparison, and the
incorporation of spreads, or variation, into the process only confounds the issue.
People are comfortable predicting from centers—it feels like firm ground compared
to variability issues. Finally, the whole concept of variability may just be outside of
many people’s comfort zone, perhaps even outside their zone of belief.
If this imbalance in research focus is to be addressed, then more research on
reasoning about variation needs to be undertaken to assist educators to better equip
future students in measuring and modeling variability as they reason about variation.
The focus of this chapter is research involving students aged 9 to 18 years. These
students are living in a world where from an early age they are surrounded by
variability in their everyday life. But when it comes to collecting, representing,
reducing, and interpreting data, all too often their learning experiences are focused
on the central tendency of data and lack opportunities to describe the variation that
occurs. Educators need to modify learning experiences so that students can move
comfortably from identifying variability, to describing and representing it and sifting
out its causes, and finally to measuring variation. The research described in the
following sections discusses aspects of students’ reasoning that may be used to
inform the evolution of both curriculum and teaching practice. First, some of the
research on reasoning about variation is discussed. Next we review some research
on students’ understanding of samples and sampling in general. Finally, we
investigate recent research on the more specific area of students’ reasoning about
variation in a sampling environment.
RECENT RESEARCH INTO REASONING ABOUT VARIATION
Recently, research involving reasoning about variation in a diverse range of
statistical situations has emerged. This research reflects some changing expectations
of students. The research includes investigations into the role of variation in
correlation and regression (Nicholson, 1999), graphical representation (Meletiou &
Lee, 2002), probability sample space (Shaughnessy & Ciancetta, 2002), comparison
of data sets (Watson & Moritz, 1999; Watson, 2001) and chance, data and graphs,
and sampling situations (Watson & Kelly, 2002a, 2002b, 2002c).
Some researchers are now developing hierarchies to describe various aspects of
variation and its understanding. Watson, Kelly, Callingham, and Shaughnessy
(2003) investigated three contexts for variation—chance, data, and sampling—and
described four levels of reasoning: prerequisites of variation, partial recognition of
variation, applications of variation, and critical aspects of variation. The
description of each level is based on various aspects of the three types of variation.
Of most interest to the present discussion is the shift from Level 2 (partial
recognition of variation) to Level 3 (applications of variation). Responses at Level 2
do not reflect an understanding of chance and variation, with students likely to make
flawed interpretations. It is only responses at Level 3, or above, that demonstrate a
focus on appropriate aspects of the concepts while ignoring irrelevant aspects.
A variety of research situations suggest that reasoning about variation may be
a more natural instinct than present learning
environments cater to. When students were responding to an open-ended data reduction
question, Reading and Pegg (1996, p. 190) found that although many students took
the expected option of reducing data based on measures of central tendency, nearly a
quarter of them preferred reductions based on measures of dispersion. When
designing computer minitools, which had “bar(s)” for partitioning the data,
McClain, Cobb, and Gravemeijer (2000, p. 181) anticipated that students would
place one bar at the mean of the data set. Instead, some students adapted the
partitioning feature to determine which of two data sets had more consistent values,
suggesting a higher regard for the spread of the data than the central tendency.
More recently, some researchers have focused on investigating reasoning about
variation in sampling situations. But before discussing this research, we consider
some recent findings on conceptions of sampling.
RECENT RESEARCH INTO CONCEPTIONS OF SAMPLING
Statistical analysis often relies on studying a part (sample) to gain information
about the whole (population). Sampling is the process of selecting this part, or
sample, of the population (Moore & McCabe, 2003, p. 225) to provide reliable
and relevant information, and as such is a core concept in statistics. Some research
studies have identified hierarchies of student thinking on sampling tasks in
probability settings. Watson, Collis, and Moritz (1997) used tasks with grades 3 to 9
involving dice, drawing names from a hat, and Piagetian marble tasks; then they
identified hierarchies of student reasoning about those tasks, based on the theoretical
underpinnings of the structure of observed learning outcomes (SOLO) taxonomy
(Biggs & Collis, 1991). These results were later extended to grade 11 (Watson &
Moritz, 1998). In a similar way, levels of justification have been found in students’
reasoning in a sampling task (Jones et al., 1999; Reading, 1999; Shaughnessy,
Watson, Moritz, & Reading 1999; Torok & Watson, 2000; Zawojewski &
Shaughnessy, 2000), while other researchers have focused on students’ perceptions
of samples and sampling in data handling settings (Wagner & Gal, 1991; Rubin,
Bruce, & Tenney, 1991; Jacobs, 1997, 1999; Watson & Moritz, 2000).
Jacobs (1999) found that while some children were aware of potential bias issues
in surveys, favoring random sampling, other children preferred a quasi-stratified
random sampling, preoccupied with issues of fairness. Reading (2001) found similar
results among secondary students discussing data collection: they tended to
create selection criteria based on variables that they perceived would improve the
range of responses in their sample. Thus, in constructing a “sample,” students
demonstrated a desire to make absolutely sure that anything can happen. Reading’s
and Jacobs’s results may provide more evidence of what has been called the “equi-probability” bias (Lecoutre, 1994): that all things can happen, and so they all should
have an equal chance of happening. These results are also reminiscent of Konold’s
“outcome approach” (Konold, 1989; Konold, Pollatsek, Well, Lohmeier, & Lipson,
1993).
The previous research on students’ understanding of sampling suggests that
there may be conceptual ties between students’ understanding of variation in
samples, and students’ understanding of sample space in a probability experiment.
The probability question is: What is the range of all possible outcomes, and which
ones are more likely to occur than others (i.e., what is the sample space)? The
statistical question is: If we repeat a probability experiment many times, what sort of
variation in outcomes do we observe, and what is the range of the more likely
outcomes (i.e., what interval captures most of our trials)?
RECENT RESEARCH INTO REASONING ABOUT VARIATION
IN SAMPLING SITUATIONS
Sampling at random attempts to avoid the biases that may occur when a sample
is drawn. Such sampling is based on the principle that each unit in the population
has an equal chance of being selected in the sample (Moore & McCabe, 2003, pp.
225–227). Variation occurs in all sampling situations, but the equal likelihood
principle on which random sampling is based allows calculation of the likely size of
errors that occur (Wild & Seber, 2000, pp. 6–9). This sampling variability dictates
that the value of the parameter of interest, for example the number of a specific color
of lollies out of a sample chosen, will vary with repeated random samplings (Moore
& McCabe, 2003, pp. 260–261). Given that such variation occurs, two important
issues arise—the size of the sample and how many samples should be taken.
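This sampling variability is easy to make visible with a short simulation. The sketch below is our illustration, not part of the studies discussed here; it borrows the lollie-bowl composition used later in the chapter (100 lollies, 50 of them red) and records how many reds turn up in repeated handfuls of 10.

```python
import random

random.seed(1)  # fixed seed only to make the illustration reproducible

# A bowl of 100 lollies, 50 of them red.
bowl = ["red"] * 50 + ["other"] * 50

# Pull 1,000 handfuls of 10 (each handful drawn without replacement,
# then returned to the bowl) and count the reds in each handful.
counts = [random.sample(bowl, 10).count("red") for _ in range(1000)]

# The statistic varies from handful to handful ...
print("range of reds observed:", min(counts), "to", max(counts))
# ... but its long-run average settles near the expected value of 5.
print("mean reds per handful:", sum(counts) / len(counts))
```

The value of the statistic is different from sample to sample, yet the samples collectively cluster around the population value, which is exactly the tension between variability and representativeness explored below.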
Several problems associated with reasoning about variation in sampling
situations have been identified. First, the long-held notion that small samples should
provide reliable representations of the parent population from which they were
drawn, which leads to estimates based on what is commonly called the
representativeness heuristic (Tversky & Kahneman, 1974), continues to be supported
by more recent research (Shaughnessy, 1992, 1997; Watson & Moritz, 2000; Watson,
2000). This line of research suggests that people may focus more on issues of
“centers or averages or population proportions” than on issues of spread or variation
when making estimates for the likelihood of chance events, and that they have
minimal conceptions and weak intuitions of how outcomes are distributed “around a
center” in a binomial distribution.
Second, although a sample is generally considered heterogeneous, in certain
contexts it may have homogeneous connotations, thus influencing notions of
variation in the sample. When asked what the word sample meant to them, some
students (Grades 3, 6, and 9) said “a little bit,” like a shampoo sample, or a taste of
food, or a blood sample (Watson & Moritz, 2000). Jacobs (1997) reported similar
findings. Intuitive notions of statistical variation would be unlikely to arise in
connection with such samples, where the issue of variation in a “sample” could
actually be troublesome.
Third, a tension has been found to exist in secondary students between
accounting for variability and a desire for representativeness in samples (Rubin et
al., 1991). Acknowledging the possibility of too much variation conflicts with
whether the sample really is representative of a population. Of course, this question
permeates many real applied statistics situations: When do we have enough evidence
to predict from a sample (is it truly representative)? Could two different samples
really be from different populations (is there too much variance across groups)? This
tension between representativeness and variability always exists in a sampling
situation and needs to be carefully considered.
Finally, when subjects are given a question that involves estimating the
likelihood of a single event, they may actually superimpose a sampling setting on
the question where none was there to begin with, in order to establish a “center”
from which to predict (Shaughnessy, 1997).
In various protocols based around sampling situations, some researchers have
identified students’ reasoning about variation, both from analysis of students’
descriptions of possible outcomes in the sampling situation and their explanations of
why the values were chosen. In particular, Torok and Watson (2000) described a
hierarchy of four levels of developing concepts of variation: weak appreciation of
variation (Level A), isolated appreciation of aspects of variation and clustering
(Level B), inconsistent appreciation of variation and clustering (Level C) and good,
consistent appreciation of variation and clustering (Level D). These were based on
responses to a variety of tasks including some situations with isolated random
variation and others with real-world variation. During analysis of the tasks, two
features emerged as important when differentiating between students. One was the
acknowledgment of variation and description of clustering, and the other was the use
of proportion. Similar levels were also identified by Watson et al. (2003) when they
analyzed responses to a wider range of chance and data tasks and developed a
hierarchy that included aspects of centrality, as well as aspects of spread.
In summary, prior research that provides information about students’
conceptions of variation has predominantly come indirectly from investigations of
students’ understandings of sampling in either a probability experiment or in a data
collection setting. Following are some of the principal findings:
• Students may be strongly attracted to averages or population proportions, and focus on them to the neglect of issues involving spread or variation.
• The issue of “fairness” in creating samples in a survey setting is a prominent one, especially for younger children. They wish to control variation, or to allocate variation evenly across groups.
• The word sample can have both heterogeneous (e.g., random stratified sample) and homogeneous (e.g., food, blood) connotations for students.
• Students may superimpose a sampling environment on a problem when none was there to begin with, in order to justify their thinking by representativeness. Also, there is a tension between variability and representativeness.
• Students’ reasoning about variation may depend on an understanding of centering and clustering.
CONTEXT FOR A RECENT STUDY: VARIABILITY IN REPEATED SAMPLES
In a secondary analysis of the statistics items from the 1996 National
Assessment of Educational Progress (NAEP), the predominance of single-value
responses given by students to a sampling task was intriguing (Zawojewski &
Shaughnessy, 2000). Students were told the number of red-, yellow-, and blue-colored gumballs in a bowl, and then asked how many red gumballs they would
expect to get in a handful of 10. Grade 4 students consistently gave a single number
for their response—5, 10, 3, and so forth, with only one student in a convenience
sample of 232 students (from a possible 1,302 responses) actually giving a range of
possible numbers for the number of red gumballs that would be pulled. That student
wrote 4–6.
This suggests that Grade 4 students tend to give “point value” answers for
sampling problems, and that they do not normally, in such a “test” situation, give
consideration to a “range of likely values” in their responses. This is troubling
because it suggests that students do not recognize the role that variability plays in a
sampling task. However, point-value responses do mirror the prototypical responses
to the most frequent types of questions about data and chance posed in classrooms
and textbooks, namely: “What is the probability that …?” Probability questions just
beg students to provide a point-value response and thus tend to mask the issue of the
variation that can occur if experiments are repeated.
What would happen if the sampling question were asked in a different way?
What would students say about repeated samples of 10 gumballs? How many reds
would they expect? Would they expect the same number every time? Or, would
students acknowledge that variation exists in the number of reds in repeated
handfuls? What sorts of “likely ranges” for the numbers of reds would students
select? These questions gave birth to what we have come to call the lollie task (in
Australia, a “lollie” is a hard sweet; in the United States, we called this the candy
task): a sampling task that involved pulling lollies from a bowl with a known mixture.
The lollie tasks were given in several written forms to over 400 students in
Grades 4–6, in Australia, New Zealand, and the United States and to over 700
secondary students in Grades 9–12 in the United States. In one version, students
were presented with a mixture of 100 lollies—50 red, 30 yellow, and 20 blue—and
were asked how many reds they would expect if a handful of 10 lollies were pulled
out. Then they were asked, “If this experiment were repeated six times, what would
the likely numbers of reds be?” Students were told that after each sample pull, the
lollies were returned to the bowl and thoroughly mixed up again. Six repetitions of
the sampling were chosen for two reasons: first, so that the task would be small
enough not to seem too daunting to students; and second, so that it would be large
enough to give students an opportunity to demonstrate the variability they considered might occur.
Some clear categories of student reasoning emerged in the lollie task. Students
often make predictions that we might characterize as “wide” or “narrow,” or “high”
or “low,” compared to what would be expected according to probability and statistics theory.
For example, when given a 50% red mixture, some students expect a very wide
range of numbers of reds to occur in 6 repeated samples of 10, such as 0, 1, 4, 7, 9, 10
reds. While these students acknowledged that variability exists, they felt that all
possible outcomes for the number of reds should show up. These students may
believe in an equi-probability model for this sampling problem, that all numbers of
reds have the same chance of occurring. Many of the younger students questioned
indicated that everything should have a “fair” chance of occurring, as Jacobs
(1997, 1999) found. Still other students’ reasoning suggested that they thought
“anything can happen” in a chance experiment. Students who reasoned in any of
these ways gave “wide” ranges for the numbers of reds in the repeated samples.
A surprising number of students predicted 5,5,5,5,5,5, suggesting no variability
at all in the sampling results. This tendency is stronger among older mathematics
students, who like to predict “what should happen” every time. This indicates that
some students tend to think in terms of point values rather than a range of likely
values, even when they are directed to provide a range of likely values.
Other students predicted high for the numbers of reds in each attempt, predicting
numbers like 6,7,8,8,7,9, and reasoned “because there are a lot of reds in that
mixture.” There were students who in fact did recognize that outcomes in the lollie
experiment are more likely to be distributed symmetrically around 5, such as “from
3 to 7 reds.” However, in these initial investigations, less than 30% of all the
students surveyed or interviewed were able to successfully integrate the roles of both
centers and spreads in sampling scenarios like the lollie sampling problem
(Shaughnessy et al., 1999).
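The normative answer against which these responses were judged can be computed exactly: the number of reds in a handful of 10 drawn from 100 lollies of which 50 are red follows a hypergeometric distribution. The following sketch (the function name is our own) confirms that the symmetric range "from 3 to 7 reds" captures most of the probability.

```python
# Exact hypergeometric probabilities for the number of reds in a handful
# of 10 drawn from 100 lollies of which 50 are red (function name ours).
from math import comb

def p_reds(k, n_red=50, n_total=100, handful=10):
    return (comb(n_red, k) * comb(n_total - n_red, handful - k)
            / comb(n_total, handful))

# Probability that a handful shows between 3 and 7 reds inclusive
central = sum(p_reds(k) for k in range(3, 8))
print(round(central, 3))  # about 0.9: "from 3 to 7 reds" is a good range
```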
210
CHRIS READING AND J. MICHAEL SHAUGHNESSY
In summary:
• There was a tendency for students to be preoccupied with the large number
of reds in the population, rather than the proportion of reds, or a likely range
of reds that would be pulled in a sample. This led students to overestimate
the numbers of reds in samples.
• There was a tendency for some students to go wide in their predictions of the
range of the numbers that would be pulled in repeated samples. This may be
due to thinking that reflects aspects of the outcome approach, or to beliefs in
equi-probability.
• A proportion of students changed their minds and produced a more
normative response to the lollie problem after actually doing the experiment
and seeing the variation. Thus, there is potential for student learning in such
a sampling task, although some of the students did not change their minds
even when confronted with evidence that conflicted with their original
prediction.
• There was evidence of potential interference at higher grades (11–12) from
recent experiences with calculating probabilities. These older students were
more often the ones who would predict 5,5,5,5,5,5. It is our conjecture that
since these students have normally been asked questions in the form of
“what is the probability that” or “what is the most likely outcome,” they do
not recognize a situation that calls for dealing with a spread of outcomes.
Students’ reasoning about variation can be investigated in a variety of situations;
Torok and Watson (2002, p. 152) consider it important to include situations with
isolated random variation as well as situations with real-world variation. In this
chapter we will focus on isolated random variation in order to build on, and deepen,
our understanding of students’ reasoning about variation in a sampling environment.
In the following sections of this chapter, we will present findings from one of the
studies in the lollie research that conducted a qualitative analysis of explanations
given by students. The study consisted of interviews based on the lollie task,
conducted in Australian schools, designed to address the following research
questions: What aspects of reasoning in a sampling task indicate consideration of
variation? Is there a hierarchy of reasoning about variation in responses to a
sampling situation?
METHODOLOGY
Twelve students, six from primary school and six from secondary school, were
interviewed to expand on explanations given in written surveys, previously
administered to other students. The primary students were from Grades 4 (Millie), 5
(Kate, Donna), and 6 (Jess, Alice, Tim); secondary students were from Grades 9
(Jane, Prue, Brad) and 12 (Max, Rick, Sue). Although these names are fictitious,
they are used to refer to specific students during the discussions. The schools were
asked to suggest students who were reasonably articulate and had average
mathematical ability. The interviews were audiotaped, and students were given
response forms on which to record as they were being interviewed. All interviews
were transcribed. Students were encouraged to articulate all responses, but could
choose whether to record aspects of the response.
Students were asked to respond to two different sampling situations: a mixture
with 50 red, 30 blue, and 20 yellow and another with 70 red, 10 blue, and 20 yellow.
A bowl containing the correct, relevant proportions of wrapped lollies was placed in
full view. Students were told that the lollies were well mixed and the sampling was
blind. In each case the students were asked how many red lollies could be expected
in a handful of 10 lollies. They were then asked to report on the number of reds that
would be drawn by six people in a handful of 10 lollies, with the lollies being
returned to the bowl after each draw and thoroughly remixed.
The interviews were conducted by the researchers in the school setting familiar
to the students. The student response form (condensed) for the 50 red (50R)
situation is shown in Figure 1 as question 1. A suitably adapted question 2 was used
for the 70 red (70R) situation. The interview protocol followed the wording of the
student response sheet, with prompting-style encouragement given to students who
hesitated when responding to the “why” questions.
Initially students were asked how many reds could be expected and whether that
would happen every time (parts 1A, 2A). Then responses to the sampling task were
sought in three different forms: LIST (parts 1B, 2B), CHOICE (parts 1C, 2C) and
RANGE (parts 1D, 2D). Students were also asked why they gave the responses they
did and then given the chance to alter their responses after having actually drawn 6
samples of 10 from the bowl.
Two conceptually difficult notions related to sampling were addressed in the
extended questions for larger sample size (parts 1E, 2E—selecting 50 lollies instead
of 10) and for increased repetitions (parts 1F, 2F—40 draws instead of 6). Taking a
larger sample results in less variability in terms of providing accurate information
about the population, and more repetitions of sampling can help to provide more
detail about the sampling variability (Moore & McCabe, 2003, pp. 265–266). Both
these notions were considered too difficult for primary students and hence were
presented only to the secondary students.
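These two notions can be illustrated with a short simulation (a sketch of our own; the function names and seed are illustrative choices): across many repetitions, the proportion of reds varies noticeably less in handfuls of 50 than in handfuls of 10.

```python
# Sketch of the two notions above: larger samples vary less (as a
# proportion of reds), and many repetitions reveal the sampling
# variability. Function names and the seed are our own choices.
import random

def red_proportions(handful, repetitions, n_red=50, n_total=100):
    bowl = ["red"] * n_red + ["other"] * (n_total - n_red)
    return [random.sample(bowl, handful).count("red") / handful
            for _ in range(repetitions)]

def spread(values):
    # population standard deviation of the observed proportions
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5

random.seed(0)
small = red_proportions(handful=10, repetitions=1000)
large = red_proportions(handful=50, repetitions=1000)
# The spread of the proportion shrinks as the handful grows
print(spread(small), spread(large))
```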
Responses to question 1 and question 2 in Figure 1 were analyzed both
qualitatively and quantitatively. Quantitatively, performance of the particular 12
students in this study is discussed in Reading and Shaughnessy (2000) based on a
coding scheme developed in Shaughnessy et al. (1999). Qualitatively, the
explanations given were arranged hierarchically according to increasing
appreciation of aspects of reasoning about variation, similar to those identified by
Torok and Watson (2002). It is the results of this qualitative investigation that are
reported here. Detailed case studies of four of the interviews, one in each of Grade 4
(Millie), Grade 6 (Jess), Grade 9 (Jane), and Grade 12 (Max), can be found in
Reading and Shaughnessy (2000).
Student Response Form
1A) Suppose we have a bowl with 100 lollies in it. 20 are yellow, 50 are red, and 30 are blue.
Suppose you pick out 10 lollies.
How many reds do you expect to get? __
Would this happen every time? Why?
1B) Altogether six of you do this experiment.
What do you think is likely to occur for the numbers of red lollies that are written down?
Please write them here.
_____, _____, _____, _____, _____, _____
Why are these likely numbers for the reds?
1C) Look at these possibilities that some students have written down for the numbers they
thought likely. Which one of these lists do you think best describes what might happen?
Circle it.
a) 5,9,7,6,8,7
b) 3,7,5,8,5,4
c) 5,5,5,5,5,5
d) 2,3,4,3,4,4
e) 7,7,7,7,7,7
f) 3,0,9,2,8,5
g) 10,10,10,10,10,10
Why do you think the list you chose best describes what might happen?
1D) Suppose that 6 students did the experiment—pulled out ten lollies from this bowl, wrote
down the number of reds, put them back, mixed them up.
What do you think the numbers will most likely go from? From ____ (low) to ____ (high)
number of reds.
Why do you think this?
**(After doing the experiment)
Would you make any changes to your answers in 1B–1D?
If so, write the changes here.
1E) Suppose that 6 students each pulled out 50 lollies from this bowl, wrote down the number
of reds, put them back, mixed them up.
What do you think the numbers will most likely go from this time?
From ______ (low) to ______ (high) number of reds.
Why do you think this?
1F) Suppose that 40 students pulled out 10 lollies from the bowl, wrote down the number of
reds, put them back, mixed them up. Can you describe what the numbers would be, what
they’d look like?
Why do you think this?
Figure 1. Student Response Form (condensed).
RESULTS
As the responses were analyzed, they revealed important aspects of reasoning about
variation, as acknowledged by Torok and Watson (2002) in their hierarchy. Two of
these characteristics, one based on the description of the variation and the other
looking for the cause of the variation, were considered important enough to warrant
the development of two separate hierarchies in the present study. The description
hierarchy, based around students’ descriptions of the variation occurring, developed
from aspects of students’ responses such as it is more spread out, there’s not the
same number each time (Jess G6). The causation hierarchy, based around students’
attempts to explain the source of the variation, developed from aspects of student
responses such as because there’s heaps of red in there (Jane G9). A categorization
of student responses as giving causation (C) or description (D) or both (C&D) as
part of the explanations is presented (Table 1) for both the 50R and 70R sampling
situations. As trends were similar for primary and secondary students, the data were
combined. Where a question’s codes total fewer than 12 students, the remaining
explanations were either absent or contained neither a description of variation nor
a mention of causation.
Table 1. Categorization of Student Responses

                              50 Red               70 Red
                           C     D   C&D        C     D   C&D
Every time?  (1A/2A)      10     1     1        6     0     0
LIST         (1B/2B)       5     1     3        6     3     1
CHOICE       (1C/2C)       3     8     1        4     4     1
RANGE        (1D/2D)       5     3     1        2     6     0
Although frequencies are too small for rigorous statistical analysis, trends can be
observed. Discussions of causation were usually given for the explanation of
whether the given answer will occur “every time” (1A, 2A) and when students were
asked to LIST all outcomes (1B, 2B). For the CHOICE question (1C, 2C), a
descriptive answer was more likely for the 50R situation; and a mixture of responses
occurred in the 70R situation, which students were generally finding more difficult
to deal with. For the RANGE question (see 1D, 2D), more descriptive-type
explanations were given for the 70R situation and more causation-type explanations
for the 50R situation. This was the only question where there were noticeable
differences between primary and secondary student responses: the causation
explanations for the 50R situation came from primary students, and all but one of
the descriptive explanations for the 70R situation came from secondary students.
Given that the 70R situation was generally more difficult for students to explain, it is
understandable that mainly secondary students chose to describe the variation while
primary students took the option to look for cause.
These results suggest that the form (LIST, CHOICE, RANGE) in which the
question is asked may influence whether a student chooses to describe the variation
or look for a cause. Also, student responses indicated that the 50R situation, dealing
with a more familiar proportion of 50%, was conceptually easier to understand than
the 70R situation.
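The shift in difficulty is consistent with the arithmetic: applying the same hypergeometric reasoning to the 70R bowl (a sketch of ours; the function name is illustrative) centers the distribution on 7 rather than on the familiar "half."

```python
# For the 70R situation, the most likely number of reds in a handful of 10
# is 7, matching the 70% proportion (function name is our own).
from math import comb

def p_reds_70(k, n_red=70, n_total=100, handful=10):
    return (comb(n_red, k) * comb(n_total - n_red, handful - k)
            / comb(n_total, handful))

print(max(range(11), key=p_reds_70))  # prints 7, the most likely count
```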
The details of the two hierarchies—description (coded D1 to D4) and causation
(coded C1 to C4)—follow, together with typical responses from the students. When
students’ responses are quoted directly, either in part or in full, they appear in italics.
Description Hierarchy
As identified in previous research with the sampling task (Shaughnessy et al.,
1999), the responses given usually indicated a notion of reasonable spread; but the
actual way that the spread was described varied considerably. The description
hierarchy was developed based on increasing sophistication in the way that students
referred to notions of spread. The interviews under consideration here indicated that
at a less sophisticated level, two different approaches appear to be taken. Some
students chose to concentrate on the middle values, while others were more
preoccupied with extreme values. The more sophisticated explanations by the
students gave consideration to both extreme values as well as what is occurring
between them.
It should be noted that responses to the more complex situations of larger
sample size (questions 1E, 2E) and increased repetitions (1F, 2F) appeared to play a
more significant role in the development of this hierarchy than of the causation
hierarchy. Students found it challenging trying to describe the variation in these
more complex situations, and their explanations of the questions brought deeper
insight into how they reasoned about variation.
D1—Concern with Either Middle Values or Extreme Values
In this sense “extremes” are used to indicate data items that are at the uppermost
end or the lowest end of the data, while “middle values” are used to indicate those
data items that are between the extremes. Typical of responses concerned mainly
with the extremes are those that justify values selected by explaining why it was not
possible to get more extreme values. Jess (G6) chose to explain why she had
excluded certain values rather than why she had included the ones that she did,
because there’s 50 red I don’t think you could get one or two. Such responses were
most likely for the RANGE questions but not exclusively. For example, Rick (G12)
expressed his desire to exclude certain extreme values when eliminating some
CHOICE options for 50R, deciding that a 0, 8, 9 or 10 are not likely.
Typical of those responses indicating more concern with the middle values were
those that explained why specific values were important and those demonstrating
more concern about the relationship of the numbers to each other. Jess (G6)
explained her LIST by saying that the numbers need to be all mixed up but it is hard
to explain, showing specific concern for the middle values and the variety that was
needed within those numbers. Similarly, Prue (G9) wanted a lot of different
numbers, and Kate (G6) stated that size doesn’t matter just different. On the other
hand, Sue (G12) showed concern for the actual values of the middle numbers when
explaining in the 70R LIST that she was wavering around 6 and 7 because it won’t
be around the 5 anymore because there’s a larger number of reds. Such discussions
of middle values were more likely to occur when students were asked to LIST the
values, but they also occurred when students gave reasons for CHOICE responses.
For example, Jess (G6), choosing (b), explained: because its more spread out there’s
not the same number each time while Jane (G9), choosing (a), explained: because it
has got the most difference in it.
An interesting explanation came from Jess (G6), who said you have to pick them
all in the higher half of 5. She meant that all the numbers she was selecting needed
to be between 5 and 10. This is an unusual way to express the range of numbers, but
it probably contains no more information than a response that deals with extremes
by stating an upper and lower value.
There is no attempt here to claim that a student is more or less likely to discuss
extreme values or middle values, just that some responses contain information about
extreme values and that others will contain information about middle values. In fact,
some students dealt with extreme values in one response and then middle values in
another. Perhaps the choice of students’ focus, middles or extremes, is influenced by
the types of questions asked or the order in which the questions were asked.
Explaining her choice of RANGE in the 50R situation, Donna (G5) showed concern
for the extremes: 2 and 1 might not come out because they are lower numbers than 3
and 3 is a bit more higher and usually don’t get 2 reds, but then when explaining
her CHOICE for 70R showed more concern for the middle: there’s 5 and 7 and two
7s and then in between 6 and 8 there’s 7.
D2—Concern with Both Middle Values and Extreme Values
These responses described both the extreme values and what is happening with
the values between. Sue (G12), after describing the individual numbers between 3
and 6 there might be one 5, a couple of 4s, maybe a 6 and maybe a 3 and a 5 she
added you would be less likely to get the maximum of 6 or the minimum of 3 than
you would the ones more like 5. In describing the individual numbers, she showed
concern for what is happening in the middle of the data; and in discussing the
maximum and minimum, she showed concern for extremes. Sue more succinctly
identified aspects of both the extreme values and the middle values when she
explained her CHOICE for the 70R situation by stating: not likely to get all of the
same, a 0 or a 2 or even a 3.
A response such as want something more spread from Max (G12) needs to be
interpreted carefully. It can be interpreted to mean that a larger range is what is
wanted, but when Max gave this as his reason for changing his CHOICE option
from the 5,5,5,5,5,5 previously chosen what he meant was that he did not want all
the numbers to be exactly the same. Although this would inadvertently also cause
the range to increase, what Max was really saying was that he wanted some sort of
variation in the numbers as opposed to them being all the same.
Definite concern was shown for both extreme values and what was happening
between them when Rick (G12) explained, on choosing 2,3,4,3,4,4: because you are
unlikely to get two over 5 and you are not likely to get all 10s or all 7s or all 5s and
these are all pretty high numbers and as I said [list] b has a 7 and an 8 and 0
doesn’t seem to be likely or the 9,8. However, given that Rick was responding to a
CHOICE question, where options are designed to have higher and lower ranges and
to show different amounts of change within the numbers, systematically justifying
why he would not select the various options within the question, it is not surprising
that he addressed both aspects. This reinforces the notion that the style of question
influences the type of approach students take when responding.
D3—Discuss Deviations from an Anchor (not necessarily central)
These responses indicate that deviations from some value have been taken into
consideration; but either the anchor for such deviations was not central, or it was not
specifically identified as central. For example, Kate (G5) stated: at least two away
from the highest and lowest when explaining her LIST in the 70R situation,
suggesting consideration of deviations from extreme values rather than central
values.
Responses not specifically mentioning a central value, even though it is
obviously being used as the anchor, are transitional from D3 to D4. Rick (G12)
explained: there are three numbers on each side to justify giving a range of 4 to 10
in the 70R situation. In this case Rick wanted a deviation of 3, but did not state the
central value that he used. However, given that in an earlier response he had
identified 7 as the most likely value in the 70R situation, there is the implication that
he is comparing the deviations to the central value of 7.
D4—Discuss Deviations from a Central Anchor
These responses indicated that consideration had been given to both a center and
what is happening about that center. No responses of this type were given by
primary students. Generally, the better responses at this level were to the questions
concerning more repetitions of the sampling (1F, 2F). The fact that the younger
students were not given these questions, considered too conceptually difficult, may well
have deprived them of a chance to express their ideas about describing variation in
this way.
A poorer-quality example of a response at this level that did not describe the
deviations well came from Max (G12), who explained his LIST in the 70R situation
by stating: averaged around half each time so around there somewhere. A better
response came from Sue (G12) who, when responding to the larger sample question
(1E) for the 50R situation, explained: with a small number of people more spread
out but with 50 people it would probably be around 5—probably a wave but not
irregular but in closer and closer to 5. This suggests that she was trying to indicate
the variation that would exist but with convergence toward an expected number.
Then for the 70R situation, she said basically the same thing would happen: but
higher, the other was around 5, it would be around 7. This indicates she was
considering both central tendency and spread.
Another form of response indicating that deviations from a central value have been
considered is one suggesting that a distribution of values has been considered.
Student responses that struggle to describe a distribution could be considered as just
discussions of both extreme values and middle values (D2), such as Sue’s (G12)
response in the 70R situation: get a couple around 6 and 7, probably get a 4 and a 5
and maybe an 8, waver around 6 and 7 because I like those numbers, oh, because it
won’t be around the 5 anymore because there’s a larger number of reds.
There were, however, other responses more clearly trying to indicate a
distribution of some type. Max (G12) explained that for the 70R situation it would
follow the same pattern, indicating that he expected similar things to happen for the
40 repetitions as had occurred for the 6 repetitions. He went on to explain: it would be
more spread, indicating that he thought there would be more variation; he concluded
with an indication that he had considered a distribution by discussing the possible
occurrence of the various numbers: but it would be around 4 to 9, same number of
5,6,7,8, and 9 would appear and average would be around 7 or something, 6 not
sure, may even get two or three. Interestingly, while Sue identified that for the
smaller number of repetitions (6) there would be more variation, Max decided that
for 40 repetitions there would be more variation. However, Max was possibly trying
to convey that with more repetitions it was possible to more easily demonstrate the
nature of the variation.
Causation Hierarchy
The causation hierarchy was developed to capture aspects of responses that
indicated increasing recognition of the relevant source of variation and increasing
sophistication in the articulation of that source. It is interesting to note that these
“causes” were discussed in responses, even though students were asked only to
elaborate upon why they gave various responses. At no time were students actually
asked to identify causes. The four levels of responses identified are now described.
C1—Identify Extraneous Causes of Variation
The extraneous sources of variation identified usually focused on physical
aspects of the sampling. These sources included where the lollies were placed in the
bowl (Tim, G6: might have put them all over the place except the center), where
students decided to select from in the bowl (Prue, G9: reds are in the middle and
might pick at the edge), and how the mixing was done (Tim, G6: mix them up, they
scatter, might have put them all over the place). Some students even appeared to
lose sight of the fact that 10 lollies were to be drawn each time, attributing variation
to how big the handful is (Alice, G6: would depend on how big the handful is) and
to the size of the hand (Kate, G5: with me its different because of my hand size),
while others forgot that this was a blind sampling (Jess, G6: got more red as you can
see more of the red). Some authors have referred to these types of responses as
idiosyncratic (for example, Jones et al., 1999).
C2—Discuss Frequencies of Color(s) as Cause of Variation
Responses with explanations based on the frequencies of specific colors
indicated that the composition of the population had been considered but either the
effect was not fully appreciated or it could not be articulated. For example, Prue
(G9) could only state that there’s a lot of reds to explain her numerical response in
the 70R situation, suggesting she was not able to describe the effect of the 70%
proportion. Although many students focused on the predominance of reds, some
chose to acknowledge the need to allow for the occurrence of other colors. In the
50R situation, when explaining why she would expect numbers no higher than those
she had selected, Millie (G4) said: there are lots of other colors as well. The better
responses at this level are transitional to the C3 level. Although proportion is not
specified, it is suggested by referring to the frequencies of more than one of the
colors. Sue (G12), although not identifying the exact proportions, was able to
observe that there are more red and less of the other colors.
Interviewing students helped to identify that there was a need to be careful with
responses that made statements like there were “more red.” Generally, such a
comment would be taken to mean that there are more red than there are other colors,
as meant by Sue’s comment; but for the 70R situation, some students used this to
mean more red than in the 50R situation. Such a reference is more likely to be
indicating an appreciation of the importance of proportionality. This became
obvious when Jess (G6), for her 70R CHOICE response, explained: because some of
them are higher than the ones I chose last time, want it higher because there are
more reds. By last time Jess meant the 50R situation, and now she wanted to give
higher numbers in her RANGE response because there were more reds for the 70R
situation.
C3—Discuss Proportion(s) of Colors as the Cause of Variation
Responses with explanations based on the proportionality of color(s)
demonstrated a realization that the proportion of red influences the number of reds
in the sample drawn. Explanations given used the proportions of the colors (usually,
but not always, red) to justify answers. Poorer responses at this level could not
succinctly articulate the use of the proportion. These responses showed attempts to
describe the proportion but confusion when explaining the ratio in detail. Max (G12)
explained, when justifying 5 out of 10 in the 50R situation: as the ratio to yellow is
about half. Other responses, such as Max (G12) stating that you have to get over half
as more than half of them are red, are basing the explanation on proportions, even
though the exact proportion is not stated or perhaps even known. Still, other
responses acknowledged the importance of proportions by referring to the
population from which the samples are drawn. For example, Max (G12) pointed out
that it comes back to the amount there are in the first place, meaning the ratio of
colors as described in the contents of the bowl.
Better responses actually articulate the proportion in some way. In some cases,
the percentage was actually stated, as Rick (G12) explained: 50% red in there so if
you take out 10 you would expect to get 50% of 10. Other students just
acknowledged the importance of the proportion but did not elaborate; for example,
Brad (G9) explained: that’s the percentage that there are.
C4—Discuss Likelihoods Based on Proportions
Responses at this level alluded to both proportion and likelihood when
explaining their choices, with the references to proportion often being better
articulated than the attempts to explain the likelihood concept. Students giving such
responses are not only reasoning proportionally, but inferring sample likelihoods
from population proportions. A poorer example came from the only primary student
to respond at this level. Tim (G6), after a rather detailed but clumsy attempt to
explain the proportion, attributed other aspects of the variation, the small possibility
that something else could happen, to something that he could not quite put his finger
on by stating: no matter how slim the chance is still there, you just can’t throw it
away like you do a used tissue. Even some senior students, who had a feel for this
“unexplained” extra that needed to be accounted for when choosing how much
variation to demonstrate, could not articulate what they were thinking. Max (G12)
discussed ratios to explain the possible values in the sample he chose but then
resorted to luck to account for the variation, suggesting: about half but you never
know you could have a bit of luck one time.
A more statistically sophisticated response at this level came from Sue (G12).
Having already demonstrated an appreciation of the importance of proportion in
question 1A by explaining that if you joined the yellow and the blue together it could
be 50 too, so it’s 50/50 and out of 10, 5 is half, Sue indicated an appreciation of
likelihoods by adding: but you could get different answers. This was confirmed in
question 1B when Sue explained her LIST choice by saying: you would be less
likely to get the maximum of 6 or the minimum of 3 than you would the ones more
like 5. Another Grade 12 student, Rick—having already indicated an appreciation of
the importance of proportions—demonstrated a good understanding of likelihood.
When justifying the CHOICE he made for the 70R situation, he explained: got two
7s which I think are most likely and the 8s and 6s which are probably second most
likely I think and the 5 and the 9 which can do other times as well. This explanation
discusses not only “most likely” but also lesser degrees of likelihood, even though
the 5 and 9 occurrences were not well expressed.
In fact, it was mostly Grade 12 students who used the words likely and unlikely;
and in most instances, it was in situations with references such as less, un, not, and
not as. Use of such vocabulary may reflect that Grade 12 students have undertaken
more intensive study of probability than the younger students. Responses also
indicated likelihood by using expressions discussing the “chances” of something
happening. Max (G12) stated, as part of the explanation of the LIST he chose in the
70R situation: 10, chances of it getting to 10 would be fairly low.
When interpreting responses for evidence of the discussion of likelihood, care
needs to be taken with the use of words like likely and unlikely. The word likely
actually appeared as part of the questions; so when using likely in their responses,
students may just be reacting to its being used in the questions. Millie explained
away the CHOICE 5,5,5,5,5,5 by saying, unlikely to get all 5s. There was nothing
about this response, or others that she gave, to indicate that Millie really understood
the concept of likelihood. There is a suggestion that even those students who do not
appreciate the importance of the proportions in the parent population attempt to
attribute the possible variation to chance with comments such as you don’t pick up
the same handful made by Donna (G5).
However, some responses at the C4 level suggest a possible conflict. Grade 12
students, who have been exposed to probability instruction, are able to calculate
expected values. This causes them to gravitate toward saying 5 red (in the 50R
situation) for all 6 draws, making no reference to the possible variation, but
then intuitively realizing that this would not happen. Max (G12) chose all 5s for the
50R situation; but when asked to give a reason, talked himself into a response with
some variation, explaining: the chances of getting 5 every time would not be very
high, but I think the average would be around 5. Rick (G12) also experienced this
conflict but was not able to articulate the situation as well as Max. Although having
stated that 5 was the most common, Rick added: unlikely to get 5 every time or 5
really often, I suppose, may … though it could be anything.
Responses to the more complex situations of larger sample size (questions 1E,
2E) and increased repetitions (1F, 2F) did not add significantly to the discussion of
students’ attempts to find causes for the variation. As mentioned previously, these
more challenging situations were offered only to the senior students, who found
them far more difficult to deal with. Although able to argue reasonable causes of the
variation in the simpler situations, they were not able to articulate these as well in
the more difficult ones. When considering a larger sample, drawing out 50 lollies
instead of 10, students found it very difficult to give a RANGE and even more
difficult to give an explanation. Only one of the Grade 12 students who had been
able to give some reasonable responses for the sample size 10 questions mentioned
color of the lollies as an issue in explaining her estimates.
Summary
Basically two main aspects of variation have come to light in these descriptions.
One aspect is how spread out the numbers are. Students give responses that suggest
that some indication of variation is being considered when they are dealing with
extreme values using the range. The other aspect is what is happening with the
numbers contained within that range. Responses considering the behavior of the
middle values may give specific information about the numbers; or they may just
give attributes that are necessary for the numbers, such as wanting them to be
different. When these two aspects of variation description are brought together,
deviations begin to become an issue; and when these deviations are anchored to a
specific value, usually a center of some description, they eventually become the
focus of the student’s description of a distribution.
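These two aspects of variation, the spread of the extremes and the behavior of the middle values, can be made concrete in a few lines of code. The sketch below is illustrative only; the six counts are hypothetical, not data from the study.

```python
# Hypothetical "reds per handful" counts from six draws (values are ours).
counts = [3, 4, 5, 5, 6, 7]

# Aspect 1: how spread out the numbers are, captured by the extremes (the range).
spread = max(counts) - min(counts)

# Aspect 2: what happens within that range, once deviations are anchored
# to a specific value (here, the mean as a center).
center = sum(counts) / len(counts)
mean_abs_deviation = sum(abs(c - center) for c in counts) / len(counts)

print(f"range = {spread}, center = {center}, mean |deviation| = {mean_abs_deviation}")
```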
These hierarchies were developed to code how responses may demonstrate
reasoning about variation. Coding of student responses, according to a spread scale,
REASONING ABOUT VARIATION
was reported in Shaughnessy et al. (1999). These hierarchies describe reasoning
about variation from two perspectives: how students describe the variation and how
they attribute cause to variation.
DISCUSSION
How then have these results addressed the research questions and enriched the
understanding of reasoning about variation? Previously, aspects of the lollie-sampling task indicated some consideration of variation in the numerical responses
where samples are described. This study has shown that a richer source of
information is contained in the actual explanations that students give for those
numerical responses. While investigating levels of reasoning about the variation,
two important aspects were identified, suggesting the development of two separate
hierarchies—one for actual description of the variation and another for consideration
of causation. Four levels were identified in each hierarchy. The description
hierarchy presents a developing sense of exploring extremes and middle values,
leading to consideration of deviations anchoring around a central value and finally to
the notion of a distribution. The causation hierarchy describes a developing sense of
identifying the variables that are the source of the variation being described. These
two hierarchies cover two important aspects of reasoning about variation, and
depending on the task in which one is engaged, either or both might be of relevance.
Importantly, the tasks proposed, and the form of the question asked, can affect
the reasoning about variation. Students demonstrated more of their developing
notions of variation in those tasks that they found more difficult. The challenges of
dealing with the 70% proportion of red lollies rather than 50%, drawing larger
samplings and increasing the number of repetitions, all provided more insight into
students’ notions as they struggled with the complexities of the situations. Reading
and Shaughnessy (2000) noticed the influence of the form of question when
discussing the four case studies. The LIST form of the question restricted the
demonstration of understanding of variation because the descriptions were based on
only 6 repetitions of the sampling, but it did allow more flexibility than both the
CHOICE and RANGE forms of the question. This deeper analysis of the interviews
undertaken has identified further issues resulting from the various forms of a
question. First, students’ attempts to describe the variation or to look for a cause
depended on the form of question asked. Although explaining a LIST was more
likely to lead to a cause being sought, better descriptions of variation were more
common for CHOICE questions. The better descriptions may have arisen because
students were exposed to seven different lists—parts (a) to (g)—in the CHOICE
question and needed to compare them in order to make a choice. Second, differences
in the type of information gained from responses were noted within one hierarchy.
When describing variation in the RANGE explanations, students focused more on
discussion of extremes, but when describing variation in the LIST explanations, they
concentrated more on the middle values.
Analysis of responses to the 50R lollie-sampling task presented to a different
group of students by Kelly and Watson (2002) led to the development of a hierarchy
of four levels that ranged from intuitive, ikonic reasoning (Level 1), through “more
red” but inconsistent reasoning (Level 2) and “more” and “half” red with centered
reasoning (Level 3) to distributional reasoning (Level 4). Kelly and Watson’s
hierarchy overlaps aspects of the two hierarchies we have identified in the study in
this chapter. For example, Level 3 and Level 4 responses are distinguished by the
acknowledgment of the proportion of reds. But most important, the notion of
distributional thinking is evident in the Level 4 responses but not in Level 3. The
Kelly and Watson hierarchy was concerned with the “correctness” of the numbers
offered as possible results of the sampling task as well as the explanations of the
variation. The description and causation hierarchies proposed in this chapter, while
acknowledging some features of the Kelly and Watson levels, show more concern
for the approaches to and notions of variation than do the Kelly and Watson levels.
The more sophisticated responses, identified in both the Torok and Watson
(2000) and the present studies, are able to link together aspects of both center and
spread, leading to notions of distribution. In particular, Level D4 responses, where
students consider deviations from some central value, clearly showed that some
students were giving careful consideration to the distribution of possible values
around the center.
Although the 12 interviews have provided a rich basis for delving into reasoning
about variation, one should not lose sight of the various limitations of this study.
First and foremost, the sampling situation as used in this study is a restricted
context, with isolated random variation and a known population. There are many
situations in which students are expected to reason about variation, and sampling is
but one of those situations. Second, only a small number of students were
interviewed. Interviewing and qualitative analysis of responses is a time-consuming
methodology and necessarily restricts the number of students to be included; but the
researcher is usually rewarded with a depth of richness in the data that is not
possible from just written responses. Finally, the style of question asked could have
influenced the approach taken by students in responding. Although some researchers
may see this as a limitation, provided recognition is given to this effect, in the
future it may be useful as a tool not only for designing questions to elicit certain
types of responses from students but also for helping to guide the development of
intuitive notions of variation.
IMPLICATIONS FOR RESEARCH
The findings of this study unfold many possibilities for future research into
reasoning about variation. However, three questions are particularly relevant. The
approaches taken by students indicated the desire not only to describe the variation
but also to discover causes for the variation. This suggests the first question for
future research: Does a similar approach to reasoning about variation, comprising
both description and causation components, arise in other situations (apart from
sampling) in which students engage? For example, other possible contexts include
reasoning about data, either in tables or graphs; reasoning about probability
experiments; reasoning about information from the media. Whilst investigating
these two hierarchies, causation and description, various immature notions of
reasoning about variation have been identified. Hence, the second question: How
can these intuitive notions be harnessed to develop a more sophisticated notion of
reasoning about variation? An important part of these intuitive notions appears to be
dealing with aspects of center and spread, and with their linking. This gives rise to a
final question for further investigation: How can students be encouraged to link the
concepts of central tendency and dispersion?
IMPLICATIONS FOR INSTRUCTION AND ASSESSMENT
The findings of this study also unfold a number of issues relevant to instruction
in, and assessment of, reasoning about variation. Five key points encompass many
of these issues. First, do not be afraid to give students more challenging tasks. With
the integration of calculators and computers into learning environments, students are
no longer burdened with the cumbersome calculations once synonymous with the
study of statistics. Students should be allowed to deal with more detailed data sets
since they allow more opportunity for discovering and attempting to explain the
variation that occurs. Second, do not separate the study of central tendency and
spread. Too often, learning situations totally neglect the study of spread or
artificially separate it from the study of central tendency. Educators need to encourage
discussion of more than just central tendencies and to link, as much as possible,
reasoning about variation with that of central tendency. Third, when learning
situations involve reasoning about variation, allow students to have their untrained
explorations into what is happening with extreme and middle values. These early
explorations are laying a basis for future, more structured, reasoning about variation.
These first three points are mainly applicable in relation to instruction, while the
following two points are equally applicable to both instruction and assessment. Fourth,
educators need to encourage students to explain their responses. Short responses in
both learning and assessment tasks can produce some information on students’
reasoning about variation, but far more is gained if students are asked to explain the
responses given. Finally, whether instructing or assessing, use a variety of tasks or
forms of questions. Different tasks or questions can encourage different aspects of
students’ reasoning; for a chance to develop all aspects of their reasoning, students
need to be offered the opportunity to react to a variety of tasks and respond to a
variety of questions.
Following is a final message to statistics educators about teaching and learning
statistics. Students need to be encouraged to discuss variation in a variety of settings
and to be questioned in a variety of ways. We, as educators, need to tap students’
thinking, reasoning and explanations, in order to get a better hook for where to go
next in instruction and assessment. Unless we know how our students are thinking
about variability, we are apt to miss opportunities to build on what they already
know—or do not know. Thus having students share their thinking, and encouraging
them to discuss and argue about statistical situations, is critical for our pedagogical
knowledge.
REFERENCES
Biggs, J., & Collis, K. (1991). Multimodal learning and the quality of intelligent behavior. In H. Rowe
(Ed.), Intelligence, Reconceptualization and Measurement (pp. 57–76). New Jersey: Erlbaum.
Green, D. (1993). Data analysis: What research do we need? In L. Pereira-Mendoza (Ed.), Introducing
data analysis in the schools: Who should teach it? (pp. 219–239). Voorburg, The Netherlands:
International Statistical Institute.
Jacobs, V. R. (1997). Children’s understanding of sampling in surveys. Paper presented at the annual
meeting of the American Educational Research Association, Chicago.
Jacobs, V. R. (1999). How do students think about statistical sampling before instruction? Mathematics
Teaching in the Middle School, 5, 240–263.
Jones, G., Langrall, C., Thornton, C., & Mogill, T. (1999). Students’ probabilistic thinking in instruction.
Journal for Research in Mathematics Education, 30, 487–519.
Kelly, B. A., & Watson, J. M. (2002). Variation in a chance sampling setting: The lollies task. In B.
Barton, K. C. Irwin, M. Pfannkuch, & M. O. J. Thomas (Eds.), Mathematics Education in the South
Pacific: Proceedings of the 25th Annual Conference of the Mathematics Education Research Group of
Australasia, Auckland (pp. 366–373). Sydney: Mathematics Education Research Group of Australasia
(MERGA).
Konold, C. (1989). Informal conceptions of probability. Cognition and Instruction, 6, 59–98.
Konold, C., Pollatsek, A., Well, A., Lohmeier, J., & Lipson, A. (1993). Inconsistencies in students’
reasoning about probability. Journal for Research in Mathematics Education, 24, 392–414.
Landwehr, J. M., & Watkins, A. E. (1985, 1995). Exploring data. Palo Alto: Seymour.
McClain, K., Cobb, P., & Gravemeijer, K. (2000). Supporting students’ ways of reasoning about data. In
M. Burke & F. Curcio (Eds.), Learning mathematics for a new century, 2000 Yearbook (pp. 175–
187). Reston, VA: National Council of Teachers of Mathematics (NCTM).
Meletiou, M. (2002). Conceptions of variation: A literature review. Statistics Education Research
Journal, 1(1), 46–52.
Meletiou, M., & Lee, C. (2002). Student understanding of histograms: A stumbling stone to the
development of intuitions about variation. In B. Phillips (Ed.), Proceedings of the Sixth International
Conference on Teaching Statistics: Developing a Statistically Literate Society (CD-ROM). The
Netherlands: International Association for Statistical Education (IASE).
Moore, D. (1997). New pedagogy and new content: The case for statistics. International Statistical
Review, 65, 123–165.
Moore, D. S., & McCabe, G. (2003). Introduction to the practice of statistics (4th ed.). New York:
Freeman.
National Council of Teachers of Mathematics (NCTM). (1989). Curriculum and Evaluation Standards.
Reston, VA: Author.
National Council of Teachers of Mathematics (NCTM). (2000). Principles and Standards for School
Mathematics. Reston, VA: Author.
Nicholson, J. (1999). Understanding the role of variation in correlation and regression. Presentation at the
First International Research Forum on Statistical Reasoning, Thinking and Literacy, Be’eri, Israel.
Pearsall, J., & Trumble, B. (Eds.). (2001). The Oxford English Reference Dictionary (2nd ed.). Oxford,
UK: Oxford University Press.
Reading, C. (1998). Reactions to data: Students’ understanding of data interpretation. In L. Pereira-Mendoza, L. Kea, T. Kee, & W.-K. Wong (Eds.), Proceedings of the Fifth International Conference
on Teaching of Statistics, Singapore (pp. 1427–1434). Netherlands: ISI Permanent Office.
Reading, C. (1999). Variation in sampling. Presentation at the First International Research Forum on
Statistical Reasoning, Thinking and Literacy, Be’eri, Israel.
Reading, C., & Pegg, J. (1996). Exploring understanding of data reduction. In L. Puig & A. Gutierrez
(Eds.), Proceedings of the 20th Conference of the International Group for the Psychology of
Mathematics Education, Valencia, Spain, 4, 187–194.
Reading, C., & Shaughnessy, J. M. (2000). Student perceptions of variation in a sampling situation. In T.
Nakahar, & M. Koyama (Eds.), Proceedings of the 24th Conference of the International Group for
the Psychology of Mathematics Education, Hiroshima, Japan, 4, 89–96.
Rubin, A., Bruce, B., & Tenney, Y. (1991). Learning about sampling: Trouble at the core of statistics. In
D. Vere-Jones (Ed.), Proceedings of the Third International Conference on Teaching Statistics
(Vol. 1, pp. 314–319). Voorburg, The Netherlands: International Statistical Institute.
Russell, S. J., & Mokros, J. (1996). What do children understand about average? Teaching Children
Mathematics, 2, 360–364.
Scheaffer, R., Burrill, G., Burrill, J., Hopfensperger, P., Kranendonk, H., Landwehr, J., & Witmer, J.
(1999). Data-driven mathematics. White Plains, NY: Seymour.
Shaughnessy, J. M. (1992). Research in probability and statistics: Reflections and directions. In D. A.
Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 465–494). New
York: Macmillan.
Shaughnessy, M. (1997). Missed opportunities in research on the teaching and learning of data and
chance. In F. Biddulph & K. Carr (Eds.), Proceedings of the Twentieth Annual Conference of the
Mathematics Education Research Group of Australasia (pp. 6–22). Rotorua, NZ: University of
Waikato.
Shaughnessy, J. M., & Ciancetta, M. (2002). Students’ understanding of variability in a probability
environment. In B. Phillips (Ed.), Proceedings of the Sixth International Conference on Teaching
Statistics: Developing a Statistically Literate Society (CD-ROM), South Africa. The Netherlands:
IASE.
Shaughnessy, J. M., Watson, J., Moritz, J., & Reading, C. (1999). School mathematics students’
acknowledgment of statistical variation. NCTM Research Pre-session Symposium: There’s More to
Life than Centers. Paper presented at the 77th Annual NCTM Conference, San Francisco, California.
Torok, R., & Watson, J. (2000). Development of the concept of statistical variation: An exploratory
study. Mathematics Education Research Journal, 12(2), 147–169.
Wagner, D. A., & Gal, I. (1991). Project STARC: Acquisition of Statistical Reasoning in Children
(Annual Report: Year 1, NSF Grant No. MDR90-50006). Philadelphia, PA: Literacy Research Center,
University of Pennsylvania.
Watson, J. (2000). Intuition versus mathematics: The case of the hospital problem. In J. Bana & A.
Chapman (Eds.), Mathematics Education Beyond 2000: Proceedings of the 23rd Annual Conference
of the Mathematics Education Research Group of Australasia, Fremantle, (pp. 640–647). Sydney:
MERGA.
Watson, J. M. (2001). Longitudinal development of inferential reasoning by school students. Educational
Studies in Mathematics, 47, 337–372.
Watson, J. M., Collis, K. F., & Moritz, J. B. (1997). The development of chance measurement.
Mathematics Education Research Journal, 9, 60–82.
Watson, J. M., & Kelly, B. A. (2002a). Can grade 3 students learn about variation? In B. Phillips (Ed.),
Proceedings of the Sixth International Conference on Teaching Statistics: Developing a Statistically
Literate Society (CD-ROM), South Africa. The Netherlands: IASE.
Watson, J. M., & Kelly, B. A. (2002b). Grade 5 students’ appreciation of variation. In A. Cockburn & E.
Nardi (Eds.), Proceedings of the 26th Annual Conference of the International Group for the
Psychology of Mathematics Education, University of East Anglia, United Kingdom, 4, 386–393.
Watson, J. M., & Kelly, B. A. (2002c). Variation as part of chance and data in grades 7 and 9. In B.
Barton, K. C. Irwin, M. Pfannkuch, & M. O. J. Thomas (Eds.), Mathematics Education in the South
Pacific: Proceedings of the 25th Annual Conference of the Mathematics Education Research Group
of Australasia, Auckland (pp. 682–689). Sydney: MERGA.
Watson, J. M., Kelly, B. A., Callingham, R. A., & Shaughnessy, J. M. (2003). The measurement of
school students’ understanding of statistical variation. International Journal of Mathematical
Education in Science and Technology, 34, 1–29.
Watson, J. M., & Moritz, J. B. (1998). Longitudinal development of chance measurement. Mathematics
Education Research Journal, 10(2), 103–127.
Watson, J. M., & Moritz, J. (1999). The beginning of statistical inference: comparing two data sets.
Educational Studies in Mathematics, 37, 145–168.
Watson, J. M., & Moritz J. (2000). Developing concepts of sampling. Journal for Research in
Mathematics Education, 31, 44–70.
Wild, C., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical
Review, 67(3), 223–265.
Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference.
New York: Wiley.
Zawojewski, J. S., & Shaughnessy, J. M. (2000). Data and chance. In E. A. Silver & P. A. Kenney (Eds.),
Results from the Seventh Mathematics Assessment of the National Assessment of Educational
Progress (pp. 235–268). Reston, VA: NCTM.
Chapter 10
REASONING ABOUT COVARIATION
Jonathan Moritz
University of Tasmania, Australia
OVERVIEW
Covariation concerns association of variables; that is, correspondence of variation.
Reasoning about covariation commonly involves translation processes among raw
numerical data, graphical representations, and verbal statements about statistical
covariation and causal association. Three skills of reasoning about covariation are
investigated: (a) speculative data generation, demonstrated by drawing a graph to
represent a verbal statement of covariation, (b) verbal graph interpretation,
demonstrated by describing a scatterplot in a verbal statement and by judging a
given statement, and (c) numerical graph interpretation, demonstrated by reading a
value and interpolating a value. Survey responses from 167 students in grades 3, 5,
7, and 9 are described in four levels of reasoning about covariation. Discussion
includes implications for teaching to assist development of reasoning about
covariation (a) to consider not just the correspondence of values for a single
bivariate data point but the variation of points as a global trend, (b) to consider not
just a single variable but the correspondence of two variables, and (c) to balance
prior beliefs with data-based observations.
THE PROBLEM
Covariation, in broad terms, concerns correspondence of variation. The nature of
the covariation may be categorized according to the variation possible in the
measure of each variable involved. For logical variables, which can be either True or
False, the logical statement A = NOT(B) expresses logical covariation between A
and B, since varying the value of A from True to False entails a corresponding
variation in the value of B from False to True to maintain the equation as true. The
equation y = 2x expresses numerical covariation between real-number variables x
and y, since a variation in the value of either x or y entails a corresponding variation
in the value of the other variable. Other polynomial and piecewise functions also
express numerical covariation. In all of these cases, the values of the variables may
D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 227–255.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
be said to involve some form of relationship, association, function, dependency, or
correspondence.
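The two deterministic cases just described can be illustrated directly; this sketch is ours, simply restating the definitions from the text in code.

```python
# Logical covariation: A = NOT(B). Varying B from True to False entails a
# corresponding variation in A from False to True.
for b in (True, False):
    a = not b
    print(f"B = {b} -> A = {a}")

# Numerical covariation: y = 2x. A variation in x entails a corresponding
# variation in y (and vice versa, via x = y / 2).
def y_of(x):
    return 2 * x

assert y_of(3) == 6
assert y_of(4) - y_of(3) == 2  # a unit change in x corresponds to a change of 2 in y
```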
Statistical covariation refers to the correspondence of variation of two statistical
variables that vary along numerical scales. Such covariation is commonly
represented in scatterplots using a Cartesian coordinate system that shows the
correspondence of the ordination of each variable. The more general term statistical
association may refer also to associations between two categorical variables,
commonly represented in two-way frequency tables, and between one categorical
and one interval variable, often formulated as the comparison of groups. Statistical
association involves not just a relation of values, but a relation of measured
quantities of distinct characteristics because data are “not merely numbers, but
numbers with a context” (Moore, 1990, p. 96). Much work in the social and physical
sciences concerns attempts to use statistical association as evidence of causal
association between two characteristics, which may be used to enhance our
prediction and control of one variable by knowledge or manipulation of the other
variable. In most cases the statistical association does not perfectly fit the
deterministic models of logical or numerical covariation just described; that is, there
is variation from the model. Tests of statistical significance are required to measure
the degree to which data fit or vary from one of these models. Formal measures of
statistical covariation depend on the type of variation of the measures of each
variable involved: χ2 tests may be used to judge the significance of the association
between categorical variables, and t-tests or analyses of variance are used to judge
the significance of mean values of an interval variable across groupings of a
categorical variable. For statistical covariation, which involves two numerical
variables, Pearson correlation coefficients are commonly used to test the
significance of the linear fit of covariation between the variables. Much of the
discussion in this chapter focuses on covariation that might otherwise be termed
statistical association or correlation, but in the restricted sense of being considered in
relation to degree of fit to a linear function, as opposed to polynomial or piecewise
models.
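For the case of two numerical variables, the Pearson coefficient mentioned above can be computed straight from its definition. The sketch below (the function name `pearson_r` and the data are ours) is a minimal illustration, not an implementation used in any study cited here; a significance test on r, for example via a t statistic, would normally follow and is omitted.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: degree of linear fit of the covariation
    between two numerical variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x for x in xs]))         # exactly linear: r = 1
print(pearson_r(xs, [2.1, 3.9, 6.3, 7.8, 10.2]))  # noisy linear: r close to 1
```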
Reasoning about covariation commonly involves translation processes among
raw numerical data, graphical representations, and verbal statements about statistical
covariation and causal association. Other processes may include calculating and
interpreting statistical tests of association, mathematical modeling to fit the data to a
specific functional equation, and translating to and from symbolic expressions of
algebraic functions. A comprehensive taxonomy of translations among words,
graphs, tables of data, and algebraic formulae was described by Janvier (1978; Bell
& Janvier, 1981; Coulombe & Berenson, 2001). Common translation processes
associated with reasoning about covariation are shown in Figure 1. It is important
that students know what is involved in these translation processes in order to be
sensitive to the possibility of bias or error. Graph production and graph
interpretation are frequently recommended for students in schools. In daily life such
as reading the newspaper, however, adults rarely engage in the data analysis
sequence of graph production, verbal graph interpretation, followed by causal
inference. Many newspaper reports and advertisements make verbal statements that
involve causal claims, but only some use graphs to illustrate the statistical data that
lie behind the claims. More commonly, adults read a causal statement based on a
statistical association, and in order to understand and evaluate it critically, they must
imagine what statistical data lie behind it, that is, speculative data generation.
Speculative data generation requires an understanding of numerical covariation, and
a contextual understanding of data elements concerning how the data might have
been collected and measured. Tasks of speculative data generation have some
degree of freedom in the speculation of what was lost in the forward process of data
interpretation to arrive at the verbal statement. For assessment purposes, this reverse
type of task may be more informative of student understanding than interpretation,
as students are required to supply more detail in their responses. Previous research
on students’ ability to deal with covariation in graphs has more often concerned
graph production and numerical graph interpretation. Drawing a graph to illustrate a
verbal statement of covariation requires both graph production and speculative data
generation; such tasks are rarely found in curricula or research. This chapter focuses
on reasoning about covariation for the processes of speculative data generation,
verbal graph interpretation, and numerical graph interpretation.
LITERATURE AND BACKGROUND
Curriculum
As part of data handling, covariation appears in statistics curricula in Australia
(Australian Education Council [AEC], 1991, 1994), England (Department for
Education and Employment [DEE], 1999), New Zealand (Ministry of Education
[ME], 1992) and the United States (National Council of Teachers of Mathematics
[NCTM], 2000). Students are asked to engage in the steps of a multistep process (a) to
hypothesize a relationship between two variables, (b) to collect data, (c) to represent
the data graphically or analyze them numerically, and (d) to draw conclusions about
the relationship in verbal statements. This multistep process reflects professional use
in the social and physical sciences, in which covariation is often observed within
bivariate data sets, and causal inferences are made. In Australia, representation tasks
are suggested for lower secondary students, such as “represent two-variable data in
scatter plots and make informal statements about relationships” (AEC, 1994, p. 93),
and “represent bivariate time series data in line graphs” (AEC, 1994, p. 109). In
England (DEE, 1999), secondary students are expected to draw scatter graphs and
line graphs for time-series data, to “look for cause and effect when analyzing data”
(p. 40), and to “draw lines of best fit by eye, understanding what these represent” (p.
41). In New Zealand (ME, 1992), time-series data are emphasized at many levels,
and for senior secondary school years scatterplots are suggested to assess bivariate
association. In the United States (NCTM, 2000), it is recommended that sixth- to
eighth-grade students use scatterplots as an important tool in data analysis, and
students are encouraged to interpret lines of fit. Causal inference is also considered
in these curricula; for example, secondary students in Australia should “investigate
and interpret relationships, distinguishing association from cause and effect” (AEC,
1991, p. 178).
[Figure 1. Forms of representing statistical covariation and skills to translate them. The figure links four forms of representation: Raw Numerical Data; Graphical Representation; Verbal Statement of Covariation (e.g., "Level of noise is related to number of people", or "Classrooms with more people make less noise"); and Causal Statement (e.g., "More people in the classroom cause a lower level of noise"). The translation skills between them are Graph Production, Numerical Graph Interpretation, Verbal Data Interpretation, Speculative Data Generation, Verbal Graph Interpretation, and Causal Inference.]
Apart from statistical contexts, curricula (e.g., AEC, 1991; NCTM, 2000) for
early algebra courses include covariation relating familiar variables, particularly
involving time. Australian primary students are expected to “represent (verbally,
graphically, in writing and physically) and interpret relationships between quantities
[…] such as variations in hunger through the day” (AEC, 1991, p. 193). Similar
suggestions are made for upper-primary students in England (DEE, 1999) and New
Zealand (ME, 1992). In the United States (NCTM, 2000), suggestions for activities
such as the growth of a plant over time have been proposed for third- to fifth-grade
students as part of the algebra standard of “analyze change.”
History of Graphing
A brief history of graphing illustrates some of the cognitive difficulties and
milestones in reasoning about covariation. Statistical graphs were infrequent before
REASONING ABOUT COVARIATION
231
the late 1700s (Tufte, 1983, p. 9), although mapping geographic coordinates was
common. From 1663 to 1815, ideas for time-series graphs were developed involving
mechanical devices, for example, the invention that could record temperature
change over time “on a moving chart by means of pen attached to a float on the
surface of a thermometer” (Tilling, 1975, p. 195), although “such automatic graphs
were considered useless for analysis and were routinely translated into tabular logs"
(Beniger & Robyn, 1978, p. 2). More abstract graphs that relate two abstract
measures (i.e., not time or position) were a later development. They are still rarely
used in daily settings: less than 10% of newspaper and magazine graphs surveyed by
Tufte were based on more than one variable (but not a time-series or a map; see p.
83).
These historical developments may be considered to have educational
implications for the ordering of curriculum. At a simple level, maps involve the use
of coordinates to denote position, in a representation that is a stylized picture. Some
tasks noted in the curriculum documents just cited, for example involving plant
growth, exploit the natural mapping of height on the vertical axis, which may assist
students because the measure on the graph corresponds to the visual appearance of
the actual object. Besides horizontal position, time is a natural covariate; one can
read a graph from left to right, just as one reads English, and a narrative of
continuous variation unfolds in temporal order. Graphs of one variable over time
permit students to describe bivariate association in a verbal statement as the change
of one variable over time, naturally expressed using English tense (e.g., “it
started to grow faster, then it slowed down again,” NCTM, 2000, p. 163). Such
verbal statements are bivariate in nature, although time is implicit and only one
variable is explicit as changing. Despite the feature of continuous variation in a
graph, students may still tend to approach the data pointwise, just as historically
graphs were transcribed into tabular form.
Understanding Covariation
Piaget’s theory of cognitive development (e.g., Piaget, 1983) highlights some of
the key concepts of students’ development of reasoning about covariation.
Correspondence (to confirm identity or a one-one mapping), classification (to
identify as one of a class or group), and seriation (to order a series) were among the
logical operations Piaget observed across many studies and considered to be
universally fundamental to cognitive development. Conservation is perhaps the most
renowned indication of the developmental stage called concrete operations. For
example, when pouring a given quantity of liquid from a thin glass to a wide glass,
most young children attend to only one aspect, such as the height, and proclaim the
thin glass has more. The coordination (correspondence of seriations) of height and
width is what encourages the learner to rely not on the configurations but rather on
the transformations or operations (Piaget, 1983, p. 122).
Teaching and reasoning about covariation often focus on either correspondence
of bivariate data points, or variation within variables, and aim to build one aspect
upon the other. Nemirovsky (1996a) described these two approaches with reference
to algebra teaching as (a) a pointwise approach of comparing bivariate pairs to
identify the functional rule for translating one to the other, and (b) a variational
approach that considers change in a single variable across a number of cases. These
approaches are similar to two Piagetian schema that Wavering (1989) suggested are
developed in reasoning to create bivariate graphs: (a) one-to-one correspondence of
bivariate data values, and (b) seriation of values of a variable, necessary for scaling
of graphs to produce a coordinate system. The two approaches are also similar to
two competence models for Cartesian graphing of covariation suggested by Clement
(1989): a static model involving translating bivariate data values to points in
coordinate space, and a dynamic model involving concepts of variation. Clement
noted that a basic qualitative form of the dynamic model involves simply the
direction of change with no indication of how the variables are quantitatively
measured (e.g., “the more I work, the more tired I'll get,” p. 80). Carlson, Jacobs,
Coe, Larsen, and Hsu (2002) have proposed a framework for how such qualitative
understanding further develops to reasoning about rates of change. The variational
approach has been advocated by researchers of early algebra learning (e.g.,
Nemirovsky, 1996a, 1996b; Yerushalmy, 1997). Nemirovsky (1996b) discussed the
importance of time-based mathematical narratives without specific data values, with
verbal and graphical language both read left to right to express generalities of how a
quantity varies over time. Yerushalmy (1997) used various graphic icons with
computer software to provide a graphic language that corresponds to verbal terms
increasing, decreasing, and constant, often with time as the implicit covariate.
These studies indicate that verbal phrases and graphs are important forms for
understanding covariation.
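The pointwise and variational approaches can be contrasted in a minimal sketch (all data invented for illustration): the pointwise view pairs each case's values as bivariate coordinates, while the variational view classifies successive changes in one variable as increasing, decreasing, or constant, echoing Yerushalmy's verbal terms.

```python
# Hypothetical plant-growth data: day of observation and height in cm.
days = [0, 2, 4, 6, 8]
heights = [1.0, 2.5, 5.0, 6.0, 6.0]

# Pointwise view: one-to-one correspondence of bivariate pairs,
# as when plotting points in a coordinate system.
pairs = list(zip(days, heights))

# Variational view: direction of change in a single variable across cases,
# with time implicit as the covariate.
def directions(values):
    """Label each successive change as increasing, decreasing, or constant."""
    labels = []
    for prev, curr in zip(values, values[1:]):
        if curr > prev:
            labels.append("increasing")
        elif curr < prev:
            labels.append("decreasing")
        else:
            labels.append("constant")
    return labels

print(pairs)
print(directions(heights))  # ['increasing', 'increasing', 'increasing', 'constant']
```

The variational output reads as a time-based narrative ("it grew, then stopped growing") without reference to specific data values.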
Representing Covariation in Graphs
Most research into the developing understanding of covariation has come from
tasks involving graphs. The broader research literature on graphing has often
reported on pointwise tasks of construction and interpretation, such as plotting
points or locating values (Leinhardt, Zaslavsky, & Stein, 1990). Tasks involving
variation and qualitative graphs—that is, without specific data values—have been
considered by some researchers (Leinhardt et al., 1990) to be an underutilized
avenue for exploring understanding of general features of graphs, including
covariation. Students need to develop skills that flexibly combine local and global
views (Ben-Zvi & Arcavi, 2001).
Some researchers have employed tasks to translate verbal descriptions into
graphical representations (e.g., Bell, Brekke, & Swan, 1987a, 1987b; Coulombe &
Berenson, 2001; Krabbendam, 1982; Mevarech & Kramarsky, 1997; Moritz, 2000;
Swan, 1985, 1988). Krabbendam gave 12- to 13-year-olds various graphing tasks,
such as one involving a newspaper text about the gathering and dispersion of a crowd of
people. He concluded, “it appears to be rather difficult for children to keep an eye on
two variables” (p. 142), but that “time could play an important part in recording a
relation” (p. 142) provided it is seen to pass gradually (i.e., continuously), thus
supporting a view of continuous variation rather than a pointwise approach. For a
task to represent “how the price of each ticket will vary with the size of the party”
on a bus with a fixed total cost, Swan (1988) found that 37% of 192 thirteen- to
fourteen-year-olds drew a graph that was decreasing. Mevarech and Kramarsky
found that about 55% of 92 eighth-grade students appropriately used labeled two-axis graphs to represent verbal statements of positive association (“the more she
studies, the better her grades”), negative association, and no association, whereas
only 38% of students correctly represented curvilinear association. Three alternative
conceptions were identified: (a) only a single point represented in a graph (25% of
responses), (b) only one factor represented in each of a series of graphs (30% of
responses), and (c) an increasing function represented irrespective of task
requirements (5% of responses). The first two conceptions may reflect students’
attempts to reduce the complexity of bivariate data sets. After teaching about
Cartesian conventions, distance-time graphs, and graphs found in newspapers, more
students included labels and scales, and there was a reduction but not an elimination
of these three conceptions. Chazan and Bethell (1994) briefly described a range of
dilemmas students encounter in graphing verbal statements of relationships,
including identifying the variables, specifying the units of measurements, deciding
which variables are independent and dependent, and deciding whether to represent a
continuous line or discrete points. Watson (2000; Watson & Moritz, 1997) asked
students to represent “an almost perfect relationship between the increase in heart
deaths and the increase in the use of motor vehicles” (p. 55) as reported in a
newspaper article. Some students’ graphs were pictures of the context or basic
graphs with no context. Some compared single values of each measure without
variation, whereas others showed variation but just for one measure. Successful
responses were those that displayed the relationship in a Cartesian coordinate
system, or by displaying two data series compared over time on the horizontal axis.
Some researchers of students’ skills for graph production have exploited
contexts in which students have prior beliefs about covariation, and there is a natural
mapping of height on the vertical axis and time on the horizontal axis, by asking
students to plot height versus age graphs (Ainley, 1995; Moritz, 2000), a context
used by Janvier (1978). Compared to the graphs observed by other researchers,
young students achieved remarkable success on these tasks in representing
covariation trends in data, possibly because of familiarity with the covariation and
with the measurement of the variables. Moritz (2000) observed that students
represented a curvilinear relationship demonstrating growth ceasing, but a
multivariate task incorporating differences between males and females proved more
difficult: Some students represented a single comparison of one male and one
female to reduce complexity, some a double comparison of heights for two specific
ages, and some a series comparison of two trend lines over a series of ages.
Konold (2002) has suggested that a variety of graph forms are valid alternatives
to scatterplots for representing and interpreting covariation, such as ordered case-value bars, which involve a bar graph ordered by cases of one variable to examine
any pattern in the other variable. For ordered case-value bars, ordering was
considered important to assist scanning values to offer a global summary. Similar
graphs, which showed two variables measured across a number of cases and
described as series comparison graphs by Moritz (2000), were drawn by students in
studies by Brasell and Rowe (1993), Moritz (2000), and Cobb, McClain, and
Gravemeijer (2003).
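As a rough illustration of the idea (a sketch, not Konold's software), the following sorts invented classroom cases by one variable and prints a text bar per case for the other; a consistent rise or fall in bar length across the ordered cases suggests covariation.

```python
# Hypothetical cases: (number of people, noise level) for six classrooms.
cases = [(28, 3), (12, 8), (20, 5), (31, 2), (16, 7), (24, 4)]

# Order the cases by one variable (number of people)...
ordered = sorted(cases, key=lambda case: case[0])

# ...then draw a case-value bar for the other variable (noise level),
# so the eye can scan for a global pattern.
for people, noise in ordered:
    print(f"{people:2d} people | {'#' * noise}")
```

With these invented data, the bars shrink steadily as the number of people grows, making the negative covariation visible without a coordinate system.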
Interpreting Covariation
Pinker (1990) has suggested that graph comprehension divides at the most
fundamental level into (a) comprehension of the axis framework and scale, and (b)
comprehension of the data elements. The scale is necessary for reading numerical
values, whereas the data cases without the scale permit trend identification and
qualitative comparison of cases. This is the basis for the distinction, maintained in
this chapter, between skills of verbal graph interpretation and numerical graph
interpretation. Curcio (2001) suggested three levels of graph comprehension involving
numerical values, described as “reading the data” values, “reading between the data”
involving comparison of values, and “reading beyond the data” involving
interpolation or extrapolation. Many studies have involved numerical tasks and
found that students construct and read graphs as individual numerical points rather
than a global whole (e.g., Bell et al., 1987a; Brasell & Rowe, 1993). When a variety
of tasks were compared, however, Meyer, Shinar, and Leiser (1997) found trend
judgments from line graphs and bar graphs were performed faster and more
accurately than tasks to read values, to compare values from the same data series for
different X values (X comparisons), to compare values from different data series
with the same X value (series comparisons), or to identify the maximum. The terms
X comparison and series comparison match those of Moritz (2000) as qualitative
operations on data elements not requiring reference to the numerical scale.
Subjects’ judgments of statistical association in a variety of situations have been
investigated by researchers in social psychology (e.g., Alloy & Tabachnik, 1984;
Crocker, 1981), science education (e.g., Donnelly & Welford, 1989; Ross &
Cousins, 1993; Swatton, 1994; Swatton & Taylor, 1994), and statistics education
(e.g., Batanero, Estepa, Godino, & Green, 1996; Batanero, Estepa, & Godino, 1997).
Many studies have followed Inhelder and Piaget (1958) in considering association of
dichotomous variables in contingency tables, whereas few have considered
covariation of two numerical variables (Ross & Cousins, 1993). Crocker (1981)
outlined six steps for statistically correct judgments of covariation in social settings,
as well as some common errors at each step. The six steps included deciding what
data are relevant, sampling cases, classifying instances, recalling evidence,
integrating the evidence, and using the covariation for predictions.
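The "integrating the evidence" step, and the contrast with the common error of attending only to confirming cases, can be sketched for dichotomous variables in a contingency table; the counts below are invented for illustration.

```python
# Hypothetical 2x2 contingency counts:
# rows = studied much (yes/no), columns = passed the test (yes/no).
a, b = 30, 10   # studied much:   passed / failed
c, d = 20, 20   # did not study:  passed / failed

# A statistically sound integration uses all four cells, e.g., comparing
# the conditional proportions of passing (difference of proportions).
assoc = a / (a + b) - c / (c + d)
print(f"difference of proportions: {assoc:.2f}")  # prints 0.25

# A common error (Crocker, 1981; Inhelder & Piaget, 1958): judging
# association from the confirming cell (studied and passed) alone.
naive = a / (a + b + c + d)
print(f"confirming-cell proportion: {naive:.2f}")
```

The difference of proportions here is positive, indicating a modest association, whereas the confirming-cell proportion by itself says nothing about whether passing was any less likely without study.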
People often hold prior beliefs about causal associations between real-world
variables, and these beliefs may influence judgments (e.g., Jennings, Amabile, & Ross, 1982).
Topic knowledge may result in ignoring the available data (Alloy & Tabachnik,
1984; Batanero et al., 1996), or dismissing an association in the data because there is
no apparent causal relationship or because other variables are more plausible causes
(Batanero et al., 1997; Crocker, 1981; Estepa & Batanero, 1996).
In using statistical data, some people hold deterministic or unidirectional
concepts of association (Batanero et al., 1996, 1997; Crocker, 1981), similar to the
alternative conception of an increasing function irrespective of the direction of
covariation (Mevarech & Kramarsky, 1997). Some attend to selected data or
selected variables as a means of reducing the complexity of the data (Bell et al.,
1987a), similar to the alternative conceptions of representing a single point, a single
pair of values, or a single variable (Mevarech & Kramarsky, 1997; Moritz, 2000).
Attention to selected data points may involve only the extreme points in a scatterplot
(Batanero et al., 1997) or the cells with confirming cases in contingency tables (e.g.,
Batanero et al., 1996; Crocker, 1981; Inhelder & Piaget, 1958). Attention to selected
variables has been observed in some studies that have identified levels of response
based on the number of variables students have referred to in verbal graph
interpretations (e.g., Donnelly & Welford, 1989; Ross & Cousins, 1993; Swatton,
1994; Swatton & Taylor, 1994). Swatton showed sixth-grade students scatter graphs
and line graphs and asked, “What do you notice about [X] and [Y]?” Level 0
responses involved only the context of the data or syntactic/visual patterns in a
graph, Level 1 responses described univariate data patterns, Level 2 involved both
variables, and Level 3 responses involved both variables with appropriate
directionality. Ross and Cousins asked students from grades 5 to 13 to “find out if
there was a relationship” between two continuous variables in situations where a
third, categorical, variable was involved. Their analysis concerned the numbers of
variables students appropriately ordered or described, including 0, 1, 2, or 2 with a
control. Thus the complexity of the data cases, the number of variables given, the
topic of variables, and possible lurking variables can affect judgments of
covariation.
Questioning causal inferences has been considered by some researchers (e.g.,
McKnight, 1990; Watson, 2000). McKnight considered different levels of data-based tasks, including (a) observation of facts, (b) observation of relationships, (c)
interpretation of relationships, and (d) critical evaluation of inferential claims. These
levels of tasks correspond closely to the three tiers of the statistical literacy
hierarchy of Watson (2000), involving basic understanding of terms, understanding
concepts in context, and questioning inferential claims. Cobb et al. (2003) noted that
seventh-grade students given time to consider the context of the data collection,
prior to data analysis, were able to raise issues of sampling procedures and control
of extraneous variables that might affect conclusions. Thus questioning claims need
not occur at the level of questioning statistical inference, but may also be addressed
at simpler levels related to the context of the data, such as sampling or measurement.
The Current Study
The current study aimed to explore three of the skills of reasoning about
covariation shown in Figure 1: speculative data generation (translating a verbal
statement into a graph), verbal graph interpretation (translating a scattergraph into a
verbal statement), and numerical graph interpretation (reading values and
interpolating). Speculative data generation was assessed with respect to
demonstration of numerical covariation, not contextual understanding of data
elements, and as much as possible without assessing graph production skills.
METHOD
Participants
Participants were from two Tasmanian private schools, one a boys’ school and
the other a girls’ school. Both schools would be expected to draw students of a
higher socioeconomic status than the general school population in Tasmania. At
each school, one class group from the third, fifth, seventh, and ninth grades was
surveyed. Specific classes were selected based on their availability to undertake the
survey with minimal interruption to their teaching program. Females described as
fifth grade were from a composite class of fourth- and fifth-grade students, with 13
students at each grade level. Ninth-grade students were streamed into basic,
intermediate, and advanced mathematics courses; the female class surveyed was
undertaking the basic course, and the male class the advanced course.
Tasks
The tasks in this study are shown in Figures 2 and 3. Contexts were chosen such
that students would be familiar with the variables. Study time and academic grades
are experiential for students, and were used by Mevarech and Kramarsky (1997).
Noise level and number of people in a classroom, though rarely measured, are at
least intuitively experienced by students in schools. The contexts were also chosen
such that students would expect a positive covariation between the variables, but the
task described a negative covariation so that students were forced to rely on the data
rather than prior beliefs. Task 1 was administered in a positive covariation form
instead of the negative form to third- and fifth-grade males. These different forms
were designed to explore whether students might respond differently due to their
prior beliefs about the covariation. The tasks were worded to support a statistical
context for covariation, such as awareness of the data collected and of possible
variability from a perfect linear fit. For each task, the data were six cases, and for
Task 2, the data included repeated values of each variable.
For the speculative data generation question (Q1), no axes were provided, to
permit students to decide the numbers and types of variables to represent and to
develop their own form of representation. Verbal graph interpretation was assessed
using Q2a and Q2d. Q2a was worded in an open manner to avoid the assumption
that an association exists (Donnelly & Welford, 1989). Because students may have
avoided comment on covariation in Q2a, Q2d* was included and then revised to
Q2d to provide a more specific cue about covariation. Numerical graph
interpretation was assessed using Q2b and Q2c. Q2b involved reading a value, and
Q2c was designed to identify whether students based interpolation on proximity to
one or more of Classes A, C, and E.
Task 1 (Negative association)
Anna and Cara were doing a project on study habits. They asked some students two questions:
• “What time did you spend studying for the spelling test?”
• “What score did you get on the test?”
Anna asked 6 students. She used the numbers to draw a graph. She said, “People who studied for more time got lower scores.”
Q1. Draw a graph to show what Anna is saying for her 6 students. Label the graph.

Task 1 (Positive association)
She said, “People who studied for more time got higher scores.”
Q1*. Draw a graph to show what Anna is saying for her 6 students. Label the graph.

Figure 2. Task 1 to assess speculative data generation.
(Third- and fifth-grade males received Q1* in place of Q1.)
Procedure
The items were among a total of six or seven tasks in a written survey
administered to students during class time. The items were Q2 (Task 1) and Q6
(Task 2) on the survey. Q1 on the survey concerned graphing three statements
related to height growth with age (Moritz, 2000), Q3 (for secondary students)
concerned graphing a verbal statement concerning motor vehicle use and heart death
incidence (Watson, 2000), and Q4 concerned graphing a table of raw data about six
temperatures recorded with corresponding times at regular intervals. Graphing tasks
were placed before interpretation tasks to ensure exposure to the printed graphs did
not suggest a graphing method.
The time available was 40–70 minutes, although ninth-grade females had only
25 minutes available; in this case after about 15 minutes, students were instructed to
attempt Task 2. Sample sizes vary between questions because those who did not
appear to have attempted the item were removed from the analysis, whereas those
who appeared to have read and attempted the item but offered no response were
included at the lowest response level. Each session began with a brief verbal
introduction to the purpose of the survey. The first question and other selected
questions were read to students on a class or individual basis as required.
Analysis
Students’ representations were scanned into computer graphic files, and their
written responses were typed into a spreadsheet. Responses were categorized using
iterative techniques (Miles & Huberman, 1994), successively refining categories and
subcategories by comparing and contrasting features of graphs or written responses.
Frameworks of four levels were developed that described the degree of success
students had in generating a data set, in verbally generalizing the required
covariation, and in numerically interpreting covariation. The levels—Nonstatistical,
Single Aspect, Inadequate Covariation, and Appropriate Covariation—were
informed by the frameworks used by others (Moritz, 2000; Ross & Cousins, 1993;
Swatton, 1994; Watson & Moritz, 1997) who assigned levels according to the
number of aspects, variables, or data elements used, including no use, a single
variable, both variables but not related, and all variables successfully related. These
levels also relate closely to a theoretical model of cognitive development judged by
the structure of the observed learning outcome (Biggs & Collis, 1982), which
identifies four levels as prestructural, unistructural (single aspect), multistructural
(multiple aspects unrelated), and relational. Further details are provided in the
results that follow.
Task 2
Some students were doing a project on noise. They visited 6 different classrooms. They measured the level of noise in the class with a sound meter. They counted the number of people in the class. They used the numbers to draw this graph. [Graph omitted.]
Q2a. Pretend you are talking to someone who cannot see the graph. Write a sentence to tell them what the graph shows. “The graph shows...
Q2b. How many people are in Class D?
Q2c. If the students went to another class with 23 people, how much noise do you think they would measure? (Even if you are not sure, please estimate or guess.) Please explain your answer.
Q2d. Jill said, “The graph shows that classrooms with more people make less noise”. Do you think the graph is a good reason to say this? YES or NO. Please explain your answer.
Q2d*. Jill said, “The graph shows that the level of noise is related to the number of people in the class”. Do you think the graph is a good reason to say this? YES or NO. Please explain your answer.

Figure 3. Task 2 to assess verbal and numerical graph interpretation.
(Third- and fifth-grade males received Q2d* in place of Q2d.)
RESULTS
Results are discussed for three of the skills shown in Figure 1: speculative data
generation (Q1), verbal graph interpretation (Q2a and Q2d), and numerical graph
interpretation (Q2b and Q2c). For each skill, examples of levels and types of
responses are provided. Quoted examples are annotated with labels indicating grade
and sex, such as “G3f” for a third-grade female.
Speculative Data Generation
Subsets of students were asked to graph a negative covariation (Q1) or a positive
covariation (Q1*). Responses were coded according to the four levels in Table 1. A
descriptive analysis of some responses has been reported previously (Moritz, 2002).
To be coded at the level of Appropriate Covariation, responses showed the
correspondence of variation in two variables, in that (a) the variables were identified
with adequate variation and (b) the direction of the correspondence of variation was
appropriately specified. Variables were considered adequate if (a) labels were
explicit, or units (e.g., hours/minutes) or values (e.g., digital time format) were used
that indicated which variable was denoted, using the notion of indicative labeling
(Moritz, 2000), and (b) the graph included adequate variation of at least three
bivariate values; although the context described six data cases, three were
considered sufficient to demonstrate the covariation. The direction of the
correspondence of variation was appropriately specified either by values at least
ordinal in nature (e.g., “not at all,” “not much,” “a lot”) or by convention of
height/sector angle.
Table 1. Characteristics of four levels of speculative data generation

0. Nonstatistical: Responses represent either (a) context in a narrative but without a data set of more than one value of one variable, or (b) graph axes or values, denoted by number or spatial position, but without a context indicating a data variable.
1. Single Aspect: Responses represent either (a) correspondence in a single bivariate case, or (b) variation of values for a single variable.
2. Inadequate Covariation: Responses represent both variables, but either (a) correspondence is shown with inappropriate variation for at least one variable, such as one variable having only two distinct values (often categorical), or (b) variation is shown for each variable with inappropriate correspondence, such as not in the correct direction.
3. Appropriate Covariation: Responses represent both variables with appropriate correspondence between the variation of values for each variable.
Most students demonstrated at least Inadequate Covariation, and many older
students showed Appropriate Covariation, as shown in Table 2. Further details are
noted below. Numbers in text are divided into the two forms of the questions (Q1
and Q1*), whereas numbers in Table 2 are combined.
Table 2. Percentage of student responses at four levels of speculative data generation by gender and by grade (N = 167)
[Table of percentages at each of the four levels (0–Nonstatistical, 1–Single Aspect, 2–Inadequate Covariation, 3–Appropriate Covariation), tabulated for female and male students at grades 3, 5, 7, and 9, with total N per group; cell values omitted.]
Note. Third- and fifth-grade males were administered Q1* rather than Q1. Percentages do not always sum to 100 due to rounding.
Graphing a Negative Covariation (Q1)
Level 0: Nonstatistical. Fourteen students responded with no evidence of a data set
of covariation for test scores and study time. Two students gave no response. Five
students identified the narrative context without a data set, such as a written
narrative with names for individuals and a single test score of “10/10” (Figure 4a,
G3f). Three students drew graphs that identified each variable but without clear data
points, such as labeled axes. Four students drew a basic graph that gave no
indication of the data set for the variables being measured and also failed to show
six data cases (e.g., Figure 4b, G5m).
(a) Narrative
(b) Basic graph
Figure 4. Student responses to Q1 at Level 0.
Level 1: Single Aspect. Eight students showed a single aspect, either correspondence
or variation, in an attempt to show covariation. One student gave a single bivariate
data point, presented in a rudimentary table of raw data (Figure 5a, G5f). Seven
students represented a single variable: Two showed test scores without indication of
study times (e.g., Figure 5b, G7f), and five showed six data cases ordered by values
of the single variable, which was not labeled.
(a) Single comparison
(b) Single variable
Figure 5. Student responses to Q1 at Level 1.
Level 2: Inadequate Covariation. Thirty-four students showed some features of the
required negative covariation but lacked either appropriate variation or appropriate
correspondence. Fifteen students treated study time as a binary variable, five
students giving a double comparison involving two bivariate pairs (e.g., Figure 6a,
G7m) and 10 representing a group comparison including test scores of six students
(e.g., Figure 6b, G3f). Nineteen students did not adequately show the direction of
covariation, nine failing to clearly indicate any covariation (e.g., Figure 6c, G7m),
seven representing a positive covariation (e.g., Figure 6d, G5f), and three showing a
negative trend with some explicit numbers but without labels or units to indicate the
variables, such as a pie graph with larger sectors corresponding to labels of smaller
percentage values.
(a) Double comparison
(b) Group comparison
(c) No covariation
(d) Positive covariation
Figure 6. Student responses to Q1 at Level 2.
Level 3: Appropriate Covariation. Seventy-four responses provided data for study
times for which higher values were associated with lower test scores, with the
conditions that at least three bivariate data points were shown and study time was
not a binary variable. Thirteen students drew a table of raw data, that is, bivariate
values were written and spatial position was not used to denote value—for example,
including names and repeated values (Figure 7a, G3f), or values placed on a
diagonal but without clear use of coordinates (Figure 7b, G5f). Eleven students drew
series comparison graphs for which the horizontal axis represented the six students
that Anna asked, and the vertical axis displayed study times and test scores, either in
two graphs or superimposed in one graph, often with two scales (e.g., Figure 7c,
G7m). Seven of these were bar graphs, three line graphs, and one was a double pie
graph; seven graphs were unordered on the horizontal axis, and four were ordered
on one variable. Figure 7c illustrates an unordered horizontal axis, although after the
first two cases, the student appears to order the remaining cases. Fifty students
represented orthogonal covariation with the variables on opposing axes. In some
cases axes were unlabeled, but units made clear the variable measured on at least
one axis. Thirty represented study time on the horizontal axis and scores on the
vertical, whereas 20 interchanged the axes. Forty students used conventional
ordering of values on the axes, that is, increasing value as one moves up or right;
seven reversed the values on one axis (giving the visual impression of a positive
covariation); and three showed values unordered in bar graphs (giving the visual
impression of no covariation). Thirty-one responses appeared to indicate a perfect
linear fit with values of equal spacing on each variable, and the other 19 showed
some variation from a perfect linear fit. Students differed in the form of graph used: bar
graphs (25), scattergraphs (7), line graphs (5), and line graphs of connected dots
(13). Figure 7d (G9m) shows a line graph of connected dots with conventional axes
and linear fit.
Figure 7. Student responses to Q1 at Level 3: (a) table; (b) diagonal table; (c) series comparison; (d) orthogonal covariation.
Graphing a Positive Covariation (Q1*)
A positive covariation is consistent with prior beliefs. Compared with the
negative covariation task format (see Table 2), fewer students gave graphs with an
incorrect direction (Level 2) and more students gave a single-variable (Level 1)
graph, as if both variables could be aligned into a single axis of corresponding or
identical values. Figure 8 shows examples of student responses from third-graders
(Figures 8a, 8b, and 8c) and fifth-graders (Figures 8d and 8e). Many students
included names for individual data cases (e.g., Figures 8a, 8b, and 8d), and others
denoted cases by numbers (e.g., Figure 8e) or by separate representations (e.g.,
Figure 8c). Figure 8a was considered to show a single variable of study time,
although if the student had indicated that position on the horizontal axis denoted
score, the response would have been coded at Level 3.
Figure 8. Student responses to Q1* (Positive Covariation task format): (a) Level 1, single variable; (b) Level 2, group comparison; (c) Level 3, bivariate table; (d) Level 3, series comparison; (e) Level 3, orthogonal covariation.
Verbal Graph Interpretation (Q2a and Q2d)
Task 2 asked students to interpret a scattergraph (see Figure 3). Questions Q2a
and Q2d (and Q2d*) involved verbal responses. To express the dual notions of
appropriate variation and correspondence, responses to Q2a needed (a) to identify
“noise” and “number of people” or paraphrases, and (b) to make appropriate use of
comparative values such as “less” or “more.” The characteristics of the four levels
of responses are shown in Table 3. In most cases coding was based on response to
Q2a; however, in some cases Q2d (and Q2d*) served to demonstrate the student’s
ability to interpret verbally at a higher level than was demonstrated in Q2a. Further details
are provided later for each level of response. As seen in Table 4, older students
tended to respond at higher levels; and in particular, all students in grades 7 and 9
were able to identify at least a single aspect at level 1. Seventh- and ninth-grade
males performed better than their female counterparts, although this is likely due to
classes sampled rather than the students’ gender.
Table 3. Characteristics of four levels of verbal and numerical graph interpretation

Level 0. Nonstatistical
  Verbal graph interpretation: Refers to (a) context but not variables or the association, or (b) visual features, e.g., “dots.”
  Numerical graph interpretation: Fails to read data values from axes. May refer to (a) context-based “guesses,” or (b) visual features, e.g., the maximum on the scale.

Level 1. Single Aspect
  Verbal graph interpretation: Refers to either (a) a single data point, or (b) a single variable (dependent).
  Numerical graph interpretation: Reads a value given the corresponding bivariate value (Q2b: 27) but fails to use data to interpolate.

Level 2. Inadequate Covariation
  Verbal graph interpretation: Refers to both variables but (a) correspondence is noted by comparing two or more points without generalizing to all 6 classes or to classes in general, or (b) variables are described but the correspondence is not mentioned or is not in the correct direction.
  Numerical graph interpretation: Reads values (Q2b: 27) and interpolates within local range but without accuracy (Q2c: 39–54 or 71–80).

Level 3. Appropriate Covariation
  Verbal graph interpretation: Refers to both variables and indicates appropriate direction.
  Numerical graph interpretation: Reads values (Q2b: 27) and interpolates with accuracy (Q2c: values 55–70).
Table 4. Percentage of student responses at four levels of verbal graph interpretation by
gender and by grade (N = 121)

Levels of Verbal              Female Grade           Male Grade            Total
Graph Interpretation           3    5    7    9      3a   5a    7    9     (N)
0–Nonstatistical              31   13    0    0      31   29    0    0      13
1–Single Aspect               38   22   20    8      46   14    8   10      25
2–Inadequate Covariation      23   57   45   67      15   43   33    5      43
3–Appropriate Covariation      8    9   35   25       8   14   58   86      40
Total (N)                     13   23   20   12      13    7   12   21    121b

a Third- and fifth-grade males were administered Q2d* rather than Q2d.
b Percentages do not always sum to 100 due to rounding.
Level 0: Nonstatistical. Some students offered responses that described no
covariation. These included non-responses and responses referring only generically to the topic,
such as “that there is 6 classrooms and each dot shows that that is each classroom”
(G3f) or “the graph shows class C, class A, class B, class D, class F, class E and
numbers” (G5f).
Level 1: Single Aspect. One student commented on a single data point: “it shows that
class C had 21 children in there and sound level is 70” (G3m). Many students
referred to one variable, the level of noise, without reference to number of people in
the classroom, although some mentioned that classrooms were involved. Some of
these mentioned no values, with responses such as “noise” (G3m). Some
commented that noise values varied, such as “it shows that some classes are noisier”
(G3f). Others referred to specific values of noise, such as “80 is the most loud and
zero is the most soft” (G3f).
Level 2: Inadequate Covariation. Some students referred to both variables but did
not describe any covariation in the data, such as “the number of people in each class
and the noise level” (G5f), or “level of noise goes up in 10’s and going across is the
number of people in the class room which is going up from 20, 21, 22, to 30” (G9f).
Possibly these students read the axis labels but not the data series. Others mentioned
both variables and gave some evidence of generalizing covariation between the two
variables, such as “that the classroom with the least people is the noisiest and the
classroom with the most is the quietest” (G7f), and “that the class with the least
people in it is making the most noise” (G5m).
Level 3: Appropriate Covariation. Some students generalized the graphs into a
pattern statement, namely a description of the negative covariation. Some responses
were simply stated, such as “that less people make more sound” (G7m), and some
built up to the idea, for example, “Room C is the noisiest then A followed by B, E
and D are each forty, then F brings up the rear, so the more people the less noise”
(G7m). Some emphasized both ends of the generalization, similar to those at the
previous level but describing “classes” in the plural to generalize either to the set of
six classes or to classes in general: “The classes with less people are the loudest. The
rooms with more people are the quietest” (G9m). Other students mentioned the
imperfect nature of the covariation: “In most cases the higher the amount of noise
the lower the amount of people with the exception of E” (G9m). Responses included
statements that emphasized variation by comparison across cases such as “the more
X, the less Y,” “cases with more X have less Y,” and “as X increases, Y decreases.”
No students gave responses that objectified the correspondence or relationship at the
expense of variation, such as “X and Y are negatively/inversely related.”
Numerical Graph Interpretation (Q2b and Q2c)
Numerical graph interpretation was assessed by two questions, one involving
reading a value (Q2b), and the other involving interpolation (Q2c). The coding of
the levels of response is shown in Table 3. There was a high degree of consistency
between responses to Q2b and Q2c, in that of 83 responses showing some evidence
of interpolation at Levels 2 and 3, only six students were unable to read values, and
three of these responded “40” by reading point E. Of 12 nonstatistical (Level 0)
responses, four did not respond to Q2b and five responded “23”—probably because
it appeared on the next line, for Q2c. Nonstatistical responses to Q2c were
idiosyncratic, such as “50, because some talk and some don’t” (G3f). Single Aspect
responses read a single value from the graph, but for Q2c, either acknowledged they
did not know or gave responses that used single points in an idiosyncratic argument
such as “30, under E” (G7m) and “80, because there would have been 50 people in
the room” (G5f). Responses interpolating at Level 2 offered values in the ranges 39–
54 or 71–80, and/or provided reasons related to adjacent data points, such as “If 23
people were in the class I would estimate 50 because in the classes of 24 they’re 40
and 60 and 50 was in the middle” (G9f). Responses coded at Level 3 showed
evidence of interpolation using the trend of the data to predict a value in the range
55–70. Many predicted a value of 65, often with reasoning such as, “about 65
because in the class of 24 it is 60 and in the class of 21 it is 70” (G7f); and some
predicted other values such as, “60 because that is the trend of the graph” (G9m).
The percentages of students who responded at each level are shown in Table 5.
Notably, no third- or fifth-grade students responded at Level 3, whereas no seventh- or ninth-grade students responded at Level 0.
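The Level 3 estimates can be reproduced by ordinary linear interpolation between the two bracketing points cited in the quoted reasoning (21 people at noise level 70, 24 people at noise level 60). A minimal sketch; the helper function is illustrative, not part of the study:

```python
def interpolate(x, pts):
    """Linearly interpolate y at x between the two data points that bracket x."""
    pts = sorted(pts)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x lies outside the data range")

# Bracketing points from the quoted reasoning: 21 people at noise 70, 24 at noise 60
estimate = interpolate(23, [(21, 70), (24, 60)])  # about 63.3, inside the 55-70 band
```

The linear estimate of about 63 lies within the Level 3 range of 55–70, close to the modal student response of 65.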
Associations among Skills
Associations among the skills of speculative data generation, verbal graph
interpretation, and numerical graph interpretation are shown in Table 6. Using the
scores of the levels on an interval scale from 0 to 3, numerical graph interpretation
was highly correlated with verbal graph interpretation (r119 = 0.54) and with
speculative data generation (r109 = 0.47), whereas the correlation of verbal graph
interpretation with speculative data generation was weaker (r109 = 0.30).
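Correlations of this kind can be sketched as Pearson's r computed on the 0–3 level scores, one pair of scores per student. The paired scores below are hypothetical, for illustration only; they are not the study's data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired level scores (0-3) on two skills, one pair per student
verbal = [0, 1, 1, 2, 2, 2, 3, 3]
numerical = [0, 0, 1, 1, 2, 2, 2, 3]
r = pearson_r(verbal, numerical)  # positive: the two skills tend to rise together
```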
Table 5. Percentage of student responses at four levels of numerical graph interpretation by
gender and by grade (N = 121)

Levels of Numerical           Female Grade           Male Grade            Total
Graph Interpretation           3    5    7    9       3    5    7    9     (N)
0–Nonstatistical              46   13    0    0      15   14    0    0      12
1–Single Aspect               23   52   25   33      62   29   25   10      39
2–Inadequate Covariation      31   35   35   50      23   57   50   10      40
3–Appropriate Covariation      0    0   40   17       0    0   25   81      30
Total (N)                     13   23   20   12      13    7   12   21    121a

a Percentages do not always sum to 100 due to rounding.
Table 6. Percentage of student responses at four levels of one skill by level of another skill

                                  Verbal Graph            Numerical Graph
                                  Interpretation          Interpretation        Total
Response               Level      0    1    2    3        0    1    2    3      (N)
Speculative Data       0          0   42   42   17       33   17   50    0       12
Generation             1         23   46   23    8       23   62    8    8       13
                       2         19   24   48   10       24   48   29    0       21
                       3          9   12   26   52        0   25   35   40       65
                       Total     13   24   35   39       12   36   36   27      111
Verbal Graph           0          —    —    —    —       15   54   31    0       13
Interpretation         1          —    —    —    —       24   40   28    8       25
                       2          —    —    —    —        9   44   35   12       43
                       3          —    —    —    —        0    8   35   58       40
                       Total      —    —    —    —       12   39   40   30     121a

a Percentages do not always sum to 100 due to rounding.
DISCUSSION
Four levels of response were detailed for tasks concerning speculative data
generation, verbal graph interpretation, and numerical graph interpretation. These
levels relate closely to levels described in previous research of correlational
reasoning (Ross & Cousins, 1993; Swatton, 1994) and graph comprehension
(Curcio, 2001). Most students, even third-graders, offered responses that identified
at least a single aspect related to the data, such as reading a value from a scatterplot,
which demonstrated they could engage the task. Levels of verbal and numerical
graph interpretation were highly correlated, possibly in part because the coding of
both required reading from the axes, whether the variable label or the value on the
scale.
Many students, even third-graders, demonstrated a negative covariation by
speculative data generation. This finding extends to a younger age the findings of
Swan (1988), in which 37% of 13- to 14-year-olds succeeded, and of Mevarech and
Kramarsky (1997), in which 55% of eighth-graders succeeded. Reasons for this success rate may include the above-average capabilities of the sample of students, or the context of the task involving
six discrete data cases in a familiar setting. A notable difference of the current study
from previous research was the open-ended response format and the coding, which
did not insist students represent the data in a certain form, such as with Cartesian
axes.
This study set out to assess the skill of speculative data generation, irrespective
of representational form. The highest level of response illustrates this skill with
different forms of representation—tables of raw data, series comparison graphs, and
orthogonal covariation graphs as well as bar graphs, line graphs, and scattergraphs—
each with potential to be ordered or unordered. That some students drew tables
rather than graphs may reflect the historically noted tradition of reproducing accurately
all aspects of the data in a table rather than a graph (Beniger & Robyn, 1978), and
raises the question of the graph constructor being aware of audience and of the
purpose for the representation. In this study series comparison graphs were
considered to demonstrate the highest level of speculative data generation (Konold,
2002; Watson & Moritz, 1997), whereas for the purposes of assessing graph
production skills, Brasell and Rowe (1993) considered such graphs to be Cartesian
failures. Further, the principle of indicative labeling (Moritz, 2000) was used to
assist assessing poorly labeled graphs. Figure 8c, for example, has only three
bivariate data points; time is unlabeled, but can be inferred by the clock
representation, and score is only inferred by the notation of “/100.” The student did
not use labels as requested, nor show six data points, but the representation
illustrates that the student expressed the two aspects of covariation, namely
correspondence and variation. Clearly there are many aspects of student
understanding we may seek to assess, including graph production skills to conform
to various conventions. If we want to encourage the view of graphs as tools for
analysis rather than ends in themselves (e.g., NCTM, 2000), then we need to permit
and even encourage a variety of representations to achieve the purposes of engaging
the data and reasoning about covariation. In short, there is a place for assessing the
skill of speculative data generation, and this study indicates this assessment is
appropriate by third grade.
Many of the different approaches to graphing observed—difficulties with labels
or units, inversion of axes, reversed or uneven metric scales, and continuity versus
discrete data points—have been observed previously (e.g., Chazan & Bethell, 1994).
Selecting familiar and distinct variables for tasks may be important for students’
reasoning and in particular for labeling and use of units. In light of the interpretation
difficulties of some students in this study, such as reading values from the wrong
axis, it may also be helpful to use distinctively different values for each variable—
such as 1, 2, 3 versus 10, 20, 30—so that students can be clear which variable is
referred to by a value. Use of discrete data appeared to encourage many students to
consider six different cases in tables or bar graphs. Other students connected these
data points by lines, or showed a line without data points. In algebra classes,
covariation is often represented in a graph by a line or a line connecting points,
whereas statistics is typified by the notion of data sets, which tend to be classified
according to the number and type of variables and the number of cases of discrete
values. What a line segment in a graph denotes should be clarified with respect to
the variable. In some situations, such as measuring temperature, it may be a valid
interpolation between known data points, and in other cases with discrete data, a
connecting line may confuse what is measured. A slightly more sophisticated notion,
acceptable in both statistics and algebra classes, is a straight line of best fit of the
points, which may be formalized into an algebraic expression of the function.
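The line of best fit mentioned above can be computed by ordinary least squares, which turns the plotted points into the algebraic expression y = ax + b. A sketch under invented data: the study-time/score pairs below merely mimic the negative covariation of Q1 and are not student responses.

```python
def least_squares(pts):
    """Slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Invented study-time (hours) and test-score pairs with a negative trend, as in Q1
pts = [(0.5, 95), (1, 90), (2, 75), (3, 60), (3.5, 55), (4, 45)]
a, b = least_squares(pts)  # a < 0: scores fall as study time rises
```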
IMPLICATIONS
Three difficulties students encountered, also observed by Mevarech and
Kramarsky (1997), included (a) focusing on isolated bivariate points only, such as
reducing study time from a numerical variable to a measure with only two
categorical values; (b) focusing on a single variable rather than bivariate data; and
(c) handling a negative covariation that was counter to prior belief in a positive
association. These difficulties are discussed in the next three sections with
suggestions for how teachers may build student understanding.
From Single Data Points to Global Trends
Many students described the scattergraph by reference to one or two bivariate
data points, and several students drew single or double comparison graphs, that is,
comparing one or two bivariate data points (e.g., Figures 5a and 6a). Pointwise
approaches may provide an important way into many statistical issues—such as
repeated values in either variable and the contextual understanding of data elements
involving measurement and sampling issues—that do not occur in algebraic studies
of continuous functions. In this respect, tables and series comparison graphs (e.g.,
Figures 7a, 7c, and 8d) may be significant representations for reasoning about
covariation, since they devote a feature (column or axis) to retain case information,
such as the name of a person, and can represent two cases with identical bivariate
values, which are slightly problematic to display in Cartesian coordinates.
Students’ reasoning about isolated data points emphasized correspondence of
two measures but did not describe variation to indicate covariation adequately.
Development of the pointwise approach in verbal interpretations may be considered
as a progression of comparisons within variables, from single-point values (“class C
had 21 children …”) to comparison of points (“the classroom with the least people is
the noisiest …”) to generalizing beyond the available points (“the more people the
less noise …”). This follows the levels of “reading the data,” “reading between the
data,” and “reading beyond the data” described by Curcio (2001). For speculative
data generation, a pointwise approach was the building block used by some young
students who added more data points; for example, the student who drew Figure 7a
probably began with a representation much like Figure 5a. In generating more
points, students appeared to find it easy to maintain the appropriate correspondence
between the measures: Students who drew double or group comparisons conceived
of study times as two high and low extremes, and generated scores that were
corresponding low and high extremes (e.g., Figures 6a and 6b). Even the student
who drew the table in Figure 7a appears to have clustered times into high (3 and 4)
and low (½ and 1) values and corresponding scores into low (3 and 5) and high (9
and 10). The difficulty in generating more data points appeared to be generating
appropriate variation that ensured that both numerical variables varied.
An important idea for development of reasoning beyond isolated points or
dichotomous extremes may be the ordering of cases on a single variable (Ross &
Cousins, 1993; Wavering, 1989). For speculative data generation, one can generate
new cases that have incrementally more or less of one measure, often at fixed
differences, and then simply increment the other variable appropriately. Such fixed
differences move a student away from considering isolated cases that may include
repeated values, toward generating patterns within a variable, as is frequent in
algebra. For real-world data variables, generating new values may be restricted by
the minimum or maximum possible values. Figure 7c illustrates the impact of
ordering and extreme values, where, after generating two cases, the student reached
the maximum score on the scale of 50, and thus broke the pattern to generate the
rightmost four cases in order. Extremes of possible values may also explain why
Figures 7b and 8c did not include six data cases: Having reached a score of 100, the
students could not generate another score in the order. For verbal interpretation,
ordering of one variable allows variation of the other variable to be observed as an
increasing or decreasing feature of the data series (a trend) verbally summed up as a
single phrase, thus corresponding to the graphic language of the data series with the
verbal language of change (Yerushalmy, 1997).
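The incremental strategy described above (fix a step size on one variable and step the other correspondingly in the opposite direction) can be sketched in a few lines; the step sizes and starting values are illustrative only:

```python
# Generate six speculative cases: study time steps up, score steps down.
times = [0.5 + 0.7 * i for i in range(6)]   # 0.5, 1.2, ..., 4.0 hours
scores = [100 - 12 * i for i in range(6)]   # 100, 88, ..., 40 out of 100
cases = list(zip(times, scores))            # ordered, perfectly linear "data"
```

Fixed increments of this kind yield exactly the equal-spaced, perfectly linear pattern that many students drew.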
From Single Variables to Bivariate Data
Sixteen students drew graphs of single variables, and many described only the
variable noise in verbal descriptions of a scattergraph. These students emphasized
variation but did not describe correspondence of two measures to indicate
covariation adequately. Those who had success in verbally describing the
covariation all used the language of incremental change across cases, implied by
ordering each variable, rather than objectifying the correspondence as “X is related
to Y.” Interpolation tasks, though numerical and often involving reference to specific
points, may in fact encourage students to discuss differences between points and
lead to discussion of increments more globally.
A change-over-time approach to covariation has been recommended by algebra
curricula (e.g., NCTM, 2000) and researchers (e.g., Nemirovsky, 1996b). Such an
approach carries with it implicitly the understanding that time is ordered, and thus
verbal phrases such as “it started to grow faster, then it slowed down again”
(NCTM, 2000, p. 163) allow students to focus on change of one variable without
attending to the correspondence of the variables, as is required if the independent
variable is not time. Tables and series comparison graphs may be significant
representations not just for developing reasoning to include more cases as noted
earlier but also for emphasizing both variables and the correspondence of individual
data values. Both of these representations treat each variable as a measured variable
(often termed dependent and, if graphed, represented on the vertical axis) across a
number of cases, whereas Cartesian graphs have axes conventionally considered
independent (horizontal) and dependent (vertical). Aside from the implication of
dependency and possibly causation (difficulties discussed in the next section), some
students do not attend to the variable on the horizontal axis, such as the many
interpretations involving only the variable noise. Tables and series comparison
graphs (e.g., Figures 7c and 8d) may be considered as natural progressions
composed of two univariate tables or graphs (e.g., Figures 5b and 8a). As already
noted, ordering of values is a key concept that allows not only handling of variation,
but also establishing correspondence case-wise. Once cases are ordered by one
variable, such as in the horizontal dimension, the foundation is set for coordinating
the correspondence of two variables in Cartesian coordinates. The transformation
from an ordered table (no use of dimension), or from an ordered-series comparison
graph with both data series in an axis framework (both variables denoted by vertical
dimension), to the orthogonal covariation of Cartesian coordinates can be seen in
Figures 7b and 8e, where bivariate cases have been (reverse) ordered by study times
in the horizontal dimension, and vertical height incorporated to denote variation in
test scores. In these representations, moving the written value labels from the data
elements to the axes results in Cartesian coordinates, as in Figure 7d.
From Prior Beliefs to Data-Based Judgments
Some students generated or interpreted a data set as a positive covariation based
on prior beliefs when a negative covariation existed in the data. Others wrote the
values on one axis in reverse order, thus displaying a negative covariation but
appearing visually as an increasing function, in accord with an alternative
conception that all covariation graphs should appear in a positive direction
(Mevarech & Kramarsky, 1997). The counterintuitive nature of the tasks was
important for assessment purposes in eliciting these responses. An important level
for these students to achieve was appreciating covariation in context, similar to Tier
2 of Watson’s (2000; Watson & Moritz, 1997) statistical literacy hierarchy, evident
by representing a verbal claim in a graph or by interpreting a graph in a verbal
statement. To do this, students must be encouraged to suspend prior beliefs
temporarily to look at the data and examine what covariation might be indicated.
Once the claim of covariation is understood in context, students must question
the process of inference from statistical data to causal claim—Tier 3 of Watson’s
hierarchy. At this level, awareness of prior beliefs should be encouraged, as well as
their balanced integration with available data. An important feature of using tasks
involving counterintuitive covariation is that they should naturally raise questions
about reliability of the data set, and about generalizability to a causal inference. The
tasks involving only six data points were designed to be easy for students to break
down to represent covariation as a series of corresponding cases and draw
it quickly, but also importantly introduced the issue of sample size. Other questions
used as part of this wider study have elicited student responses noting that small
sample size made generalization difficult. These responses will be discussed in
future research reports.
Future Teaching and Research
This study has shown that graphing and verbalizing covariation, using familiar
contexts, can occur before the standardization of graphing conventions. Teaching of
standard graph forms, such as Cartesian coordinates, might not eliminate
alternative conceptions (Mevarech & Kramarsky, 1997), and might even inhibit
reasoning about covariation, if students are able to interpret only their own
representation. Instruction may be more effective if it builds on students’ existing
reasoning and challenges further development of this reasoning. Employing the
Piagetian principle of cognitive conflict, Watson and Moritz (2001) asked students
to construct a pictograph, and then showed students different representations and
asked them to comment; many students could acknowledge the merits of more
structured graphs. For research, this procedure has the potential to give students
moments of learning during observation, as they recognize the merits of another way
of reasoning. The new ideas can be selectively shown in order to build on a
student's existing ideas. For teaching situations, it may prove helpful to use graphs
hand-drawn by anonymous students, similar to the student’s own, since this removes
the emotional personal threat of one’s own work being critiqued unfavorably. Once
students have begun to engage the context of the variables, they can begin to
investigate covariation among variables, discuss ways of reasoning about
covariation, and only slowly be introduced to conventions for expressing their
reasoning in graphs, words, and numerical methods.
REFERENCES
Ainley, J. (1995). Re-viewing graphing: Traditional and intuitive approaches. For the Learning of
Mathematics, 15(2), 10–16.
Alloy, L. B., & Tabachnik, N. (1984). Assessment of covariation by humans and animals: The joint
influence of prior expectations and current situational information. Psychological Review, 91(1), 112–
149.
Australian Education Council. (1991). A national statement on mathematics for Australian schools.
Carlton, Vic.: Author.
Australian Education Council. (1994). Mathematics—A curriculum profile for Australian schools.
Carlton, Vic.: Curriculum Corporation.
Batanero, C., Estepa, A., & Godino, J. D. (1997). Evolution of students’ understanding of statistical
association in a computer based teaching environment. In J. Garfield & G. Burrill (Eds.), Research on
the role of technology in teaching and learning statistics (pp. 191–205). Voorburg, The Netherlands:
International Statistical Institute.
Batanero, C., Estepa, A., Godino, J. D., & Green, D. R. (1996). Intuitive strategies and preconceptions
about association in contingency tables. Journal for Research in Mathematics Education, 27, 151–
169.
Bell, A., Brekke, G., & Swan, M. (1987a). Diagnostic teaching: 4 Graphical interpretations. Mathematics
Teaching, 119, 56–59.
Bell, A., Brekke, G., & Swan, M. (1987b). Diagnostic teaching: 5 Graphical interpretation teaching styles
and their effects. Mathematics Teaching, 120, 50–57.
Bell, A., & Janvier, C. (1981). The interpretation of graphs representing situations. For the Learning of
Mathematics, 2(1), 34–42.
Beniger, J. R., & Robyn, D. L. (1978). Quantitative graphics in statistics: A brief history. American
Statistician, 32, 1–10.
Ben-Zvi, D., & Arcavi, A. (2001). Junior high school students’ construction of global views of data and
data representations. Educational Studies in Mathematics, 45, 35–65.
Biggs, J. B., & Collis, K. F. (1982). Evaluating the quality of learning: The SOLO taxonomy. New York:
Academic Press.
Brasell, H. M., & Rowe, M. B. (1993). Graphing skills among high school physics students. School
Science and Mathematics, 93(2), 63–70.
Carlson, M., Jacobs, S., Coe, E., Larsen, S., & Hsu, E. (2002). Applying covariational reasoning while
modeling dynamics events: A framework and a study. Journal for Research in Mathematics
Education, 33, 352–378.
Chazan, D., & Bethell, S. C. (1994). Sketching graphs of an independent and a dependent quantity:
Difficulties in learning to make stylized, conventional “pictures.” In J. P. da Ponte & J. F. Matos
(Eds.), Proceedings of the 18th Annual Conference of the International Group for the Psychology of
Mathematics Education, 2, 176–184. Lisbon: University of Lisbon.
Clement, J. (1989). The concept of variation and misconceptions in Cartesian graphing. Focus on
Learning Problems in Mathematics, 11, 77–87.
Cobb, P., McClain, K., & Gravemeijer, K. (2003). Learning about statistical covariation. Cognition and
Instruction, 21(1), 1–78.
Coulombe, W. N., & Berenson, S. B. (2001). Representations of patterns and functions: Tools for
learning. In A. A. Cuoco & F. R. Curcio (Eds.), The roles of representation in school mathematics
(2001 Yearbook) (pp. 166–172). Reston, VA: National Council of Teachers of Mathematics.
Crocker, J. (1981). Judgment of covariation by social perceivers. Psychological Bulletin, 90, 272–292.
Curcio, F. R. (2001). Developing data-graph comprehension in grades K through 8 (2nd ed.). Reston, VA:
National Council of Teachers of Mathematics.
Department for Education and Employment. (1999). Mathematics: The national curriculum for England.
London: Author and Qualifications and Curriculum Authority.
Donnelly, J. F., & Welford, A. G. (1989). Assessing pupils’ ability to generalize. International Journal of
Science Education, 11, 161–171.
Estepa, A., & Batanero, C. (1996). Judgments of correlation in scatterplots: Students’ intuitive strategies
and preconceptions. Hiroshima Journal of Mathematics Education, 4, 21–41.
Inhelder, B., & Piaget, J. (1958). Random variations and correlations. In B. Inhelder & J. Piaget, The
growth of logical thinking from childhood to adolescence (A. Parsons & S. Milgram, Trans.) (pp.
224–242). London: Routledge & Kegan Paul.
Janvier, C. (1978). The interpretation of complex Cartesian graphs representing situations: Studies and
teaching experiments. Unpublished doctoral dissertation, University of Nottingham.
Jennings, D. L., Amabile, T. M., & Ross, L. (1982). Informal covariation assessment: Data-based versus
theory-based judgments. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under
uncertainty: Heuristics and biases (pp. 211–230). Cambridge, UK: Cambridge University Press.
Konold, C. (2002). Alternatives to scatterplots. In B. Phillips (Ed.), Proceedings of the Sixth International
Conference on Teaching Statistics. Cape Town, South Africa.
Krabbendam, H. (1982). The non-qualitative way of describing relations and the role of graphs: Some
experiments. In G. Van Barnveld & H. Krabbendam (Eds.), Conference on functions (Report 1, pp.
125–146). Enschede, The Netherlands: Foundation for Curriculum Development.
Leinhardt, G., Zaslavsky, O., & Stein, M. K. (1990). Functions, graphs and graphing: Tasks, learning and
teaching. Review of Educational Research, 60(1), 1–64.
McKnight, C. C. (1990). Critical evaluation of quantitative arguments. In G. Kulm (Ed.), Assessing
higher order thinking in mathematics (pp. 169–185). Washington, DC: American Association for the
Advancement of Science.
Mevarech, Z. R., & Kramarsky, B. (1997). From verbal descriptions to graphic representations: Stability
and change in students’ alternative conceptions. Educational Studies in Mathematics, 32, 229–263.
REASONING ABOUT COVARIATION
255
Meyer, J., Shinar, D., & Leiser D. (1997). Multiple factors that determine performance with tables and
graphs. Human Factors, 39(2), 268–286.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.).
Thousand Oaks, CA: Sage.
Ministry of Education. (1992). Mathematics in the New Zealand curriculum. Wellington, NZ: Author.
Moore, D. S. (1990). Uncertainty. In L. A. Steen (Ed.), On the shoulders of giants: New approaches to
numeracy (pp. 95–137). Washington, DC: National Academy Press.
Moritz, J. B. (2000). Graphical representations of statistical associations by upper primary students. In J.
Bana & A. Chapman (Eds.), Mathematics education beyond 2000 (Proceedings of the 23rd Annual
Conference of the Mathematics Education Research Group of Australasia, 2, 440–447. Perth:
Mathematics Education Research Group of Australasia.
Moritz, J. B. (2002). Study times and test scores: What students’ graphs show. Australian Primary
Mathematics Classroom, 7(1), 24–31.
National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics.
Reston, VA: Author.
Nemirovsky, R. (1996a). A functional approach to algebra: Two issues that emerge. In N. Bednarz, C.
Kieran, & L. Lee (Eds.), Approaches to algebra: Perspectives for research and teaching (pp. 295–
313). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Nemirovsky, R. (1996b). Mathematical narratives, modeling, and algebra. In N. Bednarz, C. Kieran, & L.
Lee (Eds.), Approaches to algebra: Perspectives for research and teaching (pp. 197–220). Dordrecht,
The Netherlands: Kluwer Academic Publishers.
Piaget, J. (1983). Piaget’s theory. (G. Cellerier & J. Langer, Trans.) In P. Mussen (Ed.), Handbook of
child psychology (4th ed., Vol. 1, pp. 103–128). New York: Wiley.
Pinker, S. (1990). A theory of graph comprehension. In R. Freedle (Ed.), Artificial intelligence and the
future of testing (pp. 73–126). Hillsdale, NJ: Erlbaum.
Ross, J. A., & Cousins, J. B. (1993). Patterns of student growth in reasoning about correlational problems.
Journal of Educational Psychology, 85(1), 49–65.
Swan, M. (1985). The language of functions and graphs. University of Nottingham: Shell Center.
Swan, M. (1988). Learning the language of functions and graphs. In J. Pegg (Ed.), Mathematics
Interfaces: Proceedings of the 12th Biennial Conference of the Australian Association of Mathematics
Teachers (pp. 76–80). Newcastle, NSW: The New England Mathematical Association.
Swatton, P. (1994). Pupils’ performance within the domain of data interpretation, with particular
reference to pattern recognition. Research in Science and Technological Education, 12(2), 129–144.
Swatton, P., & Taylor, R. M. (1994), Pupil performance in graphical tasks and its relationship to the
ability to handle variables. British Educational Research Journal, 20, 227–243.
Tilling, L. (1975). Early experimental graphs. British Journal for the History of Science, 8, 193–213.
Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Watson, J. M. (2000). Statistics in context. Mathematics Teacher, 93(1), 54–58.
Watson, J. M., & Moritz, J. B. (1997). Student analysis of variables in a media context. In B. Phillips
(Ed.), Papers on Statistical Education Presented at ICME-8 (pp. 129–147). Hawthorn, Australia:
Swinburne Press.
Watson, J. M., & Moritz, J. B. (2001). Development of reasoning associated with pictographs:
Representing, interpreting, and predicting. Educational Studies in Mathematics, 48(1), 47–81.
Wavering, M. J. (1989). Logical reasoning necessary to make line graphs. Journal of Research in Science
Teaching, 26(5), 373–379.
Yerushalmy, M. (1997). Mathematizing verbal descriptions of situations: A language to support
modeling. Cognition and Instruction, 15(2), 207–264.
Chapter 11
STUDENTS’ REASONING ABOUT THE
NORMAL DISTRIBUTION
Carmen Batanero¹, Liliana Mabel Tauber², and Victoria Sánchez³
¹Universidad de Granada, Spain; ²Universidad Nacional del Litoral, Santa Fe, Argentina;
³Universidad de Sevilla, Spain
OVERVIEW
In this paper we present results from research on students’ reasoning about the
normal distribution in a university-level introductory course. One hundred and
seventeen students took part in a nine-hour teaching experiment based on the use of
computers, as part of a 90-hour course. The teaching experiment took place
during six class sessions. Three sessions were carried out in a traditional classroom,
and in another three sessions students worked on the computer using activities
involving the analysis of real data. At the end of the course students were asked to
solve three open-ended tasks that involved the use of computers. Semiotic analysis
of the students’ written protocols as well as interviews with a small number of
students were used to classify different aspects of correct and incorrect reasoning
about the normal distribution used by students when solving the tasks. Examples of
students’ reasoning in the different categories are presented.
THE PROBLEM
One problem encountered by students in the introductory statistics course at
university level is making the transition from data analysis to statistical inference.
To make this transition, students are introduced to probability distributions, with
most of the emphasis placed on the normal distribution. The normal distribution is
an important model for students to learn about and use for many reasons, such as:
¹ This research has been supported by DGES grant BS02000-1507 (M.E.C., Madrid).
D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 257–276.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
• Many physical, biological, and psychological phenomena, such as physical
  measures, test scores, and measurement errors, can be reasonably modeled by
  this distribution.
• The normal distribution is a good approximation for other distributions, such
  as the binomial, Poisson, and t distributions, under certain conditions.
• The Central Limit Theorem assures that in sufficiently large samples the
  sample mean has an approximately normal distribution, even when samples
  are taken from nonnormal populations.
• Many statistical methods require the condition of random samples from
  normal distributions.
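The Central Limit Theorem claim in the third point can be illustrated with a short simulation (a sketch added here for illustration, not part of the original teaching experiment): sample means drawn from a clearly skewed population still behave approximately normally.

```python
import random
import statistics

random.seed(1)

# A clearly nonnormal (right-skewed) parent population: exponential, mean 1.
def sample_mean(n):
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Draw 5,000 samples of size 100 and keep each sample mean.
means = [sample_mean(100) for _ in range(5000)]

# The sample means cluster around the population mean (1.0) with a
# standard error near 1/sqrt(100) = 0.1, and roughly 68% of them fall
# within one standard error of 1.0, as the normal model predicts.
print(round(statistics.fmean(means), 2))
print(round(statistics.stdev(means), 2))
print(round(sum(abs(m - 1.0) <= 0.1 for m in means) / len(means), 2))
```

The larger the sample size, the closer the distribution of the means comes to the normal shape, regardless of the parent population.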
We begin by briefly describing the foundations and methodology of our study.
We then present results from the students’ assessment and suggest implications for
the teaching of normal distributions. For additional analyses based on this study see
Batanero, Tauber, and Meyer (1999) and Batanero, Tauber, and Sánchez (2001).
THE LITERATURE AND BACKGROUND
Previous Research
There is little research investigating students’ understanding of the normal
distribution, and most existing studies examine isolated aspects of this concept.
Pioneering work was carried out by Piaget and Inhelder
(1951), who studied children’s spontaneous development of the idea of stochastic
convergence. The authors analyzed children’s perception of the progressive
regularity in the pattern of sand falling through a small hole (in the Galton apparatus
or in a sand clock). They considered that children need to grasp the symmetry of
all the possible sand paths falling through the hole, the probability equivalence of
symmetrical trajectories, the spread, and the role of replication, before they are
able to predict the final regularity that produces a bell-shaped (normal)
distribution. This understanding takes place in the formal operations stage (ages
13 to 14).
Regarding university students, Huck, Cross, and Clark (1986) identified two
erroneous conceptions about normal standard scores: On the one hand, some students
believe that all standard scores will always range between –3 and +3, while other
students think there is no restriction on the maximum and minimum values in these
scores. Each of those beliefs is linked to a misconception about the normal
distribution. The students who think that z-scores always vary from –3 to + 3 have
frequently used either a picture or a table of the standard normal curve, with this
range of variation. In a similar way, the students who believe that z-scores have no
upper or lower limits have learned that the tails of the normal curve are asymptotic
to the abscissa; thus they make an incorrect generalization, because they do not
notice that no finite distribution is exactly normal.
For example, if we consider the number of girls born out of 10 newborn babies,
this is a random variable X, which follows the binomial distribution with n = 10 and
p = 0.5. The mean of this variable is np = 5 and the variance is npq = 2.5. So the
maximum z-score that could be obtained from this variable is zmax = (10 – 5)/√2.5 =
3.16. Thus we have a finite limit, but it is greater than 3.
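The arithmetic in this example can be checked directly (a minimal sketch using only the quantities given above):

```python
from math import sqrt

# Girls among 10 newborns: binomial with n = 10, p = 0.5.
n, p = 10, 0.5
mean = n * p                 # np = 5
variance = n * p * (1 - p)   # npq = 2.5

# The largest possible value of X is n = 10, so the largest z-score
# is finite, though greater than 3:
z_max = (n - mean) / sqrt(variance)
print(round(z_max, 2))  # 3.16
```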
In related studies, researchers have explored students’ understanding of the Central
Limit Theorem and have found misconceptions regarding the normality of sampling
distributions (e.g., Vallecillos, 1996, 1999; Méndez, 1991; delMas, Garfield, &
Chance, 1999). Wilensky (1995, 1997) examined student behavior when solving
problems involving the normal distribution. He defined epistemological anxiety as
the feeling of confusion and indecision that students experience when faced with the
different paths for solving a problem. In interviews with students and professionals
with statistical knowledge, Wilensky asked them to solve a problem by using
computer simulation. Although most subjects in his research could solve problems
related to the normal distribution, they were unable to justify the use of the normal
distribution instead of another concept or distribution, and showed high
epistemological anxiety.
Meaning and Understanding of Normal Distributions
in a Computer-Based Course
Our research is based on a theoretical framework about the meaning and
understanding of mathematical and statistical concepts (Godino, 1996; Godino &
Batanero, 1998). This model assumes that the understanding of normal distributions
(or any other concept) emerges when students solve problems related to that
concept. The meaning (understanding) of the normal distribution is conceived as a
complex system, which contains five different types of elements:
1. Problems and situations from which the object emerges. In our teaching
experiments, students solved the following types of problems: (a) fitting a
curve to a histogram or frequency polygon for empirical data distributions,
(b) approximating the binomial or Poisson distributions, and (c) finding the
approximate sampling distribution of the sample mean and sample
proportion for large samples (asymptotic distributions).
2. Symbols, words, and graphs used to represent or to manipulate the data and
concepts involved. In our teaching, we considered three different types of
representations:
a) Static paper-and-pencil graphs and numerical values of statistical
measures, such as histograms, density curves, box plots, stem-leaf plots,
numerical values of averages, spread, skewness, and kurtosis. These
might appear in the written material given to the students, or be obtained
by the students or teacher.
b) Verbal and algebraic representations of the normal distribution, its
properties, or related concepts, such as the words normal and
distribution; the expressions density curve, parameters of the
normal distribution, the symbol N(μ, σ), the equation of the density
function, and so forth.
c) Dynamic graphical representations on the computer. The Statgraphics
software program was used in the teaching. This program offers a variety
of simultaneous representations on the same screen which are easily
manipulated and modified. These representations include histograms,
frequency polygons, density curves, box plots, stem-leaf plots, and
symmetry and normal probability plots. The software also allows
simulation of different distributions, including the normal distribution.
3. Procedures and strategies to solve the problem. Beyond the descriptive
analyses of the variables studied in the experiment, the students were
introduced to computing probabilities under the curve, finding standard
scores, and critical values (computed by the computer or by hand).
4. Definitions and properties. Symmetry and kurtosis; relative position of the
mean, median, and mode; areas above and below the mean; probabilities
within one, two, and three standard deviations; meanings of parameters;
sampling distributions for means and proportions; and random variables.
5. Arguments and proofs. Informal arguments and proofs made using graphical
representation, computer simulations, generalization, analysis, and synthesis.
SUBJECTS AND METHOD
Sample and Teaching Context
The setting of this study was an elective, introductory statistics course offered by
the Faculty of Education, University of Granada. The instruction for the topic of
normal distributions was designed to take into account the different elements of
meaning as just described. Taking the course were 117 students (divided into 4
groups), most of whom were majoring in Pedagogy or Business Studies. Some
students were from the Schools of Teacher Training, Psychology, or Economics.
At the beginning of the course students were given a test of statistical reasoning
(Garfield, 1991) to assess their reasoning about simple statistical concepts such as
averages or sampling, as well as to determine the possible existence of
misconceptions. An examination of students’ responses on the statistical reasoning
test revealed some errors related to sampling variability (representativeness
heuristics), sample bias, interpretation of association, and lack of awareness of the
effect of atypical values on averages. There was a good understanding of
probability, although some students showed incorrect conceptions about random
sequences.
Before starting the teaching of the normal distribution, the students were taught
the foundations of descriptive statistics and some probability, with particular
emphasis on helping them to overcome the biases and errors mentioned. Six
1.5-hour sessions were spent teaching the normal distribution, and another 4 hours were
spent studying sampling and confidence intervals. Students received written material
specifically prepared for the experiment and were asked to read it beforehand.
these sessions were carried out in a traditional classroom, where the lecturer
introduced the normal distribution as a model to describe empirical data, using a
computer with projection facility. Three samples (n = 100, 1,000, and 10,000
observations) of intelligence quotient (IQ) scores were used to progressively show
the increasing regularity of the frequency histogram and polygon, when increasing
the sample size. The lecturer also presented the students with written material, posed
some problems to encourage the students to discover for themselves all the elements
of meaning described in section 3.2, and guided student discussion as they solved
these problems.
The remaining sessions were carried out in a computer lab, where pairs of
students worked on a computer to solve data analysis activities, using examples of
real data sets from students’ physical measures, test scores, and temperatures, which
included variables that could be fitted to the normal distribution and other variables
where this was not possible. Activities included checking properties such as
unimodality or skewness; deciding whether the normal curve provided a good fit for
some of the variables; computing probabilities under the normal curve; finding
critical values; comparing different normal distributions by using standardization;
changing the parameters in a normal curve to assess the effect on the density curve
and on the probabilities in a given interval; and solving application problems.
Students received support from their partner or the lecturer if they were unable to
perform the tasks, and there was also collective discussion of results.
Assessing Students’ Reasoning about the Normal Distribution
At the end of the course students were given three open-ended tasks, to assess
their reasoning about the normal distribution as part of a final exam that included
additional content beyond this unit. These questions referred to a data file students
had not seen before, which included qualitative and quantitative (discrete and
continuous) variables (see Table 1). The students worked alone with the
Statgraphics program, and they were free to solve the problem using the different
tools they were familiar with.
Each problem asked students to complete a task and to explain and justify their
responses in detail, following guidelines by Gal (1997), who distinguished two types
of questions to use when asking students to interpret statistical information. Literal
reading questions ask students for unambiguous answers—they are either right or
wrong. In contrast, to evaluate questions aimed at eliciting students’ ideas about
overall patterns of data, we need information about the evidential basis for the
students’ judgments, their reasoning process, and the strategy they used to relate
data elements to each other. The first type of question was taken into account in a
questionnaire with 21 items, which was also given to the students in order to assess
literal understanding of a wide range of elements of the normal distribution
(Batanero et al., 2001). The second type of question distinguished by Gal (1997) was
addressed in the following open tasks given to students.
Task 1
In this data file, find a variable that could be fitted by a normal distribution. Explain
your reasons for selecting that variable and the procedure you have used.
In this task the student is asked to discriminate between variables that can be
well fitted to a normal distribution and others for which this is not possible. In
addition to determining the student’s criteria when performing the selection (the
properties they attribute to normal distributions), we expected students to analyze
several variables, using different approaches to check their properties, to determine
which would best approximate a normal distribution. We also
expected students to synthesize the results to obtain a conclusion from all their
analyses. We hoped that student responses to this task would reveal their reasoning.
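What we expected students to do by inspection can be mimicked numerically. The sketch below (with hypothetical data standing in for the study’s file) compares the skewness and excess kurtosis of two candidate variables against the value 0 expected under normality:

```python
import random
import statistics

random.seed(2)

def skewness(xs):
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def excess_kurtosis(xs):
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 4 for x in xs) - 3

# Hypothetical variables: one roughly normal, one strongly right-skewed.
heartbeats = [random.gauss(123, 18) for _ in range(96)]
run_times = [4.5 + random.expovariate(1.5) for _ in range(96)]

# Coefficients near zero support a normal fit; a large positive skewness
# (like an exponential's) argues against it.
for name, xs in (("heartbeats", heartbeats), ("run_times", run_times)):
    print(name, round(skewness(xs), 2), round(excess_kurtosis(xs), 2))
```

A complete answer would combine such coefficients with graphical checks (histogram against the fitted density curve), as the students in the study were expected to do.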
Task 2
Compute the appropriate values of parameters for the normal distribution to which
you have fitted a variable chosen in question 1.
In this question the students have to remember what the parameters in a normal
distribution (mean and variance) are. We also expected them to remember how to
estimate the population mean from the sample mean and to use the appropriate
Statgraphics procedure to do this estimation. Finally, we expected the students to
discriminate between the ideas of statistics (e.g., measures based on sample data)
and parameters (e.g., measures for a theoretical population model).
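In computational terms (a sketch with hypothetical data; in the study Statgraphics produced these values), a correct answer to Task 2 amounts to using the sample statistics as estimates of the parameters μ and σ of the fitted model:

```python
import random
import statistics

random.seed(3)

# Hypothetical sample standing in for the variable chosen in Task 1.
data = [random.gauss(123.4, 18.4) for _ in range(96)]

# Statistics are computed from the sample; they serve as estimates of
# the parameters of the theoretical model N(mu, sigma).
mu_hat = statistics.fmean(data)      # estimate of the mean mu
sigma_hat = statistics.stdev(data)   # estimate of sigma (sample sd)
print(round(mu_hat, 1), round(sigma_hat, 1))
```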
Task 3
Compute the median and quartiles for the theoretical distribution you have
constructed in Task 2.
The aim is to evaluate the students’ reasoning about the ideas of median and
quartiles for a normal distribution. Again, discrimination between empirical data
distribution and the theoretical model used to fit these data is needed. We expected
the students to use the critical values facility of Statgraphics to find the median and
quartiles of the theoretical distribution. Students who did not discriminate would
probably compute the median and quartiles from the raw empirical data with the
summary statistics procedure.
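The discrimination this task probes can be made concrete in code (a sketch; the study’s students used Statgraphics rather than Python, and the data here are hypothetical). The theoretical median and quartiles come from the inverse CDF of the fitted N(μ, σ), not from sorting the raw data:

```python
import random
import statistics
from statistics import NormalDist

random.seed(4)
data = [random.gauss(123.4, 18.4) for _ in range(96)]  # hypothetical sample

# Fit the theoretical normal model from the sample statistics.
model = NormalDist(statistics.fmean(data), statistics.stdev(data))

# Correct approach: median and quartiles of the theoretical model,
# obtained from its inverse CDF (the "critical values").
theoretical = [model.inv_cdf(p) for p in (0.25, 0.5, 0.75)]

# What non-discriminating students did: quartiles of the raw data.
empirical = statistics.quantiles(data, n=4)

print([round(q, 1) for q in theoretical])
print([round(q, 1) for q in empirical])
```

For a normal model the theoretical median coincides with the fitted mean, which is exactly the parameter-versus-statistic distinction that Tasks 2 and 3 jointly assess.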
The three tasks just described were also used to evaluate the students’ ability to
operate the statistical software and to interpret its results. Since the students were
free to solve the tasks using any previous knowledge to support their reasoning, we
could evaluate the correct or incorrect use of the different meaning elements
(representations, actions, definitions, properties, and arguments) that we defined
earlier and examine how these different elements were interrelated.
Each student worked individually with Statgraphics and produced a written
report using the word processor, in which they included all the tables and graphs
needed to support their responses. Students were encouraged to give detailed
reasoning. Once the data were collected, the reports were printed and we carried out
a content analysis, identifying which elements of meaning each student used
correctly and incorrectly to solve the tasks.
In the next section we provide a global analysis for each question and then
describe the elements of meaning used by the students.
RESULTS AND ANALYSIS
Students’ Perception of Normality
In Table 1, we include the features of variables in the file and the frequency and
percentage of students who selected each variable in responding to the first question.
The normal distribution provided a good fit for two of these variables: Time to run
30 m (December) and Heartbeats after 30 press-ups. The first variable, Time to run
30 m, was constructed by simulating a normal continuous distribution. Normality
can be checked easily for this variable from its graphical representation; the skewness
and kurtosis coefficients were very close to zero, although the mean, median, and
mode did not exactly coincide. Heartbeats after 30 press-ups was a discrete
variable; however, its many different values, its shape, and the values of its different
parameters suggested that the normal distribution could provide an acceptable fit.
Table 1. Description of the variables that students considered to fit a normal distribution well

Variable                       | Variable type                    | Skewness | Kurtosis | Mean, median, and mode | Students choosing this variable (%)
Age                            | Discrete; three different values | 0        | –0.56    | 13, 13, 13             | 27 (23.1)
Height                         | Continuous; multimodal           | 0.85     | 2.23     | 156.1, 155.5, †        | 26 (22.2)
Heartbeats after 30 press-ups* | Discrete; many different values  | 0.01     | –0.19    | 123.4, 122, 122        | 37 (31.6)
Time spent to run 30 m (Dec.)* | Continuous                       | 0.23     | –0.42    | 4.4, 4.4, 5.5          | 12 (10.3)
Weight                         | Continuous; atypical values      | 2.38     | 9.76     | 48.6, 46, 45           | 4 (3.4)
Heartbeats at rest             | Discrete; many different values  | 0.2      | –0.48    | 71.4, 72, 72           | 6 (5.2)
Time spent to run 30 m (Sep.)  | Continuous                       | 2.4      | 12.2     | 5.3, 5.2, 5            | 4 (3.4)
No answer                      |                                  |          |          |                        | 9 (7.2)

* Correct answer. The normal distribution is a good approximation for these variables.
† Height in fact had three modes (150, 155, 157) that were visible from the stem plot; in the histogram this was noticeable only with specific interval widths.
The variable Height, despite being symmetric, had kurtosis higher than expected
and was multimodal, though this was noticeable only by examining a stem-and-leaf
plot or histogram of the data.
Some of these students confused the empirical data distribution for Age (Fig. 1a)
with the theoretical distribution they fitted to the data. In Figure 1b the data
frequency histogram for Age and a superimposed theoretical normal curve are
plotted. Some students just checked the shape of the theoretical density curve (the
normal curve with the data mean and standard deviation) without taking into account
whether the empirical histogram approached this theoretical curve or not.
Figure 1. (a) Empirical density curve for Age (b) Theoretical normal curve fitted to Age.
Twenty-two percent of students selected a variable with high kurtosis (Height).
In the following example, the student could perceive the symmetry from the
graphical representation of the data, but the graph did not help him interpret the
standard kurtosis coefficient (4.46) that he had computed. The student did not
compute the median and mode; we assume he visually perceived the symmetry of
the curve and from this property assumed the equality of mean, median, and mode.
Example 2
“I computed the mean (156.1) and standard deviation (8.93) and they approach those
from the normal distribution. Then I represented the data (Figure 2) and it looks very
similar to the normal curve. The values of mean, median and mode also coincide. Std
Kurtosis = 4.46” (Student 2).
Figure 2. Density trace for Height.
Finding the Parameters
Table 2 displays the students’ solutions to question 2. Some students provided
incorrect parameters, or additional parameters such as the median, which are not needed
to define the normal distribution. In Example 3, the student confuses the tail areas
with the distribution parameters. In Example 4, the student has no clear idea of what
the parameters are and he provides all the summary statistics for the empirical
distribution.
Example 3
“These are the distribution parameters for the theoretical distribution I fitted to the
variable pulsation at rest:
area below 98.7667 = 0.08953
area below 111.113 = 0.25086
area below 123.458 = 0.5” (Student 3)
Example 4
“Count=96, Average = 123.458, Median = 122.0, Mode = 120.0, Variance = 337.682,
Standard deviation = 18.3761, Minimum = 78.0, Maximum = 162.0, Range = 84.0,
Skewness = 0.0109784, Stnd. Skewness = 0.043913, Kurtosis = –0.197793, Stnd.
Kurtosis = –0.395585, Coeff. of variation = 14.8845%, Sum = 11852.0” (Student 4).
These results suggest difficulties in understanding the idea of parameter and the
difference between theoretical and empirical distributions.
Table 2. Frequency and percentage of responses in computing the parameters

Response                           | Number (Percentage)
Correct parameters                 | 60 (51)
Incorrect or additional parameters | 18 (15)
No answer                          | 39 (33)
Computing Percentiles in the Theoretical Distribution
Table 3 presents a summary of students’ solutions to question 3. About 65% of
the students provided correct or partly correct solutions in computing the median
and quartiles. However, few of them used the critical values of the theoretical
distribution to compute these values. Most of the students computed the quartiles
from the empirical data, through different options such as frequency tables or
statistical summaries; and a large proportion of students found no solution. In the following
example the student is using the percentiles option in the software, which is
appropriate only for computing median and quartiles in the empirical distribution.
He is able to relate the idea of median to the 50th percentile, although he is unable to
relate the ideas of quartiles and percentiles. Again, difficulties in discriminating
between the theoretical and the empirical distribution are noticed.
Example 5
“These are the median and quartiles of the theoretical normal distribution for Age.
The median is 13. Percentiles for Age: 1.0% = 12.0, 5.0% = 12.0, 10.0% = 12.0,
25.0% = 13.0, 50.0% = 13.0, 75.0% = 13.0, 90.0% = 14.0, 95.0% = 14.0,
99.0% = 14.0” (Student 1)
Table 3. Frequency and percentages of students’ solutions classified by type of distribution

               | Type of distribution used
Response       | Theoretical | Empirical | None
Correct        | 21 (17.9)   | 29 (24.8) | 1 (0.9)
Partly correct | 9 (7.7)     | 14 (12.0) | 4 (3.4)
Incorrect      | 2 (1.7)     | 17 (14.5) | –
No solution    | –           | –         | 20 (17.1)
Students’ Reasoning and Understanding of Normal Distribution
Besides the percentage of correct responses to each question, we were interested
in assessing the types of knowledge the students explicitly used in their solutions.
Using the categorization in the theoretical framework we described in Section 2, we
analyzed the students’ protocols to provide a deeper picture of the students’
reasoning and their understanding of normal distributions. Four students were also
interviewed after they completed the tasks. They were asked to explain their
procedures in detail and, when needed, the researcher added additional questions to
clarify the students’ reasoning in solving the tasks. In this section we analyze the
results, which are summarized in Table 4, and present examples of the students’
reasoning in the different categories.
Symbols and Representations
Many students in both groups correctly applied different representations, with a
predominance of density curves, and a density curve superimposed onto a
histogram. Their success suggests that students were able to correctly interpret these
graphs, and could find different properties of data such as symmetry or unimodality
from them as in Example 6, where there is a correct use of two graphs to assess
symmetry.
Example 6
“You can see that the distribution of the variable weight is not symmetrical, since the
average is not in the centre of the variable range (Figure 3). The areas over and
below the centre are very different. When comparing the histogram with the normal
density curve, this skews to the left” (Student 5).
Figure 3. Histogram and density trace for Weight.
Among numerical representations, the use of parameters (mean and standard
deviation) was prominent, in particular to solve task 2. Statistical summaries were
correctly applied when students computed the asymmetry and kurtosis coefficients,
and incorrectly applied when they computed the median and quartiles, since in that
question the students used the empirical distribution instead of the theoretical curve
(e.g., in Example 5). Few students used frequency tables and critical values. We
conclude that graphical representations were more intuitive than numeric values,
since a graph provides much more information about the distribution, and the
interpretation of numerical summaries requires a higher level of abstraction.
Actions
The most frequent action was visual comparison (e.g., Examples 2, 6), although
it was not always correctly performed (such as in Example 2, where the student was
unable to use the graph to assess the kurtosis). A high percentage of students
correctly compared the empirical density with the theoretical normal
density (e.g., Example 6). However, 40% of the students confused these two curves.
Table 4. Frequency of main elements of meaning used by the students in solving the tasks
Elements of Meaning
Symbols and Representations
Graphical representations
Normal density curve
Superimposed density curve and histogram
Normal probability plot
Cumulative density curve
Histogram
Frequency polygon
Box plot
Symmetry plot
Numerical summaries
Critical values
Tail areas
Mean and standard deviation (as parameters in the
distribution)
Goodness of fit test
Steam-leaf
Summaries statistics
Frequency tables
Percentiles
Actions
Computing the normal distribution parameters
Changing the parameters
Visual comparison
Computing normal probabilities
Finding critical values
Descriptive study of the empirical distribution
Finding central interval limits
Concepts and properties
Symmetry of the normal curve
Mode, Unimodality in the normal distribution
Parameters of the normal distribution
Statistical properties of the normal curve
Proportion of values in central intervals
Theoretical distribution
Kurtosis in the normal distribution; kurtosis coefficients
Variable: qualitative, discreet, continuous
Relative position of mean, median, mode in a normal
distribution
Skewness and standard skewness coefficients
Atypical value
Order statistics: quartiles, percentiles
Frequencies: absolute, relative, cumulative
Arguments
Checking properties in isolated cases
Applying properties
Analysis
Graphical representation
Synthesis
Correct Use
Incorrect Use
45 (38.5)
30 (25.6)
6 (5.1)
2 (1.7)
37 (31.6)
12 (10.3)
2 (1.7)
1 (0.9)
1 (0.9)
29 (24.8)
3 (2.6)
4 (3.4)
5 (4.3)
48 (41.0)
3 (2.6)
2 (1.7)
5 (4.3)
59 (50.4)
26 (22.2)
9 (7.7)
2 (1.7)
47 (40.2)
9 (7.7)
50 (42.7)
10 (8.5)
56 (47.9)
13 (11.1)
28 (23.9)
39 (33.3)
14 (12)
18 (15.4)
2 (1.7)
49 (41.9)
1 (0.9)
68 (58.1)
8 (6.8)
40 (34.2)
32 (27.4)
51 (46.3)
27 (26.1)
13 (11.1)
48 (41.0)
27 (26.1)
50 (42.7)
13 (11.1)
16 (13.7)
16 (13.7)
3 (2.6)
1 (0.9)
50 (42.7)
1 (0.9)
65 (55.6)
35 (29.9)
5 (4.3)
34 (29.1)
5 (4.3)
32 (27.4)
13 (11.1)
1 (0.9)
18 (15.4)
58 (49.6)
32 (27.4)
58 (49.6)
26 (22.2)
63 (53.8)
3 (2.6)
7 (6.0)
5 (4.3)
36 (30.8)
4 (3.4)
REASONING ABOUT THE NORMAL DISTRIBUTION
269
For example, regarding the variable Age (Figure 1a), the empirical density curve
is clearly nonnormal (since there is no horizontal asymptote). The students who,
instead of using this empirical density, compared the histogram with the
theoretical normal distribution (Figure 1b) did not perceive that the histogram
fitted it poorly, even though this was clearly visible in the graph.
A fair number of students correctly computed the parameters, although a large
percentage made errors in computing the critical values for the normal distribution
(quartiles and median, as in Example 5). Even though the computer replaces the use
of normal distribution tables, it does not solve all computational problems, since
the students had difficulty understanding the idea of critical values and
operating the software options. Finally, some students performed a descriptive study
of the data before fitting the curve.
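The confusion behind Example 5 can be made concrete: critical values come from the fitted theoretical curve, while many students instead computed percentiles of the empirical data. A sketch with invented data (Python, not the students' software) contrasts the two:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(100.0, 15.0, size=117)  # invented sample

# Quartiles of the fitted *theoretical* normal curve (what the task asked for):
mu, sigma = data.mean(), data.std(ddof=1)
q_model = stats.norm.ppf([0.25, 0.50, 0.75], loc=mu, scale=sigma)

# Quartiles of the *empirical* distribution (what many students computed):
q_emp = np.percentile(data, [25, 50, 75])

print(np.round(q_model, 2), np.round(q_emp, 2))
```

The two sets of quartiles agree only approximately; the theoretical median equals the fitted mean, whereas the empirical one need not.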
Concepts and Properties
Students correctly used the different specific properties of the normal
distribution as well as the definition of many related concepts. The most common
confusion was thinking that a discrete variable with only three different values was
normal (e.g., Examples 1, 5). This was usually because students were unable to
distinguish between the empirical and the theoretical distribution. Other authors
have pointed out the high level of abstraction required to distinguish between model
and reality, as well as the difficulties posed by the different levels in which the same
concept is used in statistics (Schuyten, 1991; Vallecillos, 1994).
An interesting finding is that very few students used the fact that the proportion
of cases within one, two, and three standard deviations of the mean is 68%, 95%, and
99.7%, even though we emphasized this property throughout the teaching. This suggests the high
semiotic complexity required in applying this property where different graphical and
symbolic representations, numerical values of parameters and statistics, concepts
and properties, and actions and arguments need to be related, as shown later in
Example 7.
Also revealing is how few students interpreted the kurtosis coefficient,
compared with how many applied symmetry and unimodality.
Regarding the parameters, although most students used this idea correctly, errors
still remain. Some students correctly compared the relative position of the measures
of central position in symmetrical and asymmetrical distributions, although some of
them just based their selection on this property and argued it was enough to assure
normality.
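That the relative position of mean, median, and mode is necessary but not sufficient for normality admits a simple counterexample. The sketch below (invented uniform data, in Python) passes the symmetry checks some students relied on, yet the kurtosis coefficient exposes the non-normality:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 10.0, size=1000)  # symmetric but clearly non-normal

mean_med_gap = abs(x.mean() - np.median(x))  # small: symmetry check "passes"
g1 = stats.skew(x)                           # near 0: skewness check "passes"
g2 = stats.kurtosis(x)                       # about -1.2: normality fails here

print(round(mean_med_gap, 2), round(g1, 2), round(g2, 2))
```

A student who stops after the first two checks would wrongly accept these data as normal.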
Arguments
The use of graphical representations was predominant in producing arguments.
In addition to leading to many errors, this also suggests the students’ difficulty in
producing high-level arguments such as analysis and synthesis. Most students just
applied or checked a single property, generally symmetry. They assumed that one
necessary condition was enough to assure normality. This is the case in Example 7,
where the student correctly interprets symmetry from the symmetry plot and then
assumes this is enough to prove normality.
Example 7
“We can graphically check the symmetry of Time spent to run 30 Mts. in December
with the symmetry plot (Figure 4), as we see the points approximately fit the line;
therefore the normal distribution will fit these data” (Student 6).
Figure 4. Symmetry plot.
In other cases the students checked several properties, although they forgot to
check one of the conditions that is essential for normality, such as in the following
interview, where the student studied the type of variable (discrete, continuous),
unimodality, and relative position of mean, median and mode. However, he forgot to
assess the value of the kurtosis coefficient, which is too high for a normal
distribution (Student 7):
Teacher: In the exam you selected Time to run 30 Mts. in December as a normal
distribution. Why did you choose that variable?
Student: I first rejected all the discrete variables since you need many different
values for a discrete variable to be well fitted to a normal distribution.
Since the two variables Time to run 30 Mts. in December and Time to run
30 Mts. in September are continuous I took one of them at random. I just
might also have taken Time to run 30 Mts. in September. Then I realized
the variable has only one mode, the shape was very similar to the normal
distribution, mean and median were similar.
Teacher: Did you do any more analyses?
Student: No, I just did those.
A small number of students applied different elements of meaning, and carried
out an analysis of each property. Seven percent of them produced a final synthesis,
such as the following student.
Example 8
“The variable Heartbeats after 30 press-ups is what I consider best fits a normal
distribution. It is a numerical variable. The variable is symmetrical, since both the
histogram and the frequency polygon (Figure 5) are approximately symmetrical. On
the other hand the skewness coefficient is close to zero (0.0109) and standard
skewness coefficient falls into the interval (–2, +2). We also observe that the kurtosis
coefficient is close to zero (–0.1977) which suggests the variable can fit a normal
distribution.
Furthermore, we know that in normal distributions, mean median and mode coincide
and in this case the three values are very close (Mean = 123.4; Mode = 120; Median
= 122). Moreover there is only one mode. As for the rule 68, 95, 99.7: in the interval (µ
– σ, µ + σ) (105.08, 141.82) there are 68.75% of the observations, in the interval
(µ – 2σ, µ + 2σ) (86.81, 160.19) there is 95.84%, and in the interval (µ – 3σ, µ +
3σ) (68.34, 178.56) we found 100% of the data. These values are very close.
Therefore you can fit a normal distribution to these data" (Student 8).
Figure 5. Histogram and frequency polygon for Heartbeats after 30 press-ups.
In this answer, the student relates the property of symmetry (concept) to the
histogram and frequency polygon (representations). He is able to compute (action)
the skewness and kurtosis coefficients (numerical summaries) and compares their
values with those expected in normal distributions (properties and concepts). He
also applies and relates the property of relative positions of central tendency
measures and central intervals in a normal distribution, being able to operate the
software (action) in order to produce the required graphs and summaries, which are
correctly related and interpreted. This type of reasoning requires the integration of
many different ideas and actions by the student.
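Student 8's check of the proportions in central intervals can be mirrored in code. This sketch uses invented data standing in for Heartbeats after 30 press-ups; the names and values are illustrative, not taken from the study's data file:

```python
import numpy as np

rng = np.random.default_rng(3)
beats = rng.normal(123.4, 18.4, size=96)  # invented stand-in for the real data

mu, sigma = beats.mean(), beats.std(ddof=1)
p1 = np.mean(np.abs(beats - mu) <= 1 * sigma)  # expect about 0.683
p2 = np.mean(np.abs(beats - mu) <= 2 * sigma)  # expect about 0.954
p3 = np.mean(np.abs(beats - mu) <= 3 * sigma)  # expect about 0.997

print(round(p1, 3), round(p2, 3), round(p3, 3))
```

Observed proportions close to 68%, 95%, and 99.7% support, though do not prove, the normal fit, which is exactly how Student 8 used them.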
Other students provided incorrect variables, even when they were able to use the
software and to correctly produce a great number of different graphs. In Example 9
the student is able to plot different graphs and compute the quartiles. However, he is
neither able to extract the information needed to assess normality from these graphs
nor capable of relating the different results with the concepts behind them. No
arguments linking these different representations or supporting his selection are
given. Moreover, he did not relate the high kurtosis coefficient to a lack of
normality. The graphs and statistics produced are presented in Figure 6.
Example 9
“I selected Height since the normal distribution is used to describe real data. And
describing the students’ height is a real biological problem. This is also a quantitative
variable and normal distribution describes quantitative variables” (Student 9).
Stem-and-Leaf Display for HEIGHT: unit = 1.0 1|2 represents 12.0
 2    13|88
 6    14|0000
16    14|5566777799
40    15|000000002222223333444444
(28)  15|5555555566667777777788889999
28    16|0000001122222244
12    16|555577
 6    17|11
HI|182,0 182,0 185,0 185,0
Summary Statistics for Height: Count = 96, Median = 155.5, Lower quartile = 151.0
Upper quartile = 160.0, Stnd. skewness = 3.4341, Stnd. kurtosis = 4.46366
Figure 6. Graphical representations and statistical summaries for Height.
Discussion
Many students grasped the idea of model, and showed a good understanding of
the usefulness of models, density curves, and areas under the normal curve. Our
analysis of the various actions, representations, concepts, properties, and arguments
used by the students in solving the tasks suggests that many students were able to
correctly identify many elements in the meaning of normal distribution and to relate
one to another. Some examples are as follows:
• Relating concepts and properties. For example, relating the idea of symmetry
to the skewness coefficient or to the relative position of mean, median, and mode in
Examples 6, 7, and 8.
• Relating graphical representations to concepts. For example, relating the
empirical histogram and density curve shapes to the theoretical pattern in a
normal curve (e.g., in Example 8).
• Relating the various graphic representations and data summaries to the
software options and menus needed to produce them (relating
representations and actions in all the examples).
• Relating the definition and properties of the normal distribution to the actions
needed to check the properties in an empirical data set (e.g., in Example 8).
• There was a good understanding of the idea of mean and standard deviation
and its relationship to the geometrical properties of the normal curve (e.g.,
Example 2).
There was also a clear disagreement between the personal meaning of normal
distribution acquired by the students and the meaning we tried to teach them. Here
we describe the main difficulties observed:
1. Perceiving the usefulness of theoretical models to describe empirical data.
This is shown in the following transcript (Student 10):
Teacher: Now that you know what the normal distribution is, can you tell me
what it is useful for or in which way you can apply the normal
distribution?
Student: For comparing, isn’t it? For example to compare data and tables, it is
difficult to explain. … You have some data and you can repeat with
the computer what we did in the classroom.
2. Interpreting areas in frequency histograms and computing areas in the cases
when a change in the extremes of intervals is needed. This point is not
specific to the normal distribution or to the use of computers, and the student
should have learned it at the secondary school level. However, in the
following interview transcript, the student is not aware of the effect of
interval widths on the frequency represented, which is given by the area
under the histogram (Student 10):
Teacher: How would you find the frequency in the interval 0–10 in this
histogram?
Student: The frequency is 5, this is the rectangle height.
Teacher: What about the frequency for the interval 10–30?
Student: It is 10, that is the height of this rectangle.
3. Interpreting probabilities under the normal curve. The graphical
representation of the areas under the normal curve is the main didactic tool
for students to understand the computation of probabilities under the curve
and, at the same time to solve different problems involving the normal
distribution. However, for some students with no previous instruction, this
computation was not easily understood and performed.
4. We also observed difficulties in discriminating between empirical data and
mathematical models and in interpreting some statistical summaries and graphs,
as well as a lack of the analysis and synthesis ability needed to relate all these
properties when making a decision (Student 11).
Teacher: When you computed the median and quartiles in question 3, which
data did you use: the theoretical normal distribution you fit to the data
or the real data?
Student: I … I am not very sure. Well, I used the data file …
5. There was a great deal of difficulty in discriminating between the cases
where a discrete quantitative variable can and cannot be fitted by a normal
distribution (e.g., in Example 5) and even in distinguishing between the
different types of variables.
6. Other students misinterpreted the skewness coefficient or assumed that the
equality of mean, median and mode was enough to show the symmetry of
the distribution, accepted as normal a distribution with no horizontal
asymptote, made a rough approximation when formally or informally
checking the rule (µ – kσ, µ + kσ), accepted too many outliers in a normal
distribution, or misinterpreted the values of kurtosis.
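The histogram difficulty in point 2 above, reading bar height as frequency when interval widths differ, comes down to frequency being the bar's area. A minimal numeric sketch using the interview's 0–10 and 10–30 intervals (the frequencies are invented for illustration):

```python
import numpy as np

edges = np.array([0.0, 10.0, 30.0])  # unequal interval widths: 10 and 20
freqs = np.array([5.0, 10.0])        # invented frequencies per interval

widths = np.diff(edges)
heights = freqs / widths             # density-scaled bar heights: 0.5 and 0.5

# Equal bar heights, yet the wider bar holds twice the frequency:
# frequency = height * width, i.e. the *area* of the bar.
print(heights, heights * widths)
```

Both bars are equally tall, so reading height as frequency, as the interviewed student did, halves the count in the wider interval.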
Even though most of the students were able to change from the local to the global
view of data (Ben-Zvi & Arcavi, 2001) in taking into account the shape of graphs as
a whole, the idea of distribution as a property of a collective, and the variability of
data, there is still a third level of statistical reasoning many of these students did not
reach. This is the modeling viewpoint of data, where students need to deal at the
same time with an empirical distribution as a whole (therefore, they need to adopt a
global viewpoint of their data) and the mathematical model (the normal distribution
in our research). In this modeling perspective, students need to concentrate on the
different features of the data set as a whole and on the different features of the model
(type of variable, unimodality, skewness, percentage of central cases, horizontal
asymptote, etc., in our case). In addition to understanding the model as a complex
entity with different components, they should be able to distinguish the model from
the real data, to compare the real data to the model, and to make an accurate
judgment about how well the model fits the data.
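The judgment of how well the model fits the data can be made operational with a goodness-of-fit test, one of the elements in Table 4 that few students used. A sketch with invented data follows; note that, strictly, estimating µ and σ from the same sample calls for a Lilliefors-type correction, which this plain Kolmogorov-Smirnov check omits:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
sample = rng.normal(155.0, 8.0, size=96)  # invented stand-in for Height

# Fit the normal model, then compare the empirical distribution with it.
mu, sigma = sample.mean(), sample.std(ddof=1)
stat, p_value = stats.kstest(sample, "norm", args=(mu, sigma))

print(round(stat, 3), round(p_value, 3))  # small statistic = close fit
```

Unlike checking a single property such as symmetry, the test statistic summarizes the discrepancy between data and model over the whole distribution.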
There was also difficulty in using secondary menu options in the software—
which, however, are frequently essential in the analysis. Finally, the students
showed scant argumentative capacity, in particular regarding analysis and synthesis
(e.g., in Example 9).
IMPLICATIONS FOR TEACHING NORMAL DISTRIBUTIONS
The main conclusion in this study is that the normal distribution is a very
complex idea that requires the integration and relation of many different statistical
concepts and ideas. Recognizing this complexity, our work also suggests that it is
possible to design teaching activities that facilitate the learning of basic notions
about normal distribution. Since learning computational skills is no longer an
important objective, an intuitive understanding of basic concepts is possible for
students with moderate mathematical knowledge, provided we choose appropriate
tasks.
Working with computer tools seemed to promote graphical understanding, as
students in our experiment easily recognized and used many different plots (such as
density curves, histograms, etc.) to solve the problems proposed. Moreover, they
also showed a good understanding of many abstract properties, such as the effect of
parameters on the density curve shape, and made extensive use of graphs as part of
their argumentation. This suggests the essential role of computers to facilitate
students’ exploration of these properties and representations.
It is important that students understand basic concepts such as probability,
density curves, spread, skewness, and histograms before they begin studying the
normal distribution, since understanding it builds on these ideas. They should also
be confident in the use of the software before trying to solve problems involving
the normal distribution, since they often misinterpret or confuse results from
different software options.
The students’ difficulties in discriminating between theoretical models and
empirical data suggest that more activities linking real data with the normal model
are needed. Simulating data from normal distributions and comparing them with real
data sets might also be used as an intermediate step between mathematical model
and reality. As a didactic tool it can serve to improve students’ probabilistic
intuition, to teach them the different steps in the work of modeling (Dantal, 1997),
and to help them discriminate between model and reality. Simulation experiences
and dynamic visualization can contribute, as analyzed by Biehler (1991), to provide
students with a stochastic experience difficult to reach in the real world.
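The simulation step suggested here can be sketched as comparing one feature of the real data with the same feature across many simulated normal samples. The data set and the chosen feature (the interquartile range) are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(5)
real = rng.normal(156.0, 7.0, size=96)  # invented "real" class data

# Simulate 200 same-size samples from the fitted normal model.
mu, sigma = real.mean(), real.std(ddof=1)
sims = rng.normal(mu, sigma, size=(200, real.size))

def iqr(x):
    """Interquartile range along the last axis."""
    q1, q3 = np.percentile(x, [25, 75], axis=-1)
    return q3 - q1

# Where does the real data's IQR sit among the simulated ones?
frac_below = np.mean(iqr(sims) < iqr(real))
print(round(frac_below, 2))  # values near 0 or 1 would flag a poor fit
```

Seeing that the real data's feature falls comfortably inside the simulated range is exactly the intermediate step between model and reality that the activity aims at.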
Finally, it is important to take into account the different components of meaning
and understanding when assessing students’ learning. Computer-based assessment
tasks in which students are asked to analyze simple data sets and provide a sound
argument for their responses—such as those presented in this paper—are a good tool
to provide a complete picture of students’ understanding and ways of reasoning.
REFERENCES
Batanero, C., Tauber, L., & Meyer, R. (1999). From data analysis to inference: A research project on the
teaching of normal distributions. Bulletin of the International Statistical Institute: Proceedings of the
Fifty-Second Session of the International Statistical Institute (Tome LVIII, Book 1, pp. 57–58).
Helsinki, Finland: International Statistical Institute.
Batanero, C., Tauber, L., & Sánchez, V. (2001). Significado y comprensión de la distribución normal en
un curso introductorio de análisis de datos (Meaning and understanding of normal distributions in an
introductory data analysis course). Quadrante, 10(1), 59–92.
Ben-Zvi, D. (2000). Towards understanding the role of technological tools in statistical learning.
Mathematical Thinking and Learning, 2(1&2), 127–155.
Ben-Zvi, D., & Arcavi, A. (2001). Junior high school students’ construction of global views of data and
data representations. Educational Studies in Mathematics, 43, 35–65.
Biehler, R. (1991). Computers in probability education. In R. Kapadia & M. Borovcnick (Eds.), Chance
encounters: Probability in education (pp. 169–211). Dordrecht, The Netherlands: Kluwer.
Dantal, B. (1997). Les enjeux de la modélisation en probabilité (The challenges of modeling in
probability). In Enseigner les probabilités au lycée (pp. 57–59). Reims: Commission Inter-IREM
Statistique et Probabilités.
delMas, R. C., Garfield, J. B., & Chance, B. (1999). Exploring the role of computer simulations in
developing understanding of sampling distributions. Paper presented at the Annual Meeting of the
American Educational Research Association, Montreal, Canada.
Gal, I. (1997). Assessing students’ interpretations of data: Conceptual and pragmatic issues. In B. Phillips
(Ed.), Papers on Statistical Education presented at ICME-8 (pp. 49–58). Swinburne, Australia:
Swinburne University of Technology.
Garfield, J. B. (1991). Evaluating students’ understanding of statistics: Developing the statistical
reasoning assessment. In R. G. Underhill (Ed.), Proceedings of the 13th Annual Meeting of the North
American Chapter of the International Group for the Psychology of Mathematics Education (Vol. 2,
pp. 1–7). Blacksburg, VA: Comité organizador.
Godino, J. D. (1996). Mathematical concepts, their meaning and understanding. In L. Puig & A. Gutiérrez
(Eds.), Proceedings of the 20th Conference of the International Group for the Psychology of
Mathematics Education (Vol. 2, pp. 417–424). Valencia: Comité organizador.
Godino, J. D., & Batanero, C. (1998). Clarifying the meaning of mathematical objects as a priority area of
research in mathematics education. In A. Sierpinska & J. Kilpatrick (Eds.), Mathematics Education as
a research domain: A search for identity (pp. 177–195). Dordrecht: Kluwer.
Huck, S., Cross, T. L., & Clark, S. B. (1986). Overcoming misconceptions about z-scores. Teaching Statistics,
8(2), 38–40.
Méndez, H. (1991). Understanding the central limit theorem. Ph.D. diss., University of California,
University Microfilm International number 1-800-521-0600.
Piaget, J., & Inhelder, B. (1951). La genèse de l’idée de hasard chez l’enfant (The origin of the idea of chance in children). Paris: Presses Universitaires
de France.
Rubin, A., Bruce, B., & Tenney, Y. (1991). Learning about sampling: Trouble at the core of statistics. In
D. Vere-Jones (Ed.), Proceedings of the Third International Conference on Teaching Statistics (pp.
314–319). Voorburg, The Netherlands: International Statistical Institute.
Schuyten, G. (1991). Statistical thinking in psychology and education. In D. Vere-Jones (Ed.),
Proceedings of the III International Conference on Teaching Statistics (Vol. 2, pp. 486–489).
Dunedin, New Zealand: University of Otago.
Vallecillos, A. (1996). Inferencia estadística y enseñanza: Un análisis didáctico del contraste de
hipótesis estadísticas (Statistical inference and teaching: A didactical analysis of statistical tests).
Madrid: Comares.
Vallecillos, A. (1999). Some empirical evidence on learning difficulties about testing hypotheses. Bulletin
of the International Statistical Institute: Proceedings of the Fifty-Second Session of the International
Statistical Institute (Tome LVIII, Book 2, pp. 201–204). Helsinki, Finland: International Statistical
Institute.
Wild, C., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical
Review, 67(3), 223–265.
Wilensky, U. (1995). Learning probability through building computational models. In D. Carraher & L.
Meira (Eds.), Proceedings of the 19th PME Conference (Vol. 3, pp. 152–159). Recife, Brazil:
Organizing Committee.
Wilensky, U. (1997). What is normal anyway? Therapy for epistemological anxiety. Educational Studies
in Mathematics, 33, 171–202.
Chapter 12
DEVELOPING REASONING ABOUT SAMPLES
Jane M. Watson
University of Tasmania, Australia
INTRODUCTION
Although reasoning about samples and sampling is fundamental to the legitimate
practice of statistics, it often receives little attention in the school curriculum. This
may be related to the lack of numerical calculations—predominant in the
mathematics curriculum—and the descriptive nature of the material associated with
the topic. This chapter will extend previous research on students’ reasoning about
samples by considering longitudinal interviews with 38 students 3 or 4 years after
they first discussed their understanding of what a sample was, how samples should
be collected, and the representing power of a sample based on its size. Of the six
categories of response observed at the time of the initial interviews, all were
confirmed after 3 or 4 years, and one additional preliminary level was observed.
THE PROBLEM
Although appropriate sampling is the foundation of all inferential statistics, the
topic rarely achieves a high profile in curriculum documents at the school level.
Whether this is because the topic is more descriptive and less numerical than most in
the mathematics curriculum or because it is acknowledged to be difficult for students
to appreciate fully (National Council of Teachers of Mathematics [NCTM], 2000, p.
50) is unknown. Data collection is mentioned as part of Data Analysis and
Probability in the NCTM’s Principles and Standards but rarely with the emphasis—
for example, on the importance of randomness (p. 326)—that might be expected.
Perhaps the most salient reminder of the importance of sampling is found in the
Australian Education Council’s (AEC) National Statement on Mathematics for
Australian Schools (1991) in the context of a general statement on statistical
inference:
277
D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 277–294.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
278
JANE M. WATSON
The dual notions of sampling and of making inferences about populations, based on
samples, are fundamental to prediction and decision making in many aspects of life.
Students will need a great many experiences to enable them to understand principles
underlying sampling and statistical inference and the important distinctions between a
population and a sample, a parameter and an estimate. Although this subheading
[Statistical Inference] first appears separately in band C [secondary school], the
groundwork should be laid in the early years of schooling in the context of data
handling and chance activities. (AEC, 1991, p. 164)
Related to this, upper primary students should “understand what samples are,
select appropriate samples from specified groups and draw informal inferences from
data collected” (p. 172), and high school students should “understand what samples
are and recognize the importance of random samples and sample size, and draw
inferences and construct and evaluate arguments based on sample data” (p. 179).
Again it is noteworthy that calculations involving mathematical formulas are not
involved in these statements, hence the reasoning involved may not be based on a
preliminary mathematics skill base. Developing reasoning related to sampling may
be associated with developing literacy and social reasoning skills rather than
developing numeracy skills. This is potentially an unusual situation for the
mathematics curriculum.
THE LITERATURE AND BACKGROUND
Except for research with college students on issues of sample size and
representativeness that grew from the early work of Tversky and Kahneman (e.g.,
1971, 1974), little research has taken place until recently on school students’
understanding of sampling. In this context, however, reviewers (e.g., Shaughnessy,
1992; Shaughnessy, Garfield, & Greer, 1996) have suggested that school students
are susceptible to the representativeness heuristic; that is, they have difficulty with
the idea of variability in populations, have too much confidence in small samples,
and do not appreciate the importance of sample size in random samples. In the early
1990s the research of Wagner and Gal (1991) with elementary students found that
responses in comparing two groups depended on whether students assumed
homogeneity or appreciated natural variation, whereas Rubin, Bruce, and Tenney
(1991) found, with senior high school students, a tension between the reality of
variability within samples and the need for sample representativeness. Although not
specifically addressing sampling issues, Mokros and Russell (1995) in their study of
students’ understanding of average, discovered an increasing awareness of
representativeness associated with the perceived need to measure an average to
represent a set. In work with upper elementary students, Jacobs (1997, 1999) and
Schwartz, Goldman, Vye, Barron, and The Cognition and Technology Group at
Vanderbilt (1998) found that students preferred biased sampling methods, such as
voluntary participation, due to perception of fairness, allowing everyone an
opportunity and not forcing anyone to participate. Metz (1999) interviewed
elementary students who had been involved in designing their own science
experiments, finding many supporting the power of sampling for appropriate reasons
REASONING ABOUT SAMPLES
279
and few succumbing to the “law of small numbers,” that is, putting too much faith in
small samples. There were, however, also many who argued against sampling due to
the need to test all members of the population or due to the variability in the
population.
The appreciation of the need to consider sampling and other aspects of statistical
reasoning in social contexts, for example based on media reports, led Watson (1997)
and Gal (2000) to suggest structures for considering student progress to reach the
goal of statistical literacy for participation in decision making in society (Wallman,
Watson’s three-tiered hierarchy will be discussed below, and Gal’s four-dimensional framework includes the importance of background knowledge, the skills
to read and comprehend statistical information in context, a set of critical questions
to apply in contexts, and the dispositions and beliefs that allow for questioning and
acknowledgment that alternative explanations are possible.
The previous research most closely related to the current study is that of Watson,
Collis, and Moritz (1995) based on surveys of 171 and interviews of 30 girls in a
South Australian school and of Watson and Moritz based on large-scale longitudinal
surveys of over 3,000 students (2000b) and interviews with 62 students (2000a)
throughout the state of Tasmania. These studies were based on the three survey items
in Figure 1 and the interview protocol in Figure 2. The analysis of Watson et al.
(1995) also included a fourth part of the interview protocol with a sampling task
comparing expected average values from samples with given population averages
(Tversky & Kahneman, 1971). This task will not be considered as part of the current
study.
Three theoretical frameworks were used as part of the earlier research. The first
was related to the statistical content on sampling as reflected in the literature for
students at various levels (e.g., Corwin & Friel, 1990; Landwehr, Swift, & Watkins,
1987; Moore, 1991; Orr, 1995). The second was a cognitive development taxonomy
based on the structure of observed learning outcomes (SOLO) of Biggs and Collis
(1982; 1991). The main interest in terms of analyzing responses that addressed
sampling issues was the increased structural complexity shown. Unistructural (U)
responses employed single elements of the tasks and did not recognize contradictions
if they arose. Multistructural (M) responses used more than one element in a
sequential fashion, often recognizing contradictions but unable to resolve them.
Relational (R) responses integrated elements of the tasks to produce complete
solutions free of contradictions. The third framework was Watson’s (1997) three-tiered hierarchy of statistical literacy applied to sampling. Tier 1 related to
understanding terminology associated with sampling. Tier 2 covered the application
and understanding of sampling terminology as it occurs in context, particularly social
contexts found in the media. Tier 3 was associated with the critical skills required to
question claims about samples made without proper statistical foundation.
Q1. If you were given a “sample,” what would you have?
Q2.
ABOUT 6 in 10 United States high school students say they
could get a handgun if they wanted one, a third of them within
an hour, a survey shows. The poll of 2,508 junior and senior
high school students in Chicago also found 15% had actually
carried a handgun within the past 30 days, with 4% taking one
to school.
(a) Would you make any criticisms of the claims in this article?
(b) If you were a high school teacher, would this report make you refuse a job offer
somewhere else in the United States, say Colorado or Arizona? Why or why not?
Q3.
Decriminalize drug use: poll
SOME 96% of callers to youth radio
station Triple J have said marijuana
use should be decriminalized in
Australia.
The phone-in listener poll, which
closed yesterday, showed 9,924—out
of the 10,000-plus callers—favored
decriminalization, the station said.
Only 389 believed possession of the
drug should remain a criminal
offense.
Many callers stressed they did not
smoke marijuana but still believed in
decriminalizing its use, a Triple J
statement said.
(a) What was the size of the sample in this article?
(b) Is the sample reported here a reliable way of finding out public support for the
decriminalization of marijuana? Why or why not?
Figure 1. Sampling items from written surveys.
The statistical framework was a basis for all three earlier studies (Watson et al.,
1995; Watson & Moritz, 2000a, 2000b). The Biggs and Collis (1982, 1991)
taxonomy was used by Watson et al. and Watson and Moritz (2000b) as a major
classification device. The 1995 study identified two U-M-R cycles for responses to
the tasks set, where the second cycle represented a consolidation of the idea of
sample into a single construct and the increasingly complex application of it in the
contexts presented (see Figures 1 and 2). In the large survey study (Watson &
Moritz, 2000b), the first U-M-R cycle and a consolidation phase based on
questioning bias in Items 3 and 4 in Figure 1 were identified. In the interview study
(Watson & Moritz, 2000a) the taxonomy was used in conjunction with clustering
techniques (Miles & Huberman, 1994) to identify six categories of performance with
respect to the tasks in Figures 1 and 2. These categories were related to the three-tiered
statistical literacy hierarchy (Watson, 1997) as shown in Figure 3. The
hierarchy was also used with the survey outcomes in relation to the SOLO taxonomy
to suggest the possibility of parallel development of U-M-R cycles within the three
tiers once a basic unistructural definition provides a starting point for development.
1. (a) Have you heard of the word sample before?
Where? What does it mean?
(b) A newsperson on TV says:
“In a research study on the weight of Grade 5 children, some researchers
interviewed a sample of Grade 5 children in the state.”
What does the word sample mean in this sentence?
2. (a) Why do you think the researchers used a sample of Grade 5 children, instead
of studying all the Grade 5 children in the state?
(b) Do you think they used a sample of about 10 children? Why or why not?
How many children should they choose for their sample? Why?
(c) How should they choose the children for their sample? Why?
3. The researchers went to 2 schools:
One school in the center of the city and 1 school in the country.
Each school had about half girls and half boys.
The researchers took a random sample from each school:
50 children from the city school
20 children from the country school
One of these samples was unusual: It had more than 80% boys.
Is it more likely to have come from
the large sample of 50 from the city school, or
the small sample of 20 from the country school, or
are both samples equally likely to have been the unusual sample?
Please explain your answer.
Figure 2. Three parts of the interview protocol for sampling.
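The statistical point behind Part 3 of the protocol is sampling variability: with each school about half girls and half boys, a sample with more than 80% boys is far more likely to come from the smaller sample of 20 than from the larger sample of 50. A short simulation, not part of the original study and with illustrative names and trial counts, makes this concrete:

```python
import random

def prob_unusual(sample_size, trials=100_000):
    """Estimate the chance that a random sample from a school with about
    half girls and half boys contains more than 80% boys."""
    hits = 0
    for _ in range(trials):
        # Each child drawn is a boy with probability 0.5.
        boys = sum(random.random() < 0.5 for _ in range(sample_size))
        if boys / sample_size > 0.8:
            hits += 1
    return hits / trials

random.seed(1)
p_city = prob_unusual(50)     # the larger sample of 50 from the city school
p_country = prob_unusual(20)  # the smaller sample of 20 from the country school
print(p_country, p_city)      # the smaller sample is far more likely to be unusual
```

An exact binomial calculation gives roughly 0.0013 for a sample of 20 and on the order of one in a few hundred thousand for a sample of 50, which is the reasoning behind the strongest student answers quoted later.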
Tier 1—Understanding Terminology

Small Samplers without Selection (Category 1)
• may provide examples of samples, such as food products
• may describe a sample as a small bit, or more rarely as a try/test
• agree to a sample size of less than 15
• suggest no method of selection, or an idiosyncratic method

Small Samplers with Primitive Random Selection (Category 2)
• provide examples of samples, such as food products
• describe a sample as either a small bit, or a try/test
• agree to a sample size of less than 15
• suggest selection by “random” means without description, or a simple expression
  to choose any, perhaps from different schools

Tier 2—Understanding Terminology in Context

Small Samplers with Pre-Selection of Results (Category 3)
• provide examples of samples, such as food products
• describe a sample as both a small bit, and a try/test
• agree to a sample size of less than 15
• suggest selection of people by weight, either a spread of fat and skinny, or people
  of normal weight

Equivocal Samplers (Category 4)
• provide examples and descriptions of samples
• may indicate indifference about sample size, sometimes based on irrelevant aspects
• may combine small size with appropriate selection methods or partial sensitivity to
  bias, or large sample size with inappropriate selection methods

Large Samplers with Random/Distributed Selection (Category 5)
• provide examples of samples, such as food products
• describe a sample as both a small bit, and a try/test
• may refer to the term average
• suggest a sample size of at least 20 or a percentage of the population
• suggest selection based on a random process or distribution by geography

Tier 3—Critical Questioning of Claims Made without Justification

Large Samplers Sensitive to Bias (Category 6)
• provide examples of samples, sometimes involving surveying
• describe a sample as both a small bit, and a try/test
• may refer to the terms average or representative
• suggest a sample size of at least 20 or a percentage of the population
• suggest selection based on a random process or distribution by geography
• express concern for selection of samples to avoid bias
• identify biased samples in newspaper articles reporting on results of surveys

Figure 3. Characteristics of six categories of developing concepts of sampling with respect to
the three tiers of statistical literacy (Watson, 1997).
The current study aimed to extend the research of these three studies by
considering longitudinal interviews with 38 students who were interviewed 3 or 4
years after their original interview.
SUBJECTS AND METHODS USED
The subjects in the current study were 22 Tasmanian students interviewed 4
years after their original interview (19 from Watson & Moritz [2000a] and 3 from
earlier pilot interviews) and 16 South Australian students interviewed 3 years later.
During the intervening years students in both states had been exposed to
mathematics influenced by the National Statement (AEC, 1991), but there was no
intervention in relation to this research study in that time. The data set is limited to
students who were still enrolled in the South Australian school or who could be
traced to another school or university within Tasmania, and who had been
interviewed on the sampling protocol in both interviews (7 students had incomplete
data). A summary of the students’ grades in the data set is given in Table 1. Grade 13
refers to first year at university.
Table 1. Distribution of 38 longitudinal interviews by state and grade

       Tasmania                 South Australia
  Grades      Number        Grades      Number
  3 → 7         6           3 → 6         2
  6 → 10       12           5 → 8         6
  9 → 13        4           7 → 10        3
                            9 → 12        5
When it is of interest to compare groups of students at different stages in school,
the following three groups will be considered: Elementary, 8 students initially in
Grade 3 and later in Grades 6 or 7; Middle School, 21 students initially in Grades 5,
6, or 7 and later in Grades 8 or 10; High School, 9 students initially in Grade 9 and
later in Grades 12 or 13. These groups are based on the fact that elementary school
ends in Grade 6 in Tasmania and Grade 7 in South Australia.
All students were interviewed using the protocol in Figure 2 as part of a longer
interview including other concepts in the chance and data curriculum. Thirty-one
students in Grade 12 and below were interviewed in their schools under conditions
similar to the original interview. Three Grade 6/10 students and the four Grade 9/13
students were interviewed on the university campus and paid a small remuneration
for coming to the campus. All interviews were videotaped and subsequently
transcribed.
The method of analysis followed the model set by Watson and Moritz (2000a) in
clustering responses into categories described in that earlier study. For 19 Tasmanian
students, initial categories were assigned from the previous research. For the original
South Australian data, three pilot interviews in Tasmania, and all longitudinal data,
students were assigned to categories by the two researchers familiar with the data,
based on a reading of all transcripts. After the two researchers classified the
responses independently, there were four discrepancies between them (representing
89% agreement), and these were resolved by discussion. Not all students who
participated in the longitudinal interviews were asked the two media questions (Q2
and Q3 in Figure 1). Where there was consequently some doubt about critical
questioning and recognition of bias, this will be acknowledged.
RESULTS
The results are presented in two parts: a summary of the outcomes for the 38
students and examples of responses that illustrate the change taking place over the
3- or 4-year period.
Summary of Outcomes
Of the 38 longitudinal interviews and 19 initial interviews that were classified for
the first time in relation to the six categories in Figure 3, two interviews were found
difficult to classify. One of these was from a Grade 3 student who in the initial
interview did not display any understanding of what a sample was. In the survey, for
Q1 (Figure 1) she wrote, “a cube” and in the interview she gave the following
responses:
S1:
[Part 1a] It means that you have an object or something. You could have a
dice, or a board or a chalk. [Part 1b] Well, that they had a lot of children
having these tests and they probably did well and they are probably are talking
about the ones that got them all right. [Part 2a] Some people. [Why?] Well
mostly, they couldn’t go around, they couldn’t put it on television, some people
might miss out and also if they went around looking for people and telling
everybody they wouldn’t come in because they probably had something to do
on that day. [Part 2b] Maybe 12. [Why?] Well most people wanted to have a
turn and they probably really wanted to have this interview or something and
well I’d say they would have about 12 and they would get a fair chance. [Part
2c] Well I’d get a sheet and say what is 100 and well, something, and you are
only allowed to guess and the person nearest guess or if there were two I would
probably say well you two people had the closest number so I would let you go.
This student was classified as Prestructural (Category 0) with respect to the
concept of sampling, using imaginative stories as part of her reasoning.
The other response that was unusual was from a longitudinal interview of a
Grade 6/10 student. This was the only instance of a student insisting on a population
view of the interview questions, although knowing the basic idea of what a sample is.
S2:
[Part 1a] Yes, like in the grocery store, you can sample something, you can try
it before you buy. Like a part of, whatever. [Part 1b] A few children. [Part 2a]
I don’t know, I don’t know what they were thinking. I think they should
interview everyone because then they would get it more correct. [How many
should they choose?] I think they should choose everyone because otherwise
you won’t get it right. Because you might choose 10 lightweighted people and
that would be wrong then, wouldn’t it? Because there might be a lot of fat
people in the school.
Although this was a stronger statement than that made by one nonlongitudinal
student in the initial data set (Watson & Moritz, 2000a, pp. 62–63, S15), the
response was also placed in the Equivocal category.
Table 2 contains a summary of students’ classifications at the two times,
recorded by whether they were elementary school students (E), middle school (M),
or high school (H). The representation displays the improved performance of 78% of
the students who could improve, the ceiling effect for four high school students, and
the decreased performance for four students (13%). All of the elementary students
improved. The greatest possible improvement (from category 1 to 6) occurred for
two of the middle school students. Of those below the highest category initially, 12%
performed at the same level later. Of those whose performance deteriorated or stayed
the same, in each case half were middle school and half were high school students.
Table 2. Initial and longitudinal performance for elementary (E), middle school (M), and high
school (H) students

Final                          Initial Category
Category      0      1      2      3      4      5      6     Total
   1                 M                                            1
   2                 MM                                           2
   3          E      EEE           M      MM                      7
   4                        M      H             M                3
   5                 E      EM     EMM    MM     MM     HH       12
   6                 MM            EH     MHH    MM     HHHH     13
Total         1      9      3      7      7      5      6        38
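The summary percentages follow from the transition counts. The sketch below uses a best-effort transcription of Table 2 (each letter E/M/H in the table is one student; treat the cell placements as a reading of the table, not an authoritative restatement), from which the reported 78% improvement and 12% unchanged figures can be recomputed:

```python
# Transition counts: (final_category, initial_category) -> number of students,
# read from Table 2 as a best-effort transcription.
counts = {
    (1, 1): 1, (2, 1): 2,
    (3, 0): 1, (3, 1): 3, (3, 3): 1, (3, 4): 2,
    (4, 2): 1, (4, 3): 1, (4, 5): 1,
    (5, 1): 1, (5, 2): 2, (5, 3): 3, (5, 4): 2, (5, 5): 2, (5, 6): 2,
    (6, 1): 2, (6, 3): 2, (6, 4): 3, (6, 5): 2, (6, 6): 4,
}
total = sum(counts.values())
# Students initially in Category 6 could not improve further (ceiling).
could_improve = total - sum(n for (f, i), n in counts.items() if i == 6)
improved = sum(n for (f, i), n in counts.items() if f > i)
same_below_top = sum(n for (f, i), n in counts.items() if f == i and i < 6)
print(total, could_improve)                          # 38 32
print(round(100 * improved / could_improve))         # 78
print(round(100 * same_below_top / could_improve))   # 12
```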
Examples of Performance on the Two Interviews
The examples provided in this section will show improved performance,
diminished performance, and unchanged outcomes.
Improvement
The student, S1, whose response was judged to be prestructural on the initial
interview, was later classified in Category 3 (Small Samplers with Preselection) in
Grade 7.
S1:
[Part 1a] You can get a sample as, in tasting sample they give you something
and you feel it or whatever you do with it. [Part 1b] They took maybe 4 or 5
children. [Part 2a] Some people might be the same weight. [Part 2b]
Depending on how much there is, say there were 7 fat children and 7 skinny
children. Probably ask about 3 skinny and maybe 4 fat. [Part 2c] And the
weight I guess they just thought well this person and this person they look very
different and this person is sort of in between those people and so …
Several other elementary and middle school students gave similar longitudinal
responses.
Student S2, who in Grade 10 insisted that all students be “sampled,” had earlier
in Grade 6 given a Category 2 response (Small Samplers with Primitive Random
Selection).
S2:
[Part 1a] Science … To take something from somewhere and test it. [Part 2b]
About 10 because it would be shorter. [Part 2c] Any children, just pick any!
One from one school, or a couple from a school.
Student S3 was a middle school student whose responses changed from Category
1 to 6 over the 3-year period:
S3: (Grade 5) [Part 1a] Sample could be as in food you could try and … it could
be just something you try before the big thing comes out. [Part 1b] … a few
not all … [Part 2a] Because it probably would have taken too long. [Part 2b]
They could have done that many children. [Part 2c] It doesn’t really matter.
They could choose any 10 because they would all be different.
S3: (Grade 8) [Part 2a] Well studying all the Grade 5 children would take quite a
while to do, so using this sample you can get the basic idea. … If you just take
them randomly, it just gives you a basic idea on the rest of them. [Part 2b] I
think 10 children is probably a bit too few, because they might all be skinny or
they all might [be] slightly more overweight, so if you use say a whole class or
a couple of classes, maybe even a hundred students, the higher number the
more chance you’re getting of equally weight or um. Like if they were slightly
less in weight or slightly higher in weight, so you’ve got more of a chance
people in both areas. [Part 2c] [How ‘random’?] Sort of not set, so like you
wouldn’t choose everybody who looks like really skinny or everybody who
looks really overweight. You could just go through and right and say you, you
and you, without really paying any attention to why you are really saying that,
just taking the people because they are there. Not picking them for like how big
or how much weight was on them. [Part 3] I’d say that one the sample of 20,
because you’ve got less people, and so if you just took a sample of 20 people,
you might have more boys than girls. … You’d have um, yeah, the percentage
would be higher from a lower number than it would in a higher number.
Although not asked the media questions, this student was sensitive to bias in
describing sampling and able to suggest the smaller class in Part 3.
Three students’ responses changed from Category 3 to Category 5 over time; one
of those was the following student, quoted in Watson and Moritz (2000a, p. 58, S7),
with an elaborate story of preselecting small and tall people. Four years later, in
Grade 7, she responded as follows:
S4:
[Part 1a] It is part of, a little bit of, to try. [Part 2a] Because what they get
from the sample they can probably figure that it is probably going to be the
same for most of every one else. [Part 2b] Umm, probably more than 10
because in [the state] there are a lot of Grade 5 children and 10 could be
different from the rest. Probably about maybe 200. [Part 2c] Just have them
pick a number out of a hat.
Except for the comment that “10 could be different from the rest,” the student did
not use other opportunities in the survey or interview (e.g., Part 3) to display
sensitivity to relative sample size and bias.
One of the Equivocal Samplers in the initial interviews (S14 of Watson &
Moritz, 2000a, p. 62) improved to the highest category 4 years later in Grade 10.
S5:
[Survey Q2b] This evidence is only from Illinois, with no mention of Colorado
or Arizona. It would make me investigate where my job offer came from.
[Survey Q3b] Triple J is a youth station, therefore only surveying only one age
group from the population … not a reliable sample ... if people have no interest
in the topic … they will not respond. … A compulsory referendum of the whole
population would be required to be reliable. Even then, many … would be too
young to vote. [Part 1b] A sample, say there’s 10,000 Grade 5 kids in [the
state], or something, they could have only interviewed 100 of them or like a
fairly small number of them, I mean enough to be able to obtain reasonable
results from. But for the sample they may have only taken a class then like a
class from one area of the state, which would have been less accurate, or they
could have taken like some Grade 5 kids from all over the state which would
have been a better sample. So a sample in this sentence is just a small amount
of the Grade 5 kids in [the state]. [Part 3] If they take a random sample then
there is more chance of a smaller sample that you will have an inequality in the
amount in the ratios say, because it is just a smaller sample.
Diminished Performance
The only falls in category of response were from Category 5 to Category 4 or 3.
For a student initially in Grade 6, this resulted in an Equivocal response in Grade 10.
S6: (Grade 6) [Part 1a] A sample is a small portion of something big. [Part 1b] I
would say it means, for a school, um, a sample of the school would maybe, I
would think mean from all, a little, say 2, from each class. [Part 2b] It might
not be enough people … um to like actually … know what the people in that
class, they might have to take a few more to know what that grade … um, is,
like, the children from that grade like, what they behave like, and what they like
and all. [How many?] 30.
S6: (Grade 10) [Part 1a] … Shampoo or something … You try it … [Part 1b] …
Just a handful … [Part 2a] Um, well order, picking, picking order should
perhaps be random, maybe because I don’t know, just because it gives it more
true sort of, end results, if you pick a rich person or a poor person or something
like that you know. It’s just like, vary it a bit, so go random … [Part 2b] … I
suppose there'd probably be a fair few different weights maybe, um, so I
suppose, yeah, ten is fair … well maybe pick more people just because there
could be a lot of varied weight, you know … like ten or so children is you
know, just the same as … fifty, sort of, … ten children would kind of be the
same, different types, fifty maybe or something.
This student had great difficulty coming to a decision about sample size but in
discussing selection suggested students from “every school” chosen at random,
“names out of a hat or something.”
One of the Equivocal Samplers from the initial interviews (S16 of Watson &
Moritz, 2000a, p. 63) continued to be equivocal about a small sample size and
appeared to suggest preselection of results (Category 3):
S7:
[Part 2b] It doesn’t matter, I don’t think. [Part 2c] Just get all different size,
forms. [Why?] To be, to make it fair. If you just picked 5 fat ones, you would
say everyone was fat.
Unchanged Performance
Eight students gave responses that were classified in the same category at both
interviews. One middle school student was considered a Small Sampler without
Selection (Category 1) both times.
S8: (Grade 5) [Survey Q1] A packet of something. [Part 1a] A sample of grain.
[Part 1b] A bunch of them. [Part 2a] Because it might have been too many or
they might have just wanted to pick out the smart ones. [Part 2b] A different
number. [How many?] About 5. [Why?] Because I think that for them to use 10
is too many, maybe 5 at a time. [Part 2c] Maybe they are interested in
something they do.
S8: (Grade 8) [Part 1a] … Food you can taste … [Part 1b] … About five children
in [the state]. [Part 2b] Probably about ten because it’s a um, probably because
it’s not too many like if you had 23 or something then you’d be getting a bit too
big and um, if you write data and stuff on it, ten children wouldn’t be that
many. [Part 2c] Um, suppose it doesn’t really matter. They can just pick them
out.
One of the students who gave Category 5 responses each time was a Grade 7
student initially. Although suggesting a stratified sample larger than 10, she did not
recognize bias in Item Q2 (Figure 1).
S9: (Grade 7) [Survey Q1] A little bit of something like a test. A little bit of
shampoo or maybe a teabag. [Survey Q2a] I think they should tell us about it so
that if we know someone in the same position we can stop them doing it.
[Survey Q2b] Yes because one day I might get into a fight with one of the
students and he or she might shoot me or even kill me. … It’s too dangerous
too be teaching at a school with those sorts of people. It would be a very scary
life and I don’t think I’d like it. [Part 2b] Umm I think more than that ’cos
there’s lots of Grade 5s in [the state]. They could have got a bit from each
school. [Why?] Because some might be out in the country and they might be a
different weight. They might be city … don’t know, just to get it a bit different.
[Part 3] … Probably the [city] school ’cos there’s more people from it.
S9: (Grade 10) [Part 2b] They would have used more than that. For all of [the
state], that wouldn’t even be from each school or anything, because you need
like different people from different areas and different schools and things like
that, not from schools. [How many?] About a couple of hundred or something
like that, or maybe even a bit more depending on [how many] children there
are. [Part 2c] Just randomly somehow. I don’t know, just [choose] their names!
Just choose them somehow. [Why random?] So they are not all the same, they
live somewhere different, they come from [inaudible] backgrounds, different
schools, like different religions maybe even, things like that. [Part 3] It’s hard
to tell. It could of like come from the [city] one because there’s more people,
but ... then again it could have come from the country school because if it’s
selected by random, you can’t really tell … like if each school had half girls
and half boys, it would probably be like equal or …
Although not surveyed the second time, this student did not take up opportunities
to suggest possible bias and gave interview responses very similar to those given earlier.
One of the four students in Category 6 each time was the following student, who,
although not asked the survey questions the second time, was consistent in the
understanding displayed and in the sensitivity to bias shown.
S10: (Grade 9) [Survey Q1] An average part of a larger group that represents that
larger group as a whole to be analyzed/studied. [Survey Q2a] It claims a
sample of the U.S.A. students when there was only a sample of Chicago.
[Survey Q3b] Because there is not a fair representation of the population would
listen to Triple J or bother to call. [Part 2b] I don’t think they would have
used 10 children because it’s not a fair amount to judge the whole of [the
state] … but a lot more than 10 or 100. … I’d go for about one quarter. [Part
2c] They should randomly choose. They shouldn’t have any preference from
one Grade 5 to another. [Part 3] It would have come from the 20 in the
country school because there was more … fairer amount from the [city] school
because there were 50 … the more you have in your sample the better the
research will be ...
S10: (Grade 13) [Part 2b] 10 children … would be … too small to make any
conclusions. … They should have chosen about … 5% of the number of
children … around that figure. [Part 2c] They should somehow choose them
randomly, draw their names out of a hat … and not just pick them from the
one [city] school or something. And they should also make sure that they get a
similar number of girls and boys to measure and that they get a similar number
of ages. There may be one or two years variation but it’s really not that
important since they are all the same grade ... if you take them by what they
look, skinny or heavy, then you are pretty sure that the skinny ones will weigh
less, the weighty ones would weigh more. … I think that you would be
influencing your results beforehand. [Part 3] I’d expect that the 80% boys that
are randomly chosen would be from the country school … not because there
are more boys in the country but the number of children, the more that you
have in the sample, the more the distribution would be similar to the
population, so the smaller the more likely that it is not. So therefore the
country one since it has 20 instead of 50 like the city one, would be more
likely.
DISCUSSION
The improved outcomes for students’ responses to questions on sampling after 3
or 4 years are encouraging in terms of increased representation in higher categories,
particularly a doubling in Categories 5 and 6. Whether life experiences or the
mathematics curriculum is responsible is impossible to determine. Some of the older
students’ responses to Part 3 of the interview protocol suggest that instruction on
sample size may have been a factor. One quote, however, from a student in Grade 10
at the time of the second interview indicates that if the mathematics curriculum is
having an influence, it is not always absorbed.
S11:
[Part 1a] It means like a small dose or something. Like everywhere, like a
small sample of perfume or like to try something in a supermarket or
something. [Are there ways you might use it in maths or somewhere else
talking about school subjects?] You would use it like in science to like test a
sample of something but I don’t think you would use it in maths. You would
but, it is not like the word average, not like something you would use all the
time.
This view may reflect that of Derry, Levin, Osana, and Jones (1998) about the
lack of calculations associated with statistical reasoning. This may mean that some
students (or teachers) fail to realize that ideas about sampling are important.
The students who participated in this longitudinal study are likely to have
participated as well in other longitudinal studies of topics such as average (Watson
& Moritz, 2000c), beginning inference (Watson, 2001), and pictographs (Watson &
Moritz, 2001). Although the criteria for classification were not identical in the
studies, they were hierarchical in nature. Of 43 students interviewed longitudinally
on average, for example, there were no students who performed at a lower level in
their second interview; 12 performed at the same level, but 4 of these could not
improve. Hence of those who could display improved understanding, 79% did so on
the topic of average. This is nearly the same percentage as in the current study. The
difference in the percentage for diminished performance (13% here compared to 0% for
average) may reflect the greater emphasis of the school curriculum on the topic of
average in the middle and high school years, the years of the gap between interviews
for these students. Also learning calculations associated with averages may have
reinforced general understanding of average, something unlikely to have happened
with the topic of sampling.
Several limitations are associated with the design and implementation of this
study. The interview format was time-consuming and hence a limited number of
students could be involved. This and the consequent dropout rate meant that data
from only 38 students could be analyzed in the longitudinal study. There was also no
control over the distribution of the remaining students across grades. Although
students represented two different Australian states and public and private education,
most of those interviewed were females, and it would be desirable to have an even
more representative group of students. The ideal, however, is rarely achieved in
educational research; and given the range of understanding displayed, it is felt that a
rich picture of developing understanding has been gained that will be of benefit to
educational planners. If numbers are not the only criterion for research respectability,
then the descriptions of understanding sampled should compensate for some of the
other limitations.
Further questions arising from this research might involve the monitoring of
learning experiences that occur during the time gap between interviews or the
planning and implementation of a specific program aimed at improving
understanding. Such a program might be based on the outcomes observed in this
study, particularly with respect to ideas for moving students’ understanding into Tier
3 of the statistical literacy hierarchy (Watson, 1997). Such further research, however,
is potentially very expensive unless sponsored within an educational system
committed to providing the support necessary for well-documented implementation.
Outside researchers will find it very difficult.
One of the issues in studying students’ development of statistical understanding
over time is whether cross-cohort studies are sufficient, or if longitudinal interviews
are necessary. The major advantage of longitudinal interviews is the constancy of the
individual and hence the confidence in the change observed as actual for that person.
On the other hand, longitudinal studies usually cannot control for many factors that
can influence outcomes—for example, school curriculum, which may be different for
different students; and dropout rates, which for older students may skew outcomes to
higher categories. Cross-cohort studies carried out simultaneously do not suffer the
last two disadvantages, and if enough students are sampled in a representative
manner, then confidence in the outcomes in terms of a developmental model is quite
high. In the current study, it is not possible to make direct comparisons of the
distribution of outcomes at the same grade levels in the different years. The 21
students in Grades 8 and 10 in their second interviews did not perform as well
generally as the 20 in the Grade 9 cohort originally (Watson & Moritz, 2000a). The
different states and education systems represented in the later data set may contribute
to its greater variation and somewhat lower level of performance. The expense and
difficulty of conducting adequate longitudinal studies, and general trends for
improvement observed from them, however, suggest that cross-cohort studies may be
an acceptable substitute for most purposes.
IMPLICATIONS
One of the interesting educational issues to arise from a study of these interviews
on samples and sampling is the dilemma for students in catering for variation that
they know exists in the population. As noted by Wild and Pfannkuch (1999), this
“noticing and acknowledging variation” is an aspect of the general consideration of
variation that is fundamental to statistical thinking. It is a critical starting point.
Some students, usually but not always younger, select a sample to ensure variation.
Obviously, using random selection to allow a chance process to produce
appropriate variation is a fairly sophisticated idea. Combined with various forms of
stratification as noted by some students, random selection caters for the needs that
might occur in studying the weight of Grade 5 children in a state. Although most
students are either in Category 3 or at least Category 5, a few have great difficulty
distinguishing between allowing for variation and forcing it to occur, and suggest
methods for both. One Grade 6 student, for example, suggested “choose an average
person” as well as “one from each school [for] a wider variety of students.” It is
important for discussion to take place in the classroom to distinguish these situations
explicitly for students. Perhaps class debates could be used to address the issue. As
noted by Metz (1999), even a large proportion of students who have been involved
in devising and carrying out their own research are not convinced of the power of
sampling. The range of views she reported would be an excellent starting point for
discussion.
As well as confirming the six categories of response reported in Watson and
Moritz (2000a), this study identified a student who had not yet entered Category 1 or
Tier 1 of the statistical literacy hierarchy. In terms of movement among tiers over the
period of the study, no one reverted to Tier 1 understanding. Hence once students
became involved in relating the idea of sample to a context, they did not lose the
ability to do so. Of the 19 students originally responding in Tier 2, 37% were able to
respond in Tier 3 three or four years later, whereas 20% of those originally in Tier 1
responded later in Tier 3. That the movement to Tier 3 was not stronger, and that
2 of 6 originally in Tier 3 dropped to Tier 2, is disappointing but perhaps not
unexpected. It may reflect the lack of emphasis in mathematics classrooms, and in
subjects other than mathematics, on bias in media reporting and other settings.
The observations in this study in terms of longitudinal change among the three
tiers of understanding (Watson, 1997) reflect those of Watson and Moritz (2000a) in
terms of cohort differences. The importance of emphasizing variation and
representativeness in the transition from Tier 1 to Tier 2, and the recognition of bias
in the transition from Tier 2 to Tier 3, is supplemented by the realization that for
some younger children, examples of samples with appropriate associated meaning
will be needed to introduce the idea of sample to students for Tier 1 understanding.
Recognizing how structurally complex the construction of meaning of “sample” is
(Watson & Moritz, 2000b, Table 2) implies that talking about “shampoo” is not
sufficient. The idea of representation must be supplemented and distinguished from
“just like the real thing.” Student S2, for example, even at Grade 10, appeared to
have a view of sampling from the grocery store that implied a perfect representation
of the population and which then necessitated choosing all students from the Grade 5
population in order to “get it right.”
Returning to the statement from the AEC (1991) on the importance of sampling
and making inferences about populations, the first is certainly the foundation for the
second. The responses reported in this study suggest that a more concerted effort is
required throughout the middle and high school years in order to consolidate this
foundation for many students.
ACKNOWLEDGMENTS
This research was funded by an Australian Research Council grant (No. A79800950) and
a small grant from the University of Tasmania. Jonathan Moritz conducted the longitudinal
interviews and corroborated the classifications of the author.
REASONING ABOUT SAMPLES
293
REFERENCES
Australian Education Council. (1991). A national statement on mathematics for Australian schools.
Carlton, Vic.: Author.
Biggs, J. B., & Collis, K. F. (1982). Evaluating the quality of learning: The SOLO taxonomy. New York:
Academic Press.
Biggs, J. B., & Collis, K. F. (1991). Multimodal learning and the quality of intelligent behavior. In H. A.
H. Rowe (Ed.), Intelligence: Reconceptualization and measurement (pp. 57–76). Hillsdale, NJ:
Erlbaum.
Corwin, R. B., & Friel, S. N. (1990). Statistics: Prediction and sampling. A unit of study for grades 5–6
from used numbers: Real data in the classroom. Palo Alto, CA: Dale Seymour.
Derry, S. J., Levin, J. R., Osana, H. P., & Jones, M. S. (1998). Developing middle-school students’
statistical reasoning abilities through simulation gaming. In S. P. Lajoie (Ed.), Reflections on
statistics: Learning, teaching and assessment in grades K–12 (pp. 175–195). Mahwah, NJ: Erlbaum.
Gal, I. (2000). Statistical literacy: Conceptual and instructional issues. In D. Coben, J. O’Donoghue, &
G. E. Fitzsimons (Eds.), Perspectives on adults learning mathematics: Research and practice (pp.
135–150). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Jacobs, V. R. (1997, March). Children’s understanding of sampling in surveys. Paper presented at the
Annual Meeting of the American Educational Research Association, Chicago.
Jacobs, V. R. (1999). How do students think about statistical sampling before instruction? Mathematics
Teaching in the Middle School, 5, 240–263.
Landwehr, J. M., Swift, J., & Watkins, A. E. (1987). Exploring surveys and information from samples.
Palo Alto, CA: Seymour.
Metz, K. E. (1999). Why sampling works or why it can’t: Ideas of young children engaged in research of
their own design. In F. Hitt & M. Santos (Eds.), Proceedings of the 21st annual meeting of the North
American Chapter of the International Group for the Psychology of Mathematics Education (Vol. 2,
pp. 492–498). Cuernavaca, Mexico: PME.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.).
Thousand Oaks, CA: Sage.
Mokros, J., & Russell, S. J. (1995). Children's concepts of average and representativeness. Journal for
Research in Mathematics Education, 26(1), 20–39.
Moore, D. S. (1991). Statistics: Concepts and controversies (3rd ed.). New York: Freeman.
National Council of Teachers of Mathematics (2000). Principles and standards for school mathematics.
Reston, VA: Author.
Orr, D. B. (1995). Fundamentals of applied statistics and surveys. New York: Chapman and Hall.
Rubin, A., Bruce, B., & Tenney, Y. (1991). Learning about sampling: Trouble at the core of statistics. In
D. Vere-Jones (Ed.), Proceedings of the Third International Conference on Teaching Statistics. Vol.
1 (pp. 314–319). Voorburg: International Statistical Institute.
Schwartz, D. L., Goldman, S. R., Vye, N. J., Barron, B. J., & The Cognition and Technology Group at
Vanderbilt. (1998). Aligning everyday and mathematical reasoning: The case of sampling
assumptions. In S. P. Lajoie (Ed.), Reflections on statistics: Learning, teaching and assessment in
grades K–12 (pp. 233–273). Mahwah, NJ: Erlbaum.
Shaughnessy, J. M. (1992). Research in probability and statistics: Reflections and directions. In D. A.
Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 465–494). New
York: NCTM & Macmillan.
Shaughnessy, J. M., Garfield, J., & Greer, B. (1996). Data handling. In A. J. Bishop, K. Clements, C.
Keitel, J. Kilpatrick, & C. Laborde (Eds.), International handbook of mathematics education, Part 1
(pp. 205–237). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76(2),
105–110.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185,
1124–1131.
Wagner, D. A., & Gal, I. (1991). Project STARC: Acquisition of statistical reasoning in children
(Annual Report: Year 1, NSF Grant No. MDR90-50006). Philadelphia: Literacy Research Center,
University of Pennsylvania.
Wallman, K. K. (1993). Enhancing statistical literacy: Enriching our society. Journal of the American
Statistical Association, 88(421), 1–8.
294
JANE M. WATSON
Watson, J. M. (1997). Assessing statistical literacy using the media. In I. Gal & J. B. Garfield (Eds.), The
assessment challenge in statistics education (pp. 107–121). Amsterdam: IOS Press and the
International Statistical Institute.
Watson, J. M. (2001). Longitudinal development of inferential reasoning by school students. Educational
Studies in Mathematics, 47, 337–372.
Watson, J. M., Collis, K. F., & Moritz, J. B. (1995, November). The development of concepts associated
with sampling in grades 3, 5, 7 and 9. Paper presented at the Annual Conference of the Australian
Association for Research in Education, Hobart.
Watson, J. M., & Moritz, J. B. (2000a). Developing concepts of sampling. Journal for Research in
Mathematics Education, 31, 44–70.
Watson, J. M., & Moritz, J. B. (2000b). Development of understanding of sampling for statistical
literacy. Journal of Mathematical Behavior, 19, 109–136.
Watson, J. M., & Moritz, J. B. (2000c). The longitudinal development of understanding of average.
Mathematical Thinking and Learning, 2(1&2), 11–50.
Watson, J. M., & Moritz, J. B. (2001). Development of reasoning associated with pictographs:
Representing, interpreting, and predicting. Educational Studies in Mathematics, 48, 47–81.
Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. [And Discussions].
International Statistical Review, 67, 223–265.
Chapter 13
REASONING ABOUT SAMPLING
DISTRIBUTIONS
Beth Chance1, Robert delMas2, and Joan Garfield2
California Polytechnic State University, USA1, and University of Minnesota, USA2
INTRODUCTION
This chapter presents a series of research studies focused on the difficulties students
experience when learning about sampling distributions. In particular, the chapter
traces the seven-year history of an ongoing collaborative research project
investigating the impact of students’ interaction with computer software tools to
improve their reasoning about sampling distributions. For this classroom-based
research project, three researchers from two American universities collaborated to
develop software, learning activities, and assessment tools to be used in introductory
college-level statistics courses. The studies were conducted in five stages, and
utilized quantitative assessment data as well as videotaped clinical interviews. As
the studies progressed, the research team developed a more complete understanding
of the complexities involved in building a deep understanding of sampling
distributions, and formulated models to explain the development of students’
reasoning.
THE PROBLEM
Many published research reports, as well as popular media accounts, utilize
ideas of statistical confidence and significance. Consequently, a large proportion of
the introductory statistics courses at the tertiary level is concerned with statistical
inference. While many students may be able to carry out the necessary calculations,
they are often unable to understand the underlying process or properly interpret the
results of these calculations. Much of the difficulty stems from the notoriously
abstract topic of sampling distributions, which requires students to combine earlier
course topics such as sample, population, distribution, variability, and sampling. Students are then
asked to build on these ideas to make new statements about confidence and
significance. However, student understanding of these earlier topics is often shallow
and isolated, and many students complete their introductory statistics course without
the ability to integrate and apply these ideas. Our experience as teachers of statistics
suggests that the statistical inference calculations that students perform later in a
course tend to become rote manipulation, with little if any conceptual understanding
of the underlying process. This prevents students from being able to properly
interpret research studies.
To address this problem, research and education literature has suggested the use
of simulations for improving students’ understanding of sampling distributions (e.g.,
Behrens, 1997; Davenport, 1992; Glencross, 1988; Schwarz & Sutherland, 1997;
Simon, 1994). Many of these articles discuss the potential advantage of simulations
to illustrate this abstract idea by providing multiple examples of the concept and
allowing students to experiment with all of the variables that form the concept. In
particular, technology allows students to be directly involved with the “building up”
of the sampling distribution, focusing on the process involved, instead of presenting
only the end result. Recently, numerous instructional computer programs have been
developed that focus on use of simulations and dynamic visualizations to help
students develop their understanding of sampling distributions and other statistical
concepts: ConStatS (Cohen, 1997), HyperStat (Lane, 2001), Visual Statistics
(Doane, Tracy, & Mathieson, 2001), StatPlay (Thomason & Cummings, 1999),
StatConcepts (Newton & Harvill, 1997), ExplorStat (Lang, Coyne, & Wackerly,
1993), and ActivStats (Velleman, 2003).
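The "building up" of an empirical sampling distribution that these programs animate can be sketched generically (an illustration of the technique only, not code from any of the packages listed above): repeatedly draw a random sample from a population, record its mean, and examine the collection of recorded means.

```python
import random
import statistics

def empirical_sampling_distribution(population, n, num_samples, seed=1):
    """Repeatedly draw samples of size n (with replacement) from the
    population, recording each sample's mean -- the "building up" of an
    empirical sampling distribution of the sample mean."""
    rng = random.Random(seed)
    return [statistics.mean(rng.choices(population, k=n))
            for _ in range(num_samples)]

# A small, skewed "population" of quantitative values (illustrative only).
population = [1, 1, 1, 2, 2, 3, 4, 5, 8, 13]
means = empirical_sampling_distribution(population, n=25, num_samples=500)

# The recorded means cluster around the population mean, and their spread
# is much smaller than the spread of the individual population values.
print(round(statistics.mean(population), 2), round(statistics.mean(means), 2))
print(round(statistics.pstdev(population), 2), round(statistics.stdev(means), 2))
```

The pedagogical point the simulations exploit is visible in the printed pairs: the center of the recorded means matches the population mean, while their spread shrinks dramatically.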
Despite this development of software programs, little has been published that
evaluates the effectiveness of simulation activities to improve students’ reasoning
about statistics. Some papers have cited anecdotal evidence that students are more
engaged and interested in learning about statistics with such simulations (e.g.,
Simon, 1994), but even fewer studies have gathered and presented empirical data,
especially in the context of the college statistics classroom (see Mills, 2002). Of the
empirical studies that have been conducted, most demonstrated only very modest, if
any, gains in student learning (Schwartz, Goldman, Vye, & Barron, 1997; Well,
Pollatsek, & Boyce, 1990).
For example, Earley (2001) found that an instructor-led demonstration using the
Sampling SIM (delMas, 2001) program was not sufficient to “convince” students of
various features of the Central Limit Theorem. They could recognize facts, but were
not able to consistently apply their knowledge. However, Earley noted evidence that
the students referred to the images from the program later in the course and used
them as a building block when the course proceeded to hypothesis testing. Saldanha
and Thompson (2001) documented the difficulties high school students exhibited
during two teaching experiments about sampling distributions. These included use
of computer simulations to investigate what it means for the outcome of a stochastic
experiment to be unusual. They found students had difficulty grasping the
multilevel, stochastic nature of sampling distributions and often did not sufficiently
participate in the teaching activity. Studies by Hodgson (1996) revealed that
simulations may actually contribute to the formation of misconceptions (e.g., the
belief that inference required multiple samples). As a consequence, Hodgson and
Burke (2000) suggest ways of ensuring that students attend to the more salient
features of simulation activities and highlight the importance of pre-organizers,
ongoing assessment, debriefing, and follow-up exercises.
There is at least one exception to the findings of only modest gains by empirical
studies. Sedlmeier (1999), using an adaptive algorithms perspective, designed
software based on a “flexible urn” model to train students on sampling distribution
problems. The adaptive algorithms perspective argues that the urn model is similar
to frequency information that people deal with regularly and to which the human
mind has adapted through evolution. Translating sampling distribution problems
into the urn model is thought to make the task more understandable and facilitate
reasoning. Sedlmeier found significant immediate and long-term effects from the
flexible urn training. One question that is not addressed in Sedlmeier’s studies is
whether students develop an abstract understanding of sampling distributions
through these activities or remain dependent on translation to the urn model.
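The general flavor of an urn translation can be suggested in a few lines of code (a loose, hypothetical illustration only; Sedlmeier's flexible-urn training software is considerably richer): a question about a sample statistic is recast as repeated draws of tickets from an urn.

```python
import random

rng = random.Random(11)

# An "urn" of tickets: 6 successes (1) and 4 failures (0), i.e. a
# population proportion of 0.6. Illustrative only.
urn = [1] * 6 + [0] * 4

# The question "how often does a sample of 20 show a proportion of
# 0.8 or more?" becomes: draw 20 tickets with replacement, many times.
draws = [sum(rng.choices(urn, k=20)) / 20 for _ in range(5000)]
frac = sum(p >= 0.8 for p in draws) / len(draws)
print(round(frac, 3))  # a small probability, near 0.05
```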
BEGINNING A SERIES OF CLASSROOM-BASED RESEARCH STUDIES
To investigate the potential impact of simulation software on students’
understanding of sampling distributions, the Sampling Distribution program, a
precursor of the Sampling SIM program (delMas, 2001), was developed. Initial
development of this software was guided by literature in educational technology and
on conceptually enhanced simulations (e.g., Nickerson, 1995; Snir, Smith, &
Grosslight, 1995). An activity was created to guide the students’ interaction with the
simulation software based on ideas from literature in learning and cognition (e.g.,
Holland, Holyoak, Nisbett, & Thagard, 1987; Perkins, Schwartz, West, & Wiske,
1995). Assessment tasks were designed to determine the extent of students’
conceptual understanding of sampling distributions.
The three classroom researchers began using the software, activity, and
assessments in different settings: a small, private university (University of the
Pacific), a College of Education, and a Developmental Education College (the latter
two at the University of Minnesota). These were fairly standard algebra-based
introductory courses, presumed to be students’ first exposure to the material. The
courses used common introductory textbooks (Moore & McCabe, 2002; Moore,
2000; and Siegel & Morgan, 1996), and included numerous classroom activities and
uses of technology. A primary goal of the classroom research was to document
student learning of this challenging topic, while providing feedback for further
development and improvement of the software and the learning activity. Four
questions guided the investigation: how the simulations could be utilized more
effectively, how to best integrate the technology into instruction, why particular
techniques appeared to be more effective, and how student understanding of
sampling distributions was affected by use of the program. Five sequential research
studies were conducted, each building on the previous work. These are described in
the following sections.
First Study: Assessing the Impact of Instruction with Simulation Software
An initial version of the instructional activity asked students to use the Sampling
Distribution program to change settings such as the population shape and sample
size and to summarize the results for the different empirical sampling distributions
they observed. Graphics-based test items were used to determine whether students
could demonstrate a visual understanding of the implications of the Central Limit
Theorem for a sample mean. Each item presented a population distribution and
required students to choose which of five empirical distributions of sample means
best represented a potential sampling distribution for a specified sample size.
Figure 1. A graph-based item. (Correct answers: D, C)
For example, the item in Figure 1 asked students which distribution of sample
means they thought best represented an empirical distribution for 500 samples of
size n = 4 and also for size n = 25. Students were asked to justify their choice of
graphs and explain their reasoning in writing. These responses were then
categorized so that future instruments asked students to select which statement best
described their own reasoning. Students were given these test instruments before
and immediately after using the program. Comparing the pre- and posttest scores
isolated the change in students’ understanding attributable to interacting with the
program and activity.
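The behavior probed by items like the one in Figure 1 can also be checked numerically. By the Central Limit Theorem, the standard deviation of the sample means is σ/√n, so the empirical distribution for samples of size 25 should be noticeably narrower than for samples of size 4. A minimal simulation sketch (using a standard normal population for illustration, not the populations from the actual test items):

```python
import math
import random
import statistics

rng = random.Random(42)

def sample_means(n, num_samples=500):
    # Each entry is the mean of one random sample of size n drawn from
    # a standard normal population (mean 0, standard deviation 1).
    return [statistics.mean(rng.gauss(0, 1) for _ in range(n))
            for _ in range(num_samples)]

sd4 = statistics.stdev(sample_means(4))    # expect about 1/sqrt(4)  = 0.5
sd25 = statistics.stdev(sample_means(25))  # expect about 1/sqrt(25) = 0.2
print(round(sd4, 2), round(sd25, 2))
```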
While there were some positive changes, several students still did not appear to
be developing correct reasoning about sampling distributions. See delMas, Garfield,
and Chance (1998) for more details of the program and instructional activities.
Reflection on these results led to further development of the program and
instructional activities.
Second Study: Applying a Conceptual Change Approach
Despite the software’s capability to provide an excellent visualization of the
abstract process of creating sampling distributions, students were still having
difficulty understanding and applying the Central Limit Theorem. Research on
conceptual change theory in science education offered a different approach to this
problem (e.g., Posner, Strike, Hewson, & Gertzog, 1982; Lord, Ross, & Lepper,
1979; Jennings, Amabile, & Ross, 1982; Ross & Anderson, 1982). An attempt was
made to build on this theory in redesigning the activity to engage students in
recognizing their misconceptions and to help them overcome the faulty intuitions
that persisted in guiding their responses on assessment items. In the new activity,
students were first asked to give their response to the graphical test items, as in
Figure 1, for five different populations, each at two different sample sizes, and then
to use the Sampling SIM program to produce an empirical sampling distribution
under the same conditions. They were then asked to compare the simulation results
to their earlier responses and comment on whether their answer agreed or disagreed
(and if so how) with what the program revealed about the behavior of the sample
means. This predict/test/evaluate model forced students to more directly confront the
misconceptions in their understanding, which resulted in statistically significant
improvements in their performance on the posttest (delMas, Garfield, & Chance,
1999a).
Third Study: Conceptual Analysis of Prior Knowledge and Misconceptions
While many students did demonstrate better understanding after this revised
activity, many still exhibited misconceptions as indicated by responses on an
immediate posttest as well as final exam items (delMas, Garfield, & Chance,
1999b). Since the topic of sampling distributions requires students to integrate many
concepts from earlier in the course, gaps in the students’ prerequisite knowledge
were considered as a plausible cause. For example, when discussing the variability
of the sampling distribution and how variability decreases as the sample size
increases, it appeared that some students were not able to fully understand or
identify variability, nor to properly read a histogram when presented in a different
context. Therefore, this study involved a series of conceptual analyses related to
student understanding of sampling distributions. These analyses were based on the
experiences and observations of the classroom researchers, contributions of
colleagues, and analyses of students’ performance on assessment items.
The first analysis produced a thorough list of what students should know before
learning sampling distributions (Garfield, delMas, & Chance, 2002; see Table 1).
This list guided the development of a set of pretest questions that were given to
students and discussed in class before instruction (see
http://www.gen.umn.edu/faculty_staff/delmas/stat_tools/; click the MATERIALS
button, and scroll to the Sampling Distributions Activity section). Using the items to
diagnose areas that students did not understand provided some review and
remediation to students before proceeding to the new topic of sampling
distributions. This analysis led to more detailed descriptions of what is meant by
“understanding sampling distributions,” including a detailed list of the necessary
components of understanding (Table 2), a list of what students should be able to do
with their knowledge of sampling distributions (Table 3), and a list of common
misconceptions that students exhibit about sampling distributions (Table 4). These
lists guided revisions of the activity, pretests, and posttests. This analysis was
helpful in identifying types of correct and incorrect understanding to look for in
students’ reasoning, and enabled more detailed examination of individual student
conceptions via clinical interviews.
Table 1. Prerequisite knowledge to learning about sampling distributions
• The idea of variability. What is a variable? What does it mean to say observations
vary? Students need an understanding of the spread of a distribution in contrast to
common misconceptions of smoothness or variety.
• The idea of a distribution. Students should be able to read and interpret graphical
displays of quantitative data and describe the overall pattern of variation. This includes
being able to describe distributions of data; characterizing their shape, center, and
spread; and being able to compare different distributions on these characteristics.
Students should be able to see between the data and describe the overall shape of the
distribution, and be familiar with common shapes of distributions, such as normal,
skewed, uniform, and bimodal.
• The normal distribution. This includes properties of the normal distribution and how a
normal distribution may look different due to changes in variability and center.
Students should also be familiar with the idea of area under a density curve and how
the area represents the likelihood of outcomes.
• The idea of sampling. This includes random samples and how they are representative
of the population. Students should be comfortable distinguishing between a sample
statistic and a population parameter. Students should have begun considering or be
able to consider how sample statistics vary from sample to sample but follow a
predictable pattern.
Table 2. What students should understand about sampling distributions
• A sampling distribution of sample means (based on quantitative data) is a distribution
of all possible sample means (statistics) for a given sample size randomly sampled
from a population with mean µ and standard deviation σ. It is a probability distribution
for the sample mean.
• The sampling distribution for means has the same mean as the population.
• As the sample size (n) gets larger, the variability of the sample means gets smaller (a
statement, a visual recognition, and predicting what will happen or how the next
picture will differ).
• Standard error of the mean is a measure of variability of sample statistic values.
• The building block of a sampling distribution is a sample statistic.
• Some values of statistics are more or less likely than others to be drawn from a
particular population.
• The normal approximation applies in some situations but not others.
• If the normal approximation applies, then the empirical rule can be applied to make
statements about how often the sample statistic will fall within, say, 2 standard
deviations of the mean.
• Different sample sizes lead to different probabilities for the same statistic value (know
how sample size affects the probability of different outcomes for a statistic).
• Sampling distributions tend to have the shape of a normal distribution rather than the
shape of the population distribution, even for small samples.
• As sample sizes get very large, all sampling distributions for means look alike (i.e.,
have the same shape) regardless of the population from which they are drawn.
• Averages are more normal and less variable than individual observations.
• Be able to distinguish between a distribution of observations in one sample and a
distribution of x̄ statistics (sample means) from many samples (sample size n greater
than 1) that have been randomly selected.
Table 3. What students should be able to do with their knowledge of sampling distributions of
the sample mean
• Describe what a sampling distribution would look like for different populations and
sample sizes (based on shape, center, and spread, and where most of the values would
be found).
• Interpret and apply areas under the (theoretical sampling distribution) curve as
probability statements about sample means.
• Describe which values of the sample mean are likely, and which are less likely. This
may include the ability to apply the empirical rule to the distribution of sample means.
• Describe the size of the standard error of the mean and how or when it changes.
• Describe the likelihood of different values of the sample mean. In particular, make
statements about how far a sample statistic is likely to vary from the population mean.
For example, explain how often the sample mean should fall within two standard
deviations of the population mean, and whether a random set of outcomes is unusual
based on given population characteristics.
• Describe the mean of the sample means for different-shaped populations.
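The empirical-rule statements in Tables 2 and 3 can be illustrated with a short simulation (the population parameters here are hypothetical, not taken from the study materials): roughly 95% of sample means should fall within two standard errors of the population mean.

```python
import math
import random
import statistics

rng = random.Random(7)

# Hypothetical population: mu = 50, sigma = 10 (illustrative values).
mu, sigma, n = 50, 10, 25
se = sigma / math.sqrt(n)  # standard error of the mean = 2.0

means = [statistics.mean(rng.gauss(mu, sigma) for _ in range(n))
         for _ in range(1000)]

# By the empirical rule, roughly 95% of sample means should land
# within two standard errors of the population mean.
within_2se = sum(abs(m - mu) <= 2 * se for m in means) / len(means)
print(within_2se)
```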
Table 4. Some common student misconceptions
• Believe the sampling distribution should look like the population (for sample size n > 1).
• Think the sampling distribution should look more like the population as the sample size
increases (generalizing expectations for a single sample of observed values to a
sampling distribution).
• Predict that sampling distributions for small and large sample sizes have the same
variability.
• Believe sampling distributions for large samples have more variability.
• Do not understand that a sampling distribution is a distribution of sample statistics.
• Confuse one sample (real data) with all possible samples (in distribution) or potential
samples.
• Pay attention to the wrong things, for example, heights of histogram bars.
• Think the mean of a positively skewed distribution will be greater than the mean of the
sampling distribution for samples taken from this population.
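Several of these misconceptions can be confronted directly by simulation, much as the Sampling SIM activities do. As a sketch (using an exponential population chosen purely for illustration), the distribution of sample means from a strongly skewed population is centered at the population mean rather than shifted by the skew, and is far more symmetric than the population itself:

```python
import random
import statistics

rng = random.Random(3)

# A strongly right-skewed population: exponential with mean 1.0
# (chosen for illustration; not a population from the study).
means = [statistics.mean(rng.expovariate(1.0) for _ in range(30))
         for _ in range(2000)]

# The sampling distribution is centered at the population mean (1.0),
# not displaced by the skew, and the means are far more symmetric than
# individual draws: their median sits close to their mean.
print(round(statistics.mean(means), 2))
print(round(statistics.median(means), 2))
```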
Fourth Study: Student Interviews and a Developmental Model
To gather more detailed information about students’ conceptions of related
concepts (e.g., distribution, variability), as well as how they actually develop
reasoning about sampling distributions, several guided interviews were conducted. The
interviews were also designed to capture students’ interaction with the Sampling
SIM program in an individualized setting (Garfield, 2002). The students were
enrolled in a graduate-level introductory statistics course. Interviews, which lasted
from 45 to 60 minutes, asked students to respond to several open-ended questions
about sampling variability while interacting with the Sampling SIM software. The
interviews were videotaped, transcribed, and viewed many times to determine
students’ initial understanding of how sampling distributions behave and how
feedback from the computer simulation program helped them develop integrated
reasoning about the concepts. While conducting and later reviewing these interviews, the
authors noted some differences between students as they progressed throughout the
interview and activity. These findings initially suggested a developmental model
might describe the stages students appeared to progress through in going from faulty
to correct reasoning. Based on the work of Jones and colleagues (Jones et al., 2000;
Jones et al., 2001; Mooney, 2002), who had proposed developmental models of
statistical thinking and reasoning in children, a framework was developed that
describes stages of development in students’ statistical reasoning about sampling
distributions. An initial conception of the framework is as follows (Garfield,
delMas, & Chance, 1999):
Level 1—Idiosyncratic Reasoning. The student knows words and symbols related to
sampling distributions, uses them without fully understanding them, often
incorrectly, and may use them simultaneously with unrelated information.
Level 2—Verbal Reasoning The student has a verbal understanding of sampling
distributions and the implications of the Central Limit Theorem, but cannot apply
this to the actual behavior of sample means in repeated samples. For example, the
student can select a correct definition, but does not understand how key concepts
such as variability and shape are integrated.
Level 3—Transitional Reasoning The student is able to correctly identify one or two
characteristics of the sampling process without fully integrating these
characteristics. These “characteristics” refer to three aspects of the Central Limit
Theorem: understanding that the mean of the sampling distribution is equal to the
population mean, that the shape of the sampling distribution becomes more normal
as the sample size increases, and that the variability in the sample means decreases
as the sample size increases. A student who understands only one or two
characteristics might state only that large samples lead to more normal-looking
sampling distributions, or that larger samples lead to narrower sampling
distributions.
Level 4—Procedural Reasoning The student is able to correctly identify the three
characteristics of the sampling process but does not fully integrate them or
understand the predictable long-term process. For example, the student can correctly
predict which sampling distribution corresponds to the given parameters, but cannot
explain the process, and does not have full confidence when predicting a distribution
of sample means from a given population for a given sample size.
Level 5—Integrated Process Reasoning The student has a complete understanding
of the process of sampling and sampling distributions, in which rules and stochastic
behavior are coordinated. For example, students can explain the process in their own
words, describing why the distribution of sample means becomes more normal and
has less variability as the sample size increases. They also make predictions
correctly and confidently.
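The three characteristics of the sampling process named in Levels 3 through 5 can be illustrated with a short simulation. The Python sketch below is only illustrative — it is not the Sampling SIM program, and the exponential population and the sample sizes are arbitrary choices:

```python
import random
import statistics

random.seed(1)

# Arbitrary skewed population: exponential with mean 2.0 (a stand-in,
# not any population used in the studies described here).
POP_MEAN = 2.0

def sample_means(n, num_samples=5000):
    """Means of num_samples random samples of size n from the population."""
    return [statistics.fmean(random.expovariate(1 / POP_MEAN) for _ in range(n))
            for _ in range(num_samples)]

means_n4 = sample_means(4)
means_n64 = sample_means(64)

# Characteristic 1: both empirical sampling distributions center on POP_MEAN.
# Characteristic 3: the spread of the sample means shrinks as n grows
# (in theory by sigma / sqrt(n)); characteristic 2 (increasingly normal
# shape) would show up in a histogram of means_n64 versus means_n4.
print(round(statistics.fmean(means_n64), 1))                     # close to 2.0
print(statistics.stdev(means_n4) > statistics.stdev(means_n64))  # True
```

A student at Level 5 could predict both printed results and explain why averaging damps the population's skew and variability.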
Fifth Study: Defining Dimensions of Reasoning
Having described these levels of student reasoning, it was important to validate
the levels through additional interviews across the three environments. A nine-item
diagnostic assessment was developed to identify students who were potentially at
different levels of statistical reasoning (see the Appendix for items 5–9). The test
contained graph- and text-based questions, and asked students to rank their level of
confidence in their answers. The first item tested students’ understanding of the
relationship between sample size and the variability of a sample estimate. The
second and third items required students to apply their knowledge of the standard
error of the mean. Students were expected to have some understanding of the
empirical rule as well. The fourth and fifth problems were graph-based items that
assessed students’ ability to make correct predictions about the behavior of sampling
distributions, as well as their ability to identify reasonable descriptions and
comparisons of distributions. The sixth through eighth items required students to
304
BETH CHANCE, ROBERT DELMAS, AND JOAN GARFIELD
apply their understanding of the Central Limit Theorem. The final item assessed
students’ definitional knowledge of the Central Limit Theorem.
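The standard-error and empirical-rule reasoning required by the second and third items can be sketched numerically. The population values below are hypothetical, not the ones used on the diagnostic test:

```python
import math

# Hypothetical population: mean 100, standard deviation 15, samples of
# size 36 (illustrative numbers only, not the actual test items).
mu, sigma, n = 100, 15, 36

# Standard error of the mean for samples of size n.
sem = sigma / math.sqrt(n)      # 15 / 6 = 2.5

# Empirical rule applied to the sampling distribution of the mean:
# about 95% of sample means fall within 2 standard errors of mu.
low, high = mu - 2 * sem, mu + 2 * sem
print(sem, (low, high))         # 2.5 (95.0, 105.0)
```

Answering such items requires combining both steps: first rescaling the population standard deviation by the sample size, then applying the empirical rule to the rescaled spread.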
The assessment was administered to 105 undergraduates at the University of
Minnesota currently enrolled in an introductory statistics course that utilized the
Sampling SIM software. At the start of the second half of the course, these students
used Sampling SIM to complete an activity on sampling distributions, and then took
the diagnostic test at the start of the following class session. Nine statistics majors at
Cal Poly who were enrolled in a senior-level capstone course but had never
interacted with Sampling SIM also completed the diagnostic test. Altogether, the
114 students showed substantial variation in their responses to the questions. With
respect to the two graph-based problems (items 4 and 5; see Appendix), only 10%
made correct choices for both graphs in both problems, and another 22% made
correct choices for both graphs in only one of the problems. Many students (47%)
made choices that were not correct, but indicated an understanding that sampling
variability decreases as sample size increases. Of the remaining students, 19% made
choices for at least one problem that suggested a belief that sampling variability
increases with increases in sample size, and two of the students chose the same
graph for both sample sizes.
Not all of the student responses to the questions about the distribution shape and
relative variability in problems 4 and 5 were consistent with their graph choices.
Concerning shape, only 49% of the students made choices that were completely
consistent. There were two questions about shape for each problem (questions c and
g) with a total of four questions between the two problems. The average percentage
of consistent shape choices per student varied from 0% to 100% with an average of
80% (SD = 24.8). Regarding variability comparisons, an even smaller percentage of
students (33%) made completely consistent choices. Students were asked to make
three comparisons of variability for each problem (questions d, h, and i) for a total of
six comparisons between the two problems. The percentage of consistent variability
comparisons varied from 0% to 100% with an average of 72% (SD = 29.6).
On average, students correctly answered 61% of the remaining non-graph items
of the diagnostic test (SD = 16.2). Most of the students (88%) answered the first
problem correctly regarding the relationship between sample size and estimation
accuracy. The students did not fare as well with the second and third problems that
involved an understanding of the standard error of the mean and application of the
empirical rule. Only 4% answered both questions correctly, and another 7%
answered one of the two items correctly. On the three items that required
application of the Central Limit Theorem (items 6, 7, and 8 in the Appendix), most
of the students correctly answered either all three items (22%) or two of the three
(39%). They also demonstrated a definitional understanding of the Central Limit
Theorem in that 55% correctly answered all four questions in problem 9, while
another 26% answered three of the four questions correctly.
Of the 114 students who took the diagnostic test, 37 signed a consent form
indicating they would be willing to participate in an interview. All students who
gave consent received an invitation to participate in an interview, and nine accepted.
Statistics on the nine students’ diagnostic test performance are presented in Table 5.
These nine students represent a fair amount of variation in performance across the
items on the diagnostic test.
Table 5. Diagnostic test performance of undergraduates who participated in an interview

Student   Item 1     Items 2 & 3  Items 6–8  Item 9  All Items  Item 4  Item 5  Avg. Conf.  Shape  Variability
Kelly     Correct    100%         100%       100%    100%       G       C       80.0        100%   100%
Jack      Correct    0%           67%        75%     60%        C       C       76.3        100%   83%
Mitzy     Correct    0%           33%        100%    60%        G       C       70.0        75%    100%
David     Correct    0%           0%         50%     30%        G       C       66.3        75%    100%
Karen     Correct    0%           67%        75%     60%        L-S     G       53.8        100%   67%
Marie     Correct    100%         33%        75%     70%        L-S     L-S     70.0        100%   67%
Martha    Correct    50%          33%        50%     50%        L-S     *       77.5        50%    50%
Susan     Correct    0%           33%        75%     30%        S-L     L-S     86.3        100%   100%
Elaine    Incorrect  0%           33%        75%     40%        L-S     S-L     27.5        50%    100%

Note. Non-graph items: Item 1 (sample size), Items 2 & 3 (SEM application), Items 6–8 (Central Limit Theorem application), Item 9 (Central Limit Theorem definition), All Items (overall percentage correct). Graph-based items: Items 4 and 5 (choice pattern), average confidence, and agreement with graphs (shape comparisons and variability comparisons).

Legend.
C    Correct choice of graph for both sample sizes.
G    Good or reasonable choice of graphs; graph for smaller sample size is like population shape with less variance than the population, graph for larger sample size is bell-shaped with less variance than the small sample size graph.
L-S  Neither C nor G, but graph for larger sample size has less variance than graph for smaller sample size.
S-L  Graph for larger sample size has more variance than graph for smaller sample size.
*    The student did not answer this item.
With the hope of adding more variability to the interview pool, a group of
master’s-level students enrolled in a graduate-level introductory statistics course at
the University of Minnesota were also invited to participate in an interview. Items
from one of their course exams that were similar to items on the diagnostic test were
identified. Students’ performance on these items, along with their course grades,
was used to select four students who would potentially represent a variety of levels
of statistical reasoning. All four of the graduate students participated in an interview.
An initial interview protocol (e.g., Clement, 2000) was developed, and the
problems and script were piloted with three of the undergraduate students. In the
initial protocol, students were asked to solve only a sampling distribution question
(see part 3 of the final version of the interview protocol). After reviewing the pilot
interviews, it became clear that responses to the individual question did not provide
enough information to clearly understand the reasons for students’ responses. More
information was needed on students’ understanding of basic statistical definitions,
concepts, and processes to better understand their statements and choices.
Consequently the number of tasks was expanded from one to four, and the interview
script and technique were revised to better probe students’ reasoning. Interviews
took 30–45 minutes, after which students were debriefed.
Identical interview scripts were used at each location. The final interview
protocol consisted of four parts:
Part 1—Students were asked to describe the population distribution presented in
Figure 2. They were then given empty axes and asked to draw a representation of
two samples from the population, one for a sample of size n = 4 and one for a
sample of size n = 16.
Part 2—Students were given 5 graphs representing possible empirical sampling
distributions for 500 samples (see Figure 3). They were asked to judge which
graph(s) best corresponded to empirical sampling distributions for samples of size 4
and for samples of size 16.
Part 3—Participants were asked to make true/false judgments for a series of
statements about samples and sampling distributions, as shown in Figure 4.
Part 4—They were shown the same population used in Parts 1 and 2 and asked to
select which graph in Figure 5 best represented a potential sample of size 50 from
that population.
Figure 2. Population distribution used in the student interviews.
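The behavior probed in Parts 1 and 2 — empirical sampling distributions of 500 samples of size 4 versus size 16 — can be reproduced in a few lines. The sketch below substitutes a normal population with mean 60 and standard deviation 7.5 (the summary values the population display showed) for the multimodal population of Figure 2:

```python
import random
import statistics

random.seed(2)

# Stand-in for the Figure 2 population: mean 60, SD 7.5. A normal shape is
# assumed here, although the actual interview population was multimodal.
def sample_mean(n):
    return statistics.fmean(random.gauss(60, 7.5) for _ in range(n))

dist_n4 = [sample_mean(4) for _ in range(500)]    # 500 samples of size 4
dist_n16 = [sample_mean(16) for _ in range(500)]  # 500 samples of size 16

# Both empirical sampling distributions center near 60, but the n = 16
# distribution is about half as wide (7.5/sqrt(16) vs. 7.5/sqrt(4)).
print(round(statistics.stdev(dist_n4), 2), round(statistics.stdev(dist_n16), 2))
```

Choosing correctly among the candidate graphs in Part 2 amounts to anticipating exactly this halving of spread with no change in center.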
Figure 3. Population distribution and potential empirical sampling distributions.
1. As the sample size increases, the samples look more like the normal distribution, each sample will have the same mean as the population, and each sample will have a smaller standard deviation than the population.  TRUE  FALSE
2. As the sample size increases, the sampling distribution of means looks more like the population, has the same mean as the population, and has a standard deviation that is similar to the population.  TRUE  FALSE
3. As the sample size increases, the sampling distribution of means looks more like the normal distribution, has a mean that is the same as the population, and a standard deviation that is smaller than the population standard deviation.  TRUE  FALSE

Figure 4. Item used in part 3 of the student interviews. (Correct answers: false, false, true)
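The distinction this item tests — individual samples versus the distribution of sample means — can be checked directly by simulation. The population below is an arbitrary skewed stand-in, not the one used in the interviews:

```python
import random
import statistics

random.seed(3)

# Arbitrary skewed stand-in population: exponential with mean 5.
pop = [random.expovariate(1 / 5) for _ in range(100_000)]
pop_mean = statistics.fmean(pop)
pop_sd = statistics.pstdev(pop)

n = 100
samples = [[random.choice(pop) for _ in range(n)] for _ in range(1000)]
sample_means = [statistics.fmean(s) for s in samples]

# Statement 1 is FALSE: individual samples echo the skewed population, and
# their means scatter around pop_mean rather than all equaling it.
means_all_equal = all(abs(m - pop_mean) < 1e-9 for m in sample_means)

# Statement 3 is TRUE: the distribution of sample means centers on the
# population mean and has a smaller standard deviation than the population.
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(means_all_equal)       # False
print(sd_of_means < pop_sd)  # True
```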
Figure 5. Population distribution and potential empirical samples.
At the end of each part of the interview, students were asked to explain why they
had made their particular statements or choices and to indicate their confidence in
their choices.
Initially the videotapes were reviewed for evidence that students were at the
different levels of reasoning described earlier, as suggested by their written test
results. However, it was difficult to place students into a particular level of
reasoning when observing the videos. The data from the posttest and the clinical
interviews did not support the idea that a student is at one particular level or stage of
statistical reasoning as initially hypothesized. Instead, these interviews suggested
that students’ reasoning is more complex and develops along several interrelated
dimensions. Building on other theoretical perspectives (e.g., Perkins, Crismond,
Simmons, & Unger, 1995; Case, 1985; Biggs & Collis, 1982), the following
“components” or “dimensions” of statistical reasoning behavior are proposed:
1. Fluency—how well the student understands and uses appropriate
terminology, concepts, and procedures
2. Rules—the degree to which a student identifies and uses a formal rule to
make predictions and explanations
3. Consistency—the presence or absence of contradictory statements
4. Integration—the extent to which ideas, concepts, and procedures are
connected
5. Equilibrium—the degree to which a student is aware of any inconsistencies
or contradictions in his or her statements and predictions
6. Confidence—the degree of certainty in choices or statements
The videotapes were reexamined to identify students who exhibited the extremes
of each dimension. Some examples follow.
Example 1—From part 1 of the interview, students were asked to describe the
population displayed. Two students, Jack and Allie, gave noticeably different
responses.
Jack: Um, some of it’s, you know, up, some of it’s down. There’s differences, so it’s not like, you know, perfect. Skewed in a way, I guess, you could say … I can tell you like the median and stuff.

Allie: Okay, it’s a multimode, uh graph, distribution, and it’s pretty symmetric in one sense. I see that the average is 60 and the median is pretty close, so that’s it’s … (inaudible) … pretty much normal, like a symmetric distribution. Standard deviation is 7.5, which is actually pretty big if you look at the scores here, which means there’s lots of variation in the population.
Allie is able to use relevant statistical terms much more freely than Jack. She
immediately jumps to characteristics of shape, center, and spread on her own,
providing an evaluation of the amount of variability in the population and focusing
on the multimodal nature as a distinctive feature of the distribution.
Example 2—In part 2 of the interview, Karen and Allie correctly chose graph B for
n = 4 and graph A for n = 16 (Figure 3). They were then asked if they could explain
why the sampling distributions behaved this way.
Karen:
Because I just remember learning in class that it goes to a … when you
draw, it goes to a normal distribution, which is the bell-shaped curve. So, I
just look at graphs that are tending toward that … That’s just how … I
don’t know like the why, the definition. That’s just what I remember
learning.
Allie:
If you keep plotting averages; if you keep taking averages, the outliers are
going to actually be less, um, have less big effect on your data. So you’re
actually always dropping out those outliers. So it’s getting more and
more … they call it also, I think, regression toward the mean. I don’t know
if that term actually is used in this kind of situation, but since you’re
already … you’re always taking averages, the outliers get less efficient. No,
no, that’s not a word I’m looking for. Um, will have less effect, I’ll just use
that word, so it will get more and more narrower.
While both students made correct predictions, it would be difficult to classify
them at different levels of understanding. Karen appears to have partial mastery of
the formal terminology related to sampling distributions (normal distribution, the
bell-shaped curve), but is not able to explain in her own terms the process behind the
sampling distribution of the mean. Allie has more trouble finding the standard terms
and incorrectly applies the phrase “regression to the mean,” but seems to be viewing
the sampling distribution as a long-term process and to have more understanding of
the influences of the sample size on that process. Thus, students with different
mastery levels of the formal terminology can also show different abilities to
integrate the concepts and understand their statements.
Example 3—In part 1 of the interview, students were asked to draw a second
empirical sampling distribution when the sample size was changed from n = 4 to n =
16.
Betty:
Still keep the mean at 60 like I did with n = 4, but it’s going to have a …
oh, I’m not an artist I’m a nurse. It’s going to have a higher bell shape and a
narrower distribution. The standard deviation will be bigger for n = 4 than
16. So the standard deviation is narrower, it has a normal distribution
shape, a normal density curve, and the mean isn’t going to move anywhere.
The center of the density curve is always going to be in the mean of the
population.
Betty appears to be able to focus on both the shape and the spread of the
sampling distribution, using both dimensions in making her choices. When asked
about a sample of size 50 (part 4), she again appeals to the idea that “If it’s
randomly selected it should hopefully be going more toward a normal distribution,”
and considers graph D as too normal for a sample size of 50 and so chooses graph E
(both incorrect choices; see Figure 3). She did not appear to consider variation in
selecting the graphs, despite her earlier comments on variability; and she was not
able to clearly differentiate between the sample and the sampling distribution, thus
exemplifying the student who does not have complete integration of different
components of the concept.
Example 4—From part 2 of the interview (see Figure 3), Martha attempted to
correct an earlier response.
Martha: I’m going to go for C for n = 4 and then 16 for … n = 16 for A. (laughs)
Yeah. And partially because … with n = 4, I’m thinking you’re going to
have a larger range … yeah, a larger range for n = 4 than you would for
n = 16. Because before I was guessing and I thought that the standard
deviation for a larger sample would be closer to the original than the
standard deviation for n = 4.
Further discussion with this student reveals that she has not quite settled on
whether the standard deviation of the sampling distribution will decrease as sample
size n increases, or will approach the population standard deviation. She recognizes
her inconsistency and continues to try to reconcile these two “rules” in subsequent
parts of the interview.
DISCUSSION
Sampling distributions is a difficult topic for students to learn. A complete
understanding of sampling distributions requires students to integrate and apply
several concepts from different parts of a statistics course and to be able to reason
about the hypothetical behavior of many samples—a distinct, intangible thought
process for most students. The Central Limit Theorem provides a theoretical model
of the behavior of sampling distributions, but students often have difficulty mapping
this model to applied contexts. As a result, students fail to develop a deep
understanding of the concept of sampling distributions and therefore often develop
only a mechanical knowledge of statistical inference. Students may learn how to
compute confidence intervals and carry out tests of significance, but they are not
able to understand and explain related concepts, such as interpreting a p-value.
Most instructors have welcomed the introduction of simulation software and
web applets that allow students to visualize the abstract process of drawing repeated
random samples from different populations to construct sampling distributions.
However, our series of studies revealed that several ways of using such software
were not sufficient to effect meaningful change in students’ misconceptions of
sampling distributions. Despite the ability of a software program to offer interactive,
dynamic visualizations, students tend to look for rules and patterns and rarely
understand the underlying relationships that cause the patterns they see. For
example, students noticed that the sampling distribution became narrower and more
normal as the sample size increased, but did not necessarily understand why this was
happening. Therefore, when asked to make predictions about plausible distributions
of samples for a given sample size, students would resort to rules, often
misremembered or applied inconsistently, rather than think through the process that
might have generated these distributions. As a result, we often noticed students’
confusion when asked to distinguish between the distribution of one sample of data
and the distribution of several sample means.
By experimenting with different ways of having students interact with a
specially designed simulation program, we have explored ways to more effectively
engage students in thinking about the processes and to construct their own
understanding of the basic implications of the Central Limit Theorem. Our research
has identified several misconceptions students have about sampling and sampling
distributions, and has documented the effects of learning activities that are designed
to directly confront these misconceptions. For example, having students make
predictions about distributions of sample means drawn from different populations
under different conditions (such as sample size), and then asking them to use the
technology to determine the accuracy of their predictions, appears to improve the
impact of the technology on their reasoning. By forcing students to confront the
limitations of their knowledge, we have found that students are more apt to correct
their misconceptions and to construct more lasting connections with their existing
knowledge framework. These learning gains appear to be significantly higher than
those from using the technology solely for instructor demonstrations or from asking
students to record and generalize specific observations made with the software.
Part of the problem in developing a complete understanding of sampling
distributions appears to be due to students’ less than complete understanding of
related concepts, such as distribution and standard deviation. We have found our
own research progressing backward, studying the instruction of topics earlier in the
course and the subsequent effects on students’ ability to develop an understanding of
sampling distributions. For example, initially we explored student understanding of
the effect of sample size on the shape and variability of distributions of sample
means. We learned, however, that many students did not fully understand the
meanings of distribution and variability. Thus, we were not able to help them
integrate and build on these ideas in the context of sampling distributions until they
better understood the earlier terminology and concepts. We are now studying ways
to help students better understand the basic ideas of distribution, variability, and
models to see if this will facilitate student learning about sampling distributions.
In the process of studying students’ reasoning about sampling distributions, we
identified several possible dimensions of student reasoning. These dimensions
provide an initial vocabulary for describing student behavior and for comparing
students. Accurate placement of students along the different dimensions will
facilitate prescription of specific interventions or activities to help students more
fully develop their reasoning. Interviews with students already suggest some
interesting relationships. For example, students with the least amount of fluency
(inaccurate definitions, misuse of terms) had the most difficulty reasoning about
sampling distributions. Some students were very consistent in their reasoning, while
others made contradictory and inconsistent statements. Among those who were
inconsistent, some were aware of the inconsistencies in their remarks and others
were not. It may be that students in a state of disequilibrium are more motivated to
learn about the sampling distribution process and more receptive to instruction.
There is also evidence that students can develop and follow rules to correctly predict
the behavior of sample means and still be unable to describe the process that
produces a sampling distribution. This suggests that they have not fully integrated
information about sampling distributions, despite their ability to correctly predict
behavior.
A current goal of our research program is to explore the following set of
questions related to these dimensions of student reasoning:
• How are the dimensions related to each other?
• How does instruction affect each dimension? If instruction affects one dimension, are other dimensions also affected (in positive or negative ways)?
• How universal are the dimensions? Do they play a role in other types of statistical reasoning?
• Which dimensions are more important to the development of statistical reasoning?
We believe these dimensions extend beyond the topic of sampling distributions
and can provide a window into how students’ more general statistical reasoning
develops.
To improve our ability to place students along each dimension, to explore the
relationships of the dimensions, and to document how these dimensions are affected
by particular learning activities, there is a clear need for new assessment tools. The
tools we developed in the course of studying students’ conceptions of sampling
distributions are quite different from the types of assessment typically used to
evaluate students’ learning of this topic. Figures 6 and 7 display two items we have
used to assess students’ ability to apply their understanding of sampling distributions
to problems with a context. While these multiple-choice items are quick to score,
they require students to apply their knowledge and allow deeper diagnostic analysis
of student responses than many traditional items. Additional assessment tools are
needed to better reveal the complexity of students’ reasoning about sampling
distributions.
Scores on a particular college entrance exam
are NOT normally distributed. A distribution
of scores for this college entrance exam is
presented in the figure below. The
distribution of test scores is very skewed
toward lower values with a mean of 20 and a
standard deviation of 3.5.
A research team plans to take a simple random sample of 50 students from different high schools
across the United States. The sampling distribution of average test scores for samples of size 50
will have a shape that is: (CIRCLE ONE)
a. very skewed toward lower values.
b. skewed toward lower values, but not as much as the population.
c. shaped very much like a normal distribution.
d. It’s impossible to predict the shape of the sampling distribution.
Explain your choice in detail: ____________________________________
Figure 6. College entrance exam item. (Correct answer: C)
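Why answer C is correct can be demonstrated by simulation. The sketch below approximates the score distribution with a gamma distribution matched to the stated mean of 20 and standard deviation of 3.5; the actual distribution is shown only graphically in the item and is more strongly skewed:

```python
import random
import statistics

random.seed(4)

# Gamma stand-in for the skewed score distribution: mean 20, SD 3.5.
# shape * scale = 20 and shape * scale**2 = 3.5**2 pin down the parameters.
scale = 3.5 ** 2 / 20   # 0.6125
shape = 20 / scale      # about 32.7

def mean_of_sample(n=50):
    return statistics.fmean(random.gammavariate(shape, scale) for _ in range(n))

sampling_dist = [mean_of_sample() for _ in range(2000)]

# With n = 50 the Central Limit Theorem has taken hold: sample means cluster
# symmetrically around 20 with SD near 3.5 / sqrt(50), i.e. about 0.5 —
# a shape very much like a normal distribution (answer C).
print(round(statistics.fmean(sampling_dist), 1))   # ~20
print(round(statistics.stdev(sampling_dist), 2))   # ~0.5
```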
American males must register at a local post office when they turn 18. In addition to other
information, the height of each male is obtained. The national average height for 18-year-old males is 69 inches (5 ft. 9 in.). Every day for one year, about 5 men registered at a
small post office and about 50 men registered at a large post office. At the end of each day,
a clerk at each post office computed and recorded the average height of the men who
registered there that day.
Which of the following predictions would you make regarding the number of days on
which the average height for the day was more than 71 inches (5 ft. 11 in.)?
a. The number of days on which average heights were over 71 inches would be greater
for the small post office than for the large post office.
b. The number of days on which average heights were over 71 inches would be greater
for the large post office than for the small post office.
c. There is no basis for predicting which post office would have the greater number of
days.
Explain your choice and feel free to include sketches in your explanation.
Figure 7. Post office item (based on an item from Well et al., 1990). (Correct answer: A,
since there will be more sampling variability with a smaller sample size.)
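The post office item can likewise be settled by simulation. The item states no standard deviation for heights, so the value of 3 inches below is an assumption, as is the normal shape:

```python
import random
import statistics

random.seed(5)

DAYS = 365
ASSUMED_SD = 3.0  # the item gives no SD; 3 inches is an assumed, plausible value

def days_over_71(men_per_day):
    """Count the days whose average registrant height exceeds 71 inches."""
    count = 0
    for _ in range(DAYS):
        avg = statistics.fmean(random.gauss(69, ASSUMED_SD)
                               for _ in range(men_per_day))
        if avg > 71:
            count += 1
    return count

small_office = days_over_71(5)    # about 5 men register per day
large_office = days_over_71(50)   # about 50 men register per day

# Answer A: the small office's daily averages are far more variable
# (SE = 3/sqrt(5) ≈ 1.34 vs. 3/sqrt(50) ≈ 0.42), so the small office
# clears 71 inches on many more days than the large one.
print(small_office, large_office)
```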
IMPLICATIONS
The few pages given in most textbooks, a definition of the Central Limit
Theorem, and static demonstrations of sampling distributions are not sufficient to
help students develop an integrated understanding of the processes involved, nor to
correct the persistent misconceptions many students bring to or develop during a
first statistics course. Our research suggests that it is vital for teachers to spend
substantial time in their course on concepts related to sampling distributions. This
includes not only the ideas of sampling, distributions of statistics, and applications
of the Central Limit Theorem but also foundational concepts such as distribution
and variability. Focus on these early foundational concepts needs to be integrated
throughout the course so students will be able to apply them and understand their
use in the context of sampling distributions.
While technological tools have the potential to give students a visual and more
concrete understanding of sampling distributions, mere exposure to the software is
unlikely to significantly change students’ deep understanding. The technology needs
to be implemented carefully to ensure that students reach its full learning potential.
The following recommendations stem from our own research and from research on
conceptually enhanced simulation tools:
REASONING ABOUT SAMPLING DISTRIBUTIONS
•
•
•
•
•
•
315
Use the technology to first explore samples and compare how sample
behavior mimics population behavior. Instructional time needs to be spent to
allow students to become more familiar with the idea of sampling, to visually
see how individual samples are not identical to each other or identical to the
population, but that they do follow a general model. Furthermore, students
will then have sufficient knowledge of the software so that it is a more
effective tool instead of another distraction when they move to the more
complicated statistical concept of sampling distribution.
Provide students with the experience of physically drawing samples.
Activities such as having students take samples of colored candies from a
bowl, or using a random-number table to select observations from a
population list, give them a meaningful context to which they can to relate
the computer simulations. Otherwise, the computer provides a different level
of abstraction and students fail to connect the processes.
Allow time for both structured and unstructured explorations with the
technology. Students need to be guided to certain observations, but they also
need freedom to explore the concepts and to construct and test their own
knowledge. Some students will require a higher number of discrediting
experiences before they will correct their knowledge structure. Student
exploration and establishment of disequilibrium can also make them more
receptive to follow-up instruction.
Discuss the students’ observations after completing the activity. Students
need opportunities to describe their observations and understandings. This
can take place either in the classroom or through student writing
assignments. These discussions allow the instructor to highlight the most
crucial details that students need to pay attention to, so that students do not
feel overwhelmed with too much information or disconnected pieces of
knowledge, or focus on unimportant details such as the heights of the
simulated distributions rather than their shape and spread.
Repeatedly assess student understanding of sampling distributions. It is
important to carefully assess what students are learning and understanding
about sampling distributions at multiple times following the class activities.
Students need many opportunities to test their knowledge, and feedback
should be frequent and increasingly rigorous. Students’ knowledge will be
tenuous and needs to be reinforced. Additional assessment tools also need to
be developed and used.
Build on students’ understanding of sampling distributions later in the
course. It is important to build on these concepts throughout subsequent
units on inference. Instructors should actively refer to students’ tactile and
technological experiences with sampling distributions as they introduce ideas
of confidence and significance.
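One way to make that connection concrete is a small simulation that links the sampling distribution of the mean to the coverage idea behind confidence intervals. This is a sketch added for illustration: the population parameters are invented, and the interval uses the rough "2 standard errors" rule with σ treated as known.

```python
import random
import statistics

random.seed(2)

MU, SIGMA, N = 50, 10, 25   # invented population parameters
REPS = 2_000                # number of simulated samples

covered = 0
means = []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.mean(sample)
    means.append(xbar)
    half_width = 2 * SIGMA / N ** 0.5   # ~95% interval, sigma known
    if xbar - half_width <= MU <= xbar + half_width:
        covered += 1

print(f"sd of sample means ~ {statistics.stdev(means):.2f} "
      f"(theory: {SIGMA / N ** 0.5:.2f})")
print(f"coverage: {covered / REPS:.1%}")
```

The empirical standard deviation of the sample means comes out near σ/√n = 2, and roughly 95% of the intervals capture μ, which is the sense in which "confidence" rests on the sampling distribution.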
We hope that continued exploration of students’ reasoning as they interact with
simulation software will lead to better ways to help students develop a complete
understanding of the process of sampling distributions. Once this has been achieved,
students will be better able to develop an understanding of concepts related to
statistical inference, such as statistical confidence and statistical significance, and
should be more successful in their statistics courses.

316
BETH CHANCE, ROBERT DELMAS, AND JOAN GARFIELD
REFERENCES
Behrens, J. T. (1997). Toward a theory and practice of using interactive graphics in statistics education. In
J. B. Garfield & G. Burrill (Eds.), Research on the role of technology in teaching and learning
statistics: Proceedings of the 1996 International Association for Statistical Education (IASE)
roundtable (pp. 111–122). Voorburg, The Netherlands: International Statistical Institute.
Biggs, J. B., & Collis, K. F. (1982). Evaluating the quality of learning: The SOLO taxonomy. New York:
Academic Press.
Case, R. (1985). Intellectual development: From birth to adulthood. New York: Academic Press.
Clement, J. (2000). Analysis of clinical interviews. In A. Kelly & R. Lesh (Eds.), Handbook of research
design in mathematics and science education (pp. 547–589). Mahwah, NJ: Erlbaum.
Cohen, S. (1997). ConStatS: Software for Conceptualizing Statistics. Tufts University: Software
Curricular Studio. Retrieved April 23, 2003, from
http://www.tufts.edu/tccs/services/css/ConStatS.html
Davenport, E. C. (1992). Creating data to explain statistical concepts: Seeing is believing. In Proceedings
of the Section on Statistical Education of the American Statistical Association (pp. 389–394).
delMas, R. (2001). Sampling SIM (version 5). Retrieved April 23, 2003, from http://www.gen.umn.edu/
faculty_staff/delMas/stat_tools
delMas, R., Garfield, J., & Chance, B. (1998). Assessing the effects of a computer microworld on
statistical reasoning. In L. Pereira-Mendoza, L. S. Kea, T. W. Kee, & W. Wong (Eds.), Proceedings
of the Fifth International Conference on Teaching Statistics (pp. 1083–1089), Nanyang Technological
University. Singapore: International Statistical Institute.
delMas, R., Garfield, J., & Chance, B. (1999a). Assessing the effects of a computer microworld on
statistical reasoning. Journal of Statistics Education, 7(3). Retrieved April 23, 2003, from
http://www.amstat.org/publications/jse/secure/v7n3/delmas.cfm
delMas, R., Garfield, J., & Chance, B. (1999b). Exploring the role of computer simulations in developing
understanding of sampling distributions. Paper presented at the Annual Meeting of the American
Educational Research Association, Montreal, Canada.
Doane, D. P., Tracy, R. L., & Mathieson, K. (2001). Visual Statistics, 2.0. New York: McGraw-Hill.
Retrieved April 23, 2003, from
http://www.mhhe.com/business/opsci/doane/show_flash_intro.html
Earley, M. A. (2001). Improving statistics education through simulations: The case of the sampling
distribution. Paper presented at the Annual Meeting of the Mid-Western Educational Research
Association, Chicago, IL.
Garfield, J. (2002). The challenge of developing statistical reasoning. Journal of Statistics Education,
10(3). Retrieved April 23, 2003, from http://www.amstat.org/publications/jse/
Garfield, J., delMas, R., & Chance, B. (1999). Developing statistical reasoning about sampling
distributions. Presented at the First International Research Forum on Statistical Reasoning, Thinking,
and Literacy (SRTL), Kibbutz Be’eri, Israel.
Garfield, J., delMas, R., & Chance, B. (2002). Tools for teaching and assessing statistical inference
[website]. Retrieved April 23, 2003, from:
http://www.gen.umn.edu/faculty_staff/delmas/stat_tools/index.htm
Glencross, M. J. (1988). A practical approach to the Central Limit Theorem. In R. Davidson & J. Swift
(Eds.), Proceedings of the second international conference on teaching statistics (pp. 91–95).
Victoria, B.C.: Organizing Committee for the Second International Conference on Teaching Statistics.
Hodgson, T. R. (1996). The effects of hands-on activities on students’ understanding of selected
statistical concepts. In E. Jakubowski, D. Watkins, & H. Biske (Eds.), Proceedings of the Eighteenth
Annual Meeting of the North American Chapter of the International Group for the Psychology of
Mathematics Education (pp. 241–246). Columbus, OH: ERIC Clearinghouse for Science,
Mathematics, and Environmental Education.
REASONING ABOUT SAMPLING DISTRIBUTIONS
317
Hodgson, T. R., & Burke, M. (2000). On simulation and the teaching of statistics. Teaching Statistics,
22(3).
Holland, J. H., Holyoak, K. J., Nisbett, R. E., & Thagard, P. R. (1987). Induction: Processes of inference,
learning, and discovery. Cambridge, Mass.: MIT Press.
Jennings, D., Amabile, T., & Ross, L. (1982). Informal covariation assessment: Data-based versus
theory-based judgments. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under
uncertainty: Heuristics and biases (pp. 211–230). Cambridge, UK: Cambridge University Press.
Jones, G. A., Langrall, C. W., Mooney, E. S., Wares, A. S., Jones, M. R., Perry, B., et al. (2001). Using
students’ statistical thinking to inform instruction. Journal of Mathematical Behavior, 20, 109–144.
Jones, G. A., Thornton, C. A., Langrall, C. W., Mooney, E., Perry, B., & Putt, I. (2000). A framework for
characterizing students’ statistical thinking. Mathematical Thinking and Learning, 2, 269–308.
Lane, D. M. (2001). HyperStat. Retrieved April 24, 2003, from http://davidmlane.com/hyperstat/
Lang, J., Coyne, G., & Wackerly, D. (1993), ExplorStat—Active Demonstration of Statistical Concepts,
University of Florida. Retrieved April 24, 2003, from http://www.stat.ufl.edu/users/dwack/
Lord, C., Ross, L., & Lepper, M. (1979). Biased assimilation and attitude polarization: The effects of
prior theories on subsequently considered evidence. Journal of Personality and Social Psychology,
37, 2098–2109.
Mills, J. D. (2002). Using computer simulation methods to teach statistics: A review of the literature.
Journal of Statistics Education, 10(1). Retrieved April 24, 2003, from:
http://www.amstat.org/publications/jse/v10n1/mills.html
Mooney, E. S. (2002). A framework for characterizing middle school students’ statistical thinking.
Mathematical Thinking and Learning, 4(1), 23–63.
Moore, D. (2000). The Basic Practice of Statistics. New York: Freeman.
Moore, D., & McCabe, G. (2002). Introduction to the Practice of Statistics (4th ed.). New York:
Freeman.
Newton, H. J., & Harvill, J. L. (1997). StatConcepts: A visual tour of statistical ideas (1st ed.). Pacific
Grove, CA: Brooks/Cole.
Nickerson, R. S. (1995). Can technology help teach for understanding? In D. N. Perkins, J. L. Schwartz,
M. M. West, & M. S. Wiske (Eds.), Software goes to school: Teaching for understanding with new
technologies (pp. 7–22). New York: Oxford University Press.
Perkins, D. N., Crismond, D., Simmons, R., & Unger, C. (1995). Inside understanding. In D. N. Perkins,
J. L. Schwartz, M. M. West, & M. S. Wiske (Eds.), Software goes to school: Teaching for
understanding with new technologies (pp. 70–87). New York: Oxford University Press.
Perkins, D. N., Schwartz, J. L., West, M. M., & Wiske, M. S. (Eds.). (1995). Software goes to school:
Teaching for understanding with new technologies. New York: Oxford University Press.
Posner, G. J., Strike, K. A., Hewson, P. W., & Gertzog, W. A. (1982). Accommodation of a scientific
conception: Toward a theory of conceptual change. Science Education, 66(2), 211–227.
Ross, L., & Anderson, C. (1982). Shortcomings in the attribution process: On the origins and
maintenance of erroneous social assessments. In D. Kahneman, P. Slovic, & A. Tversky (Eds.),
Judgment under uncertainty: Heuristics and biases (pp. 129–152). Cambridge, UK: Cambridge
University Press.
Saldanha, L. A., & Thompson, P. W. (2001). Students’ reasoning about sampling distributions and
statistical inference. In R. Speiser & C. Maher (Eds.), Proceedings of The Twenty-Third Annual
Meeting of the North American Chapter of the International Group for the Psychology of
Mathematics Education (pp. 449–454), Snowbird, Utah. Columbus, Ohio: ERIC Clearinghouse.
Schwarz, C. J., & Sutherland, J. (1997). An on-line workshop using a simple capture-recapture
experiment to illustrate the concepts of a sampling distribution. Journal of Statistics Education, 5(1).
Retrieved April 24, 2003, from http://www.amstat.org/publications/jse/v5n1/schwarz.html
Schwartz, D. L., Goldman, S. R., Vye, N. J., Barron, B. J., & the Cognition Technology Group at
Vanderbilt. (1997). Aligning everyday and mathematical reasoning: The case of sampling
assumptions. In S. Lajoie (Ed.), Reflections on statistics: Agendas for learning, teaching and
assessment in K-12 (pp. 233–273). Hillsdale, NJ: Erlbaum.
Sedlmeier, P. (1999). Improving statistical reasoning: Theoretical models and practical implications.
Mahwah, NJ: Erlbaum.
Siegel, A., & Morgan, C. (1996). Statistics and data analysis: An introduction (2nd ed.). New York:
Wiley.
Simon, J. L. (1994). What some puzzling problems teach about the theory of simulation and the use of
resampling. American Statistician, 48(4), 290–293.
Snir, J., Smith, C., & Grosslight, L. (1995). Conceptually enhanced simulations: A computer tool for
science teaching. In D. N. Perkins, J. L. Schwartz, M. M. West, & M. S. Wiske (Eds.), Software goes
to school: Teaching for understanding with new technologies (pp. 106–129). New York: Oxford
University Press.
Thomason, N., & Cumming, G. (1999). StatPlay. School of Psychological Science, La Trobe University,
Bundoora, Australia. Retrieved April 24, 2003, from
http://www.latrobe.edu.au/psy/cumming/statplay.html
Velleman, P. (2003). ActivStats, Ithaca, NY: Data Description, Inc. Retrieved April 24, 2003, from
http://www.aw.com/catalog/academic/product/1,4096,0201782456,00.html
Well, A. D., Pollatsek, A., & Boyce, S. J. (1990). Understanding the effects of sample size on the
variability of the mean. Organizational Behavior and Human Decision Processes, 47, 289–312.
APPENDIX
Questions 4 through 9 from Sampling Distributions Posttest—Spring 2001
4) The distribution for a population of test scores is displayed below on the left. Each of
the other five graphs labeled A to E represents possible distributions of sample means for
random samples drawn from the population.
Figure 1. Population Distribution.
4a) Which graph represents a distribution of sample means for 500 samples of size 4?
(Circle one.)   A   B   C   D   E
4b) How confident are you that you chose the correct graph? (Circle one of the values
below.)
20% 25% 30% 35% 40% 45% 50% 55% 60% 65% 70% 75% 80% 85% 90% 95% 100%
- Answer each of the following questions regarding the sampling distribution you chose for
question 4a.
4c) What do you expect for the shape of the sampling distribution? (Check only one.)
Shaped more like a NORMAL DISTRIBUTION.
Shaped more like the POPULATION.
- Circle the word between the two vertical lines that comes closest to completing the
following sentence.
4d) I expect the sampling distribution to have | less | the same | more | VARIABILITY
than/as the POPULATION.
4e) Which graph represents a distribution of sample means for 500 samples of size 16?
(Circle one.)   A   B   C   D   E
4f) How confident are you that you chose the correct graph? (Circle one of the values
below.)
20% 25% 30% 35% 40% 45% 50% 55% 60% 65% 70% 75% 80% 85% 90% 95% 100%
- Answer each of the following questions regarding the sampling distribution you chose for
question 4e.
4g) What do you expect for the shape of the sampling distribution? (Check only one.)
Shaped more like a NORMAL DISTRIBUTION.
Shaped more like the POPULATION.
- Circle the word between the two vertical lines that comes closest to completing the
following sentences.
4h) I expect the sampling distribution to have | less | the same | more | VARIABILITY
than/as the POPULATION.
4i) I expect the sampling distribution I chose for question 4e to have | less | the same |
more | VARIABILITY than/as the sampling distribution I chose for question 4a.
5) The distribution for a second population of test scores is displayed below on the left.
Each of the other five graphs labeled A to E represents possible distributions of sample
means for random samples drawn from the population.
Figure 2. Population Distribution.
5a) Which graph represents a distribution of sample means for 500 samples of size 4?
(Circle one.)   A   B   C   D   E
5b) How confident are you that you chose the correct graph? (Circle one of the values
below.)
20% 25% 30% 35% 40% 45% 50% 55% 60% 65% 70% 75% 80% 85% 90% 95% 100%
- Answer each of the following questions regarding the sampling distribution you chose for
question 5a.
5c) What do you expect for the shape of the sampling distribution? (Check only one.)
Shaped more like a NORMAL DISTRIBUTION.
Shaped more like the POPULATION.
- Circle the word between the two vertical lines that comes closest to completing the
following sentence.
5d) I expect the sampling distribution to have | less | the same | more | VARIABILITY
than/as the POPULATION.
5e) Which graph represents a distribution of sample means for 500 samples of size 16?
(Circle one.)   A   B   C   D   E
5f) How confident are you that you chose the correct graph? (Circle one of the values
below.)
20% 25% 30% 35% 40% 45% 50% 55% 60% 65% 70% 75% 80% 85% 90% 95% 100%
- Answer each of the following questions regarding the sampling distribution you chose for
question 5e.
5g) What do you expect for the shape of the sampling distribution? (Check only one.)
Shaped more like a NORMAL DISTRIBUTION.
Shaped more like the POPULATION.
- Circle the word between the two vertical lines that comes closest to completing the
following sentences.
5h) I expect the sampling distribution to have | less | the same | more | VARIABILITY
than/as the POPULATION.
5i) I expect the sampling distribution I chose for question 5e to have | less | the same |
more | VARIABILITY than/as the sampling distribution I chose for question 5a.
6.
The weights of packages of a certain type of cookie follow a normal distribution with
mean of 16.2 oz. and standard deviation of 0.5 oz.
Simple random samples of 16 packages each will be taken from this population. The
sampling distribution of sample average weight (the average, x̄) will have:
a. a standard deviation greater than 0.5
b. a standard deviation equal to 0.5
c. a standard deviation less than 0.5
d. It’s impossible to predict the value of the standard deviation.
7.
The length of a certain species of frog follows a normal distribution. The mean length in
the population of frogs is 7.4 centimeters with a population standard deviation of .66
centimeters.
Simple random samples of 9 frogs each will be taken from this population. The sampling
distribution of sample average lengths (the average, x̄) will have a mean that is:
a. less than 7.4
b. equal to 7.4
c. more than 7.4
d. It’s impossible to predict the value of the mean.
8.
Scores on a particular college entrance exam are NOT normally distributed. The
distribution of test scores is very skewed toward lower values with a mean of 20 and a
standard deviation of 3.5.
A research team plans to take simple random samples of 50 students from different high
schools across the United States. The sampling distribution of average test scores (the
average, x̄) will have a shape that is:
a. very skewed toward lower values.
b. skewed toward lower values, but not as much as the population.
c. shaped very much like a normal distribution.
d. It’s impossible to predict the shape of the sampling distribution.
9.
Consider any possible population of values and all of the samples of a specific size (n)
that can be taken from that population. Below are four statements about the sampling
distribution of sample means. For each statement, indicate whether it is TRUE or
FALSE.
a. If the population mean equals µ, the average of the sample means
   in a sampling distribution will also equal µ.                        TRUE   FALSE
b. As we increase the sample size of each sample, the distribution of
   sample means becomes more like the population.                       TRUE   FALSE
c. As we increase the sample size of each sample, the distribution of
   sample means becomes more like a normal distribution.                TRUE   FALSE
d. If the population standard deviation equals σ, the standard
   deviation of the sample means in a sampling distribution is equal
   to σ/√n.                                                             TRUE   FALSE
Correct Answers:
4. (a) D, (c) normal, (d) less, (e) C, (g) normal, (h) less, (i) less
5. (a) C, (c) normal, (d) less, (e) E, (g) normal, (h) less, (i) less
6. C
7. B
8. C
9. true, false, true, true
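The keyed answers to Questions 6 and 7 follow from the standard properties of the sampling distribution of the mean (mean µ, standard deviation σ/√n), and can be checked with a short simulation. The Python sketch below is added here for illustration and is not part of the original posttest.

```python
import random
import statistics

random.seed(3)

def sample_means(mu, sigma, n, reps=5_000):
    """Means of `reps` simple random samples of size n from N(mu, sigma)."""
    return [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]

# Question 6: cookie weights ~ N(16.2, 0.5), samples of n = 16.
# Theory: sd of sample means = 0.5 / sqrt(16) = 0.125, so less than 0.5 (c).
cookie_means = sample_means(16.2, 0.5, 16)
print(f"sd of sample means ~ {statistics.stdev(cookie_means):.3f}")

# Question 7: frog lengths ~ N(7.4, 0.66), samples of n = 9.
# Theory: mean of sample means = population mean = 7.4 (b).
frog_means = sample_means(7.4, 0.66, 9)
print(f"mean of sample means ~ {statistics.mean(frog_means):.2f}")
```

The same machinery, applied to a skewed population with n = 50, would show the approximately normal shape behind the answer to Question 8 and the true/false pattern of Question 9.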
PART III
INSTRUCTIONAL, CURRICULAR
AND RESEARCH ISSUES
Chapter 14
PRIMARY TEACHERS’ STATISTICAL
REASONING ABOUT DATA
William T. Mickelson and Ruth M. Heaton
University of Nebraska—Lincoln, USA
OVERVIEW
This study offers a descriptive qualitative analysis of one third-grade teacher’s
statistical reasoning about data and distribution in the applied context of classroom-based statistical investigation. During this study, the teacher used the process of
statistical investigation as a means for teaching about topics across the elementary
curriculum, including dinosaurs, animal habitats, and an author study. In this
context, the teacher’s statistical reasoning plays a central role in the planning and
orchestration of the class investigation. The potential for surprise questions,
unanticipated responses, and unintended outcomes is high, requiring the teacher to
“think on her feet” statistically and react immediately to accomplish content
objectives as well as to convey correct statistical principles and reasoning. This
study explores the complexity of teaching and learning statistics, and offers insight
into the role and interplay of statistical knowledge and context.
THE PROBLEM
With the call for more statistics in the elementary curriculum (NCTM, 2000),
there is a need to consider ways to make statistics not only accessible and
understandable to K–6 teachers but also useful to their teaching practice. Recently,
the idea of teachers designing and implementing statistical investigations as a means
to teach topics across the elementary curricula, as well as the statistical skills and
reasoning involved in collecting, organizing, summarizing, and interpreting data, has
been examined (Heaton & Mickelson, 2002; Lehrer & Schauble, 2002). Statistical
investigation within the established elementary curriculum has the potential to lend
meaningful and purposeful contexts for data collection, summarization, and
interpretation of data that are similar to the notion of authentic pedagogy and
assessment advocated by Newmann & Wehlage (1997).

D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 327–352.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.

This context of purposeful
investigation into nonstatistical curriculum topics is often absent in many
predeveloped or “canned” statistical activities, or when statistical content is viewed
only as isolated topics and the end of learning within a mathematics curriculum
(Lehrer & Schauble, 2002). In mathematics education, the goal of teaching
mathematical topics in meaningful ways (NCTM, 2000) clearly places great
intellectual demands on the teacher (Fennema & Nelson, 1997; Heaton, 2000;
Schifter, 1996), and mathematics educators are trying to better understand and
respond to these demands of practice. Analogously, a better understanding is needed
of teachers’ conceptions of statistics (Shaughnessy, 1992), the pedagogical content
knowledge required for teaching (Shulman, 1986), and how statistical knowledge is
used by teachers in teaching (Ball, Lubienski, & Mewborn, 2001) if we want to
understand and support their efforts.
BACKGROUND
The process of statistical investigation (Friel & Bright, 1997; Graham, 1987) is a
central topic in national statistics education guidelines in the United States. The
NCTM Standards (2000) state that students should:
formulate questions that can be answered using data … learn how to collect data,
organize their own or others’ data, and display the data in graphs and charts that will
be useful in answering their own questions. This Standard also includes learning
some methods for analyzing data and some ways of making inferences and
conclusions from data. (p. 48)
Scheaffer, Watkins, and Landwehr (1998), and Friel and Bright (1998), all argue
against teaching statistics in isolation from other areas of the curriculum. They
support extending and integrating statistics with other subjects and argue that such
an approach is an ideal way to improve students’ knowledge of both the content area
and the process of statistical investigation.
Despite the growing number of studies of students’ reasoning about statistical
information, only recently has attention been paid to teachers’ understanding of
statistical ideas. The methodology used to study teachers’ knowledge has focused
primarily on giving teachers isolated statistics problems (Watson, 2001; Watson,
2000) or studying teachers’ understanding of statistics based on their work as
learners of statistical content in a professional development setting (Confrey &
Makar, 2001) and assessing their competency. Implied in this approach is the aim of
helping teachers acquire a competency in statistics comparable to that of proficient learners.
Our study approaches an investigation of teacher knowledge differently. In this
study, we examine a teacher's knowledge of data and distribution as it appears in the
records of a teacher’s practice, complementing previous work on teacher knowledge
in statistics education by situating how and for what purposes a teacher reasons
about statistical ideas while teaching. The empirical findings of this study support
and explicate the report of what elementary teachers need to learn related to the
topic of statistics found in The Mathematical Education of Teachers (CBMS, 2001)
and contribute to meeting a portion of the agenda for research in mathematics
education found in the RAND Report (Ball, 2002). Our work addresses the need to
“distinguish teachers’ knowledge from the knowledge held and used by others who
draw from that same discipline in their work” (Ball, 2002, p. 16), specifically around
the topic of reasoning with data and distribution. That is, the reasoning about data and
distribution done by a teacher applying statistical reasoning in teaching takes a form,
and has complexities, different from those of students learning to reason about data
and distribution.
SUBJECT AND METHOD
This study employs a descriptive qualitative analysis to examine one third-grade
teacher’s statistical reasoning about data and distribution (Creswell, 1998) in the
context of three classroom investigations with students. A profile of the teacher is
followed by descriptions of the context and investigations, and methods of data
collection and analysis.
The Teacher and Context
The third-grade teacher featured in this study, Donna (pseudonym), had been
teaching elementary school for 16 years. She was highly regarded by her principal
and peers, being described as “extremely thoughtful and planful in all that she does,
structured and organized yet flexible” (Interview, 2/2/01). Because of her interest
and attitude toward the notion of statistical investigation as a means for teaching
content across the elementary curriculum, we purposefully selected Donna for this
study. Previously, Donna was a participant in the American Statistical Association’s
(ASA) Quantitative Literacy Project (Scheaffer, 1988), where she experienced and
learned statistics through hands-on activities designed to teach statistical concepts.
Furthermore, she has supported children’s participation in ASA project competitions
with help from members of the local ASA chapter.
We initially interacted with Donna as facilitators of a professional development
workshop on merging statistical investigation with topics of the K–6 curriculum.
During the workshop, Donna routinely demonstrated highly competent statistical
reasoning skills about data and distribution, successfully completing data collection,
graphing, and interpretation activities such as “Is Your Shirt Size Related to Your
Shoe Size” and “Gummy Bears in Space: Factorial Designs and Interactions”
(Scheaffer, Gnanadesikan, Watkins, & Witmer, 1996). Collaboration with Donna
carried over into the subsequent academic year when she agreed to participate in this
study. Between August and December 2000, Donna created and implemented seven
units that merged the process of statistical investigation with topics in the third-grade curriculum. These investigations, detailed in Table 1, vary in degree of
complexity with regard to framing the problem, the analysis, and the teacher’s
familiarity with conducting the investigation or one similar.
During most of these investigations, Donna continued to exemplify correct
statistical reasoning. She routinely guided her class through the identification of
salient variables, collection and organization of relevant data, summarization of data
into correct graphical representations, interpretation of findings in context, and the
correct instruction of statistical concepts like distribution and its importance in
making predictions.
Donna’s Steven Kellogg Author Study, Animal Habitats, and Dinosaur
Investigations were selected for presentation in this study, for several reasons. First,
each had a larger purpose beyond teaching the process of statistical investigation,
with clear learning goals about content. Second, constructing and implementing the
investigations to suit the teacher’s content area learning goals challenged the
teacher’s statistical reasoning ability. Finally, these three activities highlighted
different strengths and limitations in the teacher’s statistical knowledge use in
teaching. For each of these examples, description of the activity is given and the
statistical issues articulated and analyzed. Therefore, what this study represents are
the best efforts of a highly experienced and competent teacher, with previous
training and interest in statistics, to teach in ways that merge statistical investigation
with the study of topics in the elementary curriculum.
Table 1. Statistical investigations of participating teacher

Getting to Know You (Aug. 21–Sept. 1; 2 weeks). For students to get acquainted
with each other at the beginning of the semester. Variables: favorite foods,
books, and pets. Students in class make up the sample. Bar graphs used to
summarize data. Statistical lesson on distribution and prediction.

Animal Habitats (Oct. 2–20; 3 weeks). To study natural habitats of animals and
compare to human-made habitats of zoo animals. Described in text.

Steven Kellogg Author Study (Oct. 9–13; 1 week). To study the writing
characteristics of a children’s author. Described in text.

Election (Nov. 1–7; 1 week). To do a poll and compare school results with
district, city, and national outcomes. Variables: for/against candidates and
local issues on ballot. Sample selected from school at large. Bar graphs and
percentages used to summarize data. Statistical lesson: graphing and prediction.

Math (Nov. 1–7; 1 week). Lessons to read and interpret different types of graphs.

Cold Remedies (Nov. 13–14; 1 week). Health and social studies related topic to
survey families about cold remedies. Variables: family demographics, preferred
cold remedies (homemade versus store-bought drugs), prevention. Sample included
students’ extended families and neighbors. Multiple comparative bar graphs to
summarize and interpret data. Statistical lesson: variability, comparing graphs
by differences in distribution.

Dinosaurs (Nov. 6–30; 3 weeks). To apply research skills to learn about
dinosaurs, create graphs and draw conclusions. Described in text.
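The bar-graph summaries that recur across these investigations amount to tallying a categorical variable. A minimal sketch in Python, using invented favorite-pet responses since the actual class data are not reported in the chapter, might look like:

```python
from collections import Counter

# Invented class responses, for illustration only.
favorite_pets = ["dog", "cat", "dog", "fish", "dog", "cat", "hamster",
                 "dog", "cat", "fish", "dog", "cat", "dog", "bird"]

# Tally the categorical variable and print a simple text bar graph,
# mirroring the tally-then-graph step of the class investigations.
counts = Counter(favorite_pets)
for pet, n in counts.most_common():
    print(f"{pet:8s} {'#' * n}  ({n})")
```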
Merging Statistical Investigation with Curriculum Topics
In each statistical investigation, to the extent possible, Donna situated student
learning in a much broader context so that the investigation was more than data
collection and graphing. In the broader context, she connected the investigation to
numerous other curriculum standards and content learning outcomes that she was
required to cover and teach over the course of a year. During an interview, Donna
commented on her goals and the overall nature of the Dinosaur Investigation. She
stated,
“The thing for me was to correlate the curriculum.” This means she took into account
the entire scope of district learning standards and objectives, looking for ways to
simultaneously address as many as possible. In summary, Donna stated, “So you can
tell that with this we cover just about every part of the curriculum in one way or
another, and all these main objectives that I need to meet. And I did them all basically
thinking about this project and then branching out from there” (Interview 2/15/01).
Data Collection
Friel and Bright (1998), in a study of their own teacher education efforts, noted
that “much may be ascertained about teachers’ understanding of [statistics]
content by watching them teach” (p. 106). We embraced this perspective and
remained observers during the data collection phase, not attempting to intervene or
influence what happened. Any statistics-related question initiated by the teacher
during the data collection phase was promptly answered. Concerns we might have
had, however, were retained until the conclusion of this phase when we debriefed
the teacher. This approach is consistent with Jaworski’s (1994) idea of investigative
teaching.
All lessons involving the process of statistical investigation were videotaped.
The video camera was placed in the subject’s classroom for the entire semester. One
of the researchers tried to be present during each lesson; however, there were times
when this was not possible. In these few instances, the teacher ran a video camera
stationed on a tripod throughout the class period. Data sources collected in the
classroom context include videotapes and transcripts of classroom interactions,
classroom artifacts including student work and teacher produced materials,
researcher field notes, and the teacher’s written plans and reflections. Additional
data include audiotaped interviews and transcripts, and products from the summer
workshop. Data from the classroom context documents the teacher’s current
practice, while the summer workshop data gives evidence of the statistical skills and
abilities the teacher had prior to conducting statistical investigations with her third-grade students.
Data Analysis
Data analysis focuses primarily on records of practice from the classroom
statistical investigations. Initially, classroom artifacts and end products from each of
the statistical investigations were examined to look for evidence of the teacher’s
statistical reasoning about data and distribution. The teacher’s choices for organizing
and representing data are embedded in the artifacts. From these we infer the
teacher’s statistical knowledge and reasoning. We supplement our analysis with the
teacher’s own language as she teaches, found in transcripts of classroom
interactions, and as she reflects on teaching, found in transcripts of interviews.
Additionally, selected transcripts were analyzed for evidence of the teacher’s use of
statistical knowledge. In several instances the teacher's statistical knowledge, as evidenced during the summer workshop and other classroom investigations like Getting to Know You (Table 1), was compared to findings of teacher knowledge within the classroom data. Instances of disparity in findings of teacher knowledge from these two contexts prompted detailed analysis of artifacts as well as associated videotapes and transcripts, and give evidence about the importance of context in teacher knowledge acquisition, its application, and research. Observation notes and
the teacher’s written plans and reflections offered opportunities to triangulate
findings.
RESULTS
For each of the three investigations studied, we first describe the investigation and then articulate, analyze, and make inferences about Donna's statistical reasoning about data and distribution.
The Steven Kellogg Author Study Investigation
Throughout the elementary years, children are commonly introduced to a variety
of authors and illustrators through Author Studies. According to Donna, the purpose
of an Author Study is
To have them begin to see writing styles … I want them to be able to say, this is his
voice, this is what he does … I want them to know about his style … We’ve been
learning about character traits and that’s real hard for them. (Interview, 10/5/00)
Steven Kellogg is one author and illustrator whose works are part of the third-grade curriculum set by the district and the object of this investigation.
After reading five books written or illustrated by Steven Kellogg, Donna posed
two questions to her students. She asked, “What are the things that are most true
about Steven Kellogg?” and, “What are we observing from reading his books?”
PRIMARY TEACHERS' STATISTICAL REASONING
333
These questions launched the statistical investigation and generated the following brainstormed list of ideas from students:
• Likes to draw
• Fantasy/imagination
• Likes animals
• No pets as a child
• Drawings—accuracy and detail
• Puts Pinkerton in many books
• Thought bubbles
• Signs
• Uses his pets in his stories
• Dedicates his books to his family
This list became the basis for further data collection. Based on this list, Donna
reported,
I made graphing forms for each student, typed a list of all Kellogg’s books in the
room, and assigned each book a letter. Students fill in spaces on their form for author
characteristics by using the letter for that book. That way when we compile our
results as a class we won’t record a book twice … The part I see as difficult and yet
exciting for me is to figure out how to help them compile their information. If we had
unlimited time we could hammer more of this out together. (Interview 10/10/00)
The first task was to make the transition from individual student data to a whole
class set of data. Having individual students first collect their own data
accomplished several objectives. Students could make choices about specific books
read, many Kellogg books could be read by the class as a whole, and each student
had responsibility for making judgments about author characteristics. About
compiling the class data, Donna said:
When they were all done reading, I just did this as a class exercise. They’d raise their
hand. I’d say, okay, does anybody have a book that used thought bubbles? And then
as quickly as we could, we put this together. (Interview 2/00)
Figure 1 is a reproduction of the final product. It simultaneously represents both
the organizational structure and summary of this data. This graph became the means
by which Donna was able to converse with students about the differing qualities or
traits observed in Kellogg’s work and how often those qualities or traits were
noticed in his writing or illustrations. In Figure 1, the horizontal axis is a list of traits
similar to those brainstormed by students. The vertical axis is the number of books
that exhibit the trait. The identity of each book is preserved through the letter codes
Donna assigned. The codes are used in the development of the body of the bar
graph. Notice that there is no intended ordering of the letters; rather, the order is a function of which students raised their hands and the sequence in which the teacher called on them.
Figure 1. Graphical artifact of author study investigation.
Statistical Issues and Reasoning during the Author Study Investigation
Inferring Donna’s understanding of data and distribution from what is present
and missing is one mechanism to analyze the graph of Figure 1. The data
summarized in this graph do allow Donna and the class to answer the basic
investigation question, “What are we observing from reading his (Kellogg’s)
books?” Donna used the graph in Figure 1 during class in three ways. First, the class
discussed the list of observed traits to understand the varying techniques and content
Kellogg includes in his books. Second, Donna’s statistical questioning helped
students determine which of the observed traits was more or less prevalent in the
books that were read. Finally, the class used the graph to refer to and describe
individual books. They accomplished this by focusing on a letter and determining
which columns (or traits) contain that letter and which do not. As such, Donna sees
the investigation as being successfully completed. The type of analysis and
discussion the class had with the summarized data in Figure 1 is consistent with the
typical types of questions posed and answered when analyzing graphs in lessons in
their mathematics textbook. From the evidence presented thus far, Donna’s
statistical reasoning includes correct identification of the case (or experimental unit)
of this study, care in preventing double counting since the case is not a person,
facilitation of a comprehensive sample (census) covering Kellogg’s work, and a
graphical summarization of data indicating a conception of distribution as tallies or
counts of a trait.
A closer look at the graph in Figure 1 reveals that the salient features of a bar graph, namely the mutually exclusive and exhaustive conditions that define its categories, are missing. The intent of a bar graph is to display the distribution of data
for a categorical variable (Utts, 1999; Bright & Friel, 1998). Donna uses the 10
observed traits stemming from the brainstorming session as the categories in the
graph. In doing this, each of Kellogg’s books is counted in multiple categories
because the traits are not mutually exclusive or exhaustive. As such, Figure 1 is not
a true bar graph; what this implicitly and instructionally does is give students a
limited and incorrect perspective on bar graphs and what information they convey.
By the nature of the graph, Donna was precluded from being able to teach and
discuss the statistical concept of distribution of a categorical variable, resulting in
questions focused only on identifying traits with the most and least counts. In
addition, the graph does not help answer the investigation question, “What are the
things that are most true about Steven Kellogg?” since the graph does not give
information on the alternatives to the listed traits. For example, Thought Bubbles
may not be the most prevalent (i.e., most true) mechanism to convey a character's thoughts, since the alternatives are not given. Also, Figure 1 does not address the
overarching purpose of an Author Study in Donna’s conception, namely, conveying
the “writing style” of the author.
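The multiple-counting issue can be made concrete with a small sketch (the letter codes and trait lists below are hypothetical, not the class data): when the categories are not mutually exclusive, each case can be tallied in several columns, so the column totals exceed the number of cases and the columns cannot be read as the distribution of a single categorical variable.

```python
from collections import Counter

# Hypothetical book-by-trait observations; letters stand for books, as in
# Donna's coding scheme, and each book may exhibit several traits.
observations = {
    "A": ["Likes animals", "Thought bubbles"],
    "B": ["Likes animals"],
    "C": ["Thought bubbles", "Signs", "Likes animals"],
}

# Tallying traits, as in Figure 1: because the traits are not mutually
# exclusive, books are multiply counted across columns.
trait_counts = Counter(t for traits in observations.values() for t in traits)
print(trait_counts)
print(sum(trait_counts.values()))  # 6 tallies, but only 3 books
```

The total of the bars (6) exceeds the number of books (3), which is why Figure 1 is not a true bar graph of one categorical variable.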
One could infer from this data that Donna’s reasoning about data and
distribution is at a low or naïve level since Figure 1 conveys a conception of data
that is merely counts, the bar graph is not technically correct, and it does not address
the overarching purpose of the investigation. This interpretation, however, is
contrary to Donna’s performance in the contexts of the summer workshop and the
other classroom investigations like Getting to Know You (Table 1). In these
contexts, Donna created technically correct bar graphs and taught the concept of
distribution. The discrepancy between Donna’s classroom performance during the
Author Study and prior performance in other contexts raises the questions, what are
the contextual differences, and how did the context influence her reasoning? The
main difference is that the Author Study investigation does not have a well-defined
or naturally occurring implicit set of variables connected to the study. For example,
if Donna wanted to focus on writing styles, she would have to define style and
operationalize how to measure it. Donna’s pedagogical decision to give ownership
to students through their brainstorming of observed traits resulted in the data for this
investigation being tallies and counts, which was consistent with and addressed the investigation question, “What are we observing from reading Kellogg’s books?” The
subsequent process of data collection, summarization, and discussion being so
similar to her prior experience teaching from a textbook gave Donna no reason to
question the investigation in any deeper sense. Lack of experience with the process
of statistical investigation and the overall success of the investigation in terms of
meeting her content expectations may have contributed to her oversight of not
checking to see if the result addressed the original purpose of the investigation
(NCTM, 2000). At the same time, there was not a well-defined, compelling, content-driven need to derive additional meaning from the graph of Figure 1 that would
prompt Donna to make this connection between purpose and product of the
investigation.
What was lost was an excellent opportunity to illustrate how to define and create
multiple variables, construct multiple graphs on different aspects of the books, and
to synthesize information across multiple graphs to characterize Kellogg as a writer
and illustrator. Despite these difficulties, the Author Study was a dramatic
improvement over most author studies that focus on personal information about the
author without ever attending directly to the author’s writing style, and many of the
nonstatistical learning goals Donna had for students were achieved through this
approach. Donna’s statistical reasoning about data and distribution is at a relatively
low level, particularly in terms of the lack of discussion on distribution; however,
her reasoning is contextually influenced and not consistent with her statistical
reasoning in other similar contexts.
The Animal Habitat Investigation
Animal Habitats is a unit Donna routinely teaches. Typically, students do library
research and presentations on the habitat of an animal of their choice. During the
year of this case, Donna planned to merge her usual unit on Animal Habitats with a
statistical investigation to evaluate human-made animal habitats found at the local
children’s zoo. Donna obtained a list of animals residing at the zoo, and students
each selected one to study. First, students researched their animal’s natural habitat in
the library. This phase was guided by the four components of an animal’s habitat
initially studied in science class, specifically focusing on climate, food and water,
shelter, and space. Other topics researched by students included the animal’s main
form of protection, ways it reproduces, places it lives, and other facts of interest.
Library inquiry was followed by a statistical investigation evaluating the local
zoo. The statistical investigation was based on the following questions: “Is your
animal being treated at the zoo like you think it should be based on your research on
how they live in the wild?” and, “Is the zoo doing a good job providing a proper
habitat for your animal?” To quantitatively address these questions, Donna and the
class used their library research to develop a rating scale. Relying on previous
experience with rating scales from their self-evaluations of writing assignments in
language arts, Donna and the students made a table with numbers 1–5 running
across the top and the four components of habitat (space, climate, shelter, food and
water) listed down the far left side. For each component, students described both the
best and worst possible conditions. The best conditions were likened to what is
found in an animal’s natural habitat, the worst were far from resembling the natural
habitat. As students generated ideas, Donna wrote descriptors under the “1” for
worst and “5” for best. Donna realized that “the best” descriptors represented
students’ understanding of the best conditions in the wild. This prompted reframing
the discussion to be about “the best” that could be obtained in zoo exhibits (see
Table 2).
Table 2. Beginnings of a rating scale (descriptors were written under 1, the worst, and 5, the best; columns 2–4 were left blank)

HABITAT NEEDS    1 (worst)                                     5 (best)
Food and Water   Pellets; polluted water; not enough           Natural food; clean, pure water; plenty
Space            Too many animals; small cage; cement floor    Plenty of room; soil, plants, floor
Shelter          Wrong type; no shelter                        Correct type; appropriate shelter
Climate          Too hot, natural (i.e., open to outdoors);    Same as (i.e., same as their original
                 too cold; too changeable                      natural habitat)
In planning the zoo field trip, Donna decided students should not concern
themselves with applying the rating scale during the zoo visit. Rather, she wanted
them to observe the habitat, take field notes, and carefully consider what they had
observed. Donna gave each student a small notebook with directions for taking
detailed descriptive field notes. The students’ field notes became the primary data
source for the statistical investigation and were converted into the numerical rating
scores.
When the students transferred their observation notes to numerical ratings,
Donna reviewed how they could use the numbers of the rating scale to represent the
quality of their zoo animal’s habitats. Helping students realize that numbers in a
rating scale have an assigned meaning was Donna’s way of teaching number sense
(Lajoie, 1998). One student likened the scale to a report card and framed the task as
assigning the zoo a grade for each observed component of habitat. To further help
students transfer observations to ratings of quality, Donna used faces with different
types of expressions to illustrate different levels of quality. A smiling face was
associated with 5, a frowning face associated with 1, and an expressionless face
associated with 3. Each student read over his or her observation notes and rated each
component of habitat in their particular animal’s zoo exhibit.
In preparation for the statistical analysis, Donna and the class decided on
additional variables to help evaluate how well the zoo was caring for the animals.
These variables were size of the animal (small, medium, or large) and animal’s
habitat in wild natural settings (grassland, rain forest, desert, forest, and inland
waterway). Donna helped students identify variables and delineate attributes to draw
conclusions about whether
“they (the zoo) do better in one habitat than another? Better with large animals or
small animals?” (Interview, 10/5/00).
To facilitate data collection and organization, Donna distributed four 3×5
colored cards of the same color to each student. The purpose of the colored cards
was to represent the different types of habitat in the wild. Students wrote their
animal’s name on each of four cards and placed a symbol in the lower right-hand
corner to represent the size of their animal. Donna constructed a huge table on paper
that crossed the habitat needs of the animals by levels of the rating scale. She hung it
at the front of the room, and each student placed their four colored cards in the table
to indicate a rating within each habitat category. Figure 2 represents the final table
developed by the class. From Figure 2, the class attempted to observe patterns in the
data and draw inferences about the zoo’s performance in caring for the animals
residing there. Toward the end of the unit, Donna invited the zoo director to visit
and discuss student findings and questions.
Statistical Issues and Reasoning during the Animal Habitat Investigation
In the context of the Animal Habitat investigation, unlike in the Author Study,
Donna understands the need to create variables to evaluate the zoo’s performance in
creating animal habitats. In developing the rating scale, Donna transferred
knowledge from a similar situation she experienced during language arts,
demonstrated knowledge about the content and concept of a rating scale, and was
pedagogically and statistically sophisticated in developing this knowledge with
children. Donna used multiple mechanisms, like description, smiling faces, and
analogy, to convey the meaning of the rating scale used to self-evaluate writing by
students to help them develop number sense in relationship to a new scale. She
concerned herself with the potential distribution of scores—as demonstrated by her
revision of the scale when it became clear that the scale, as originally conceived,
would not show much variability in the data. Finally, she was concerned about
objectivity demonstrated by having students translate written field notes into rating
scores instead of immediately applying the rating rubric at the site. Donna exhibits
exemplary reasoning about data, teaching students how to create new variables to
suit investigator purposes and how people attribute meaning to numbers.
A potentially curious juxtaposition occurs where the focus of the investigation
changes from creating variables and collecting data, to organizing, summarizing,
and interpreting the data. Lehrer and Schauble (2002) stress the importance of
organizing and structuring the data in a table format. In such a structured data table,
the rows typically correspond to the individual animals (cases) and columns
correspond to the different variables—like food and water rating, shelter rating,
climate rating, space rating, habitat of origin category, and size of animal category.
Table 3 illustrates how the Animal Habitat data could have been organized
following these principles. Figure 2 represents how Donna structured and
summarized the data for interpretation with the class. Donna did use a table to organize the Animal Habitat data; the organizing structure of her table, however, was the rating scale. With color coding (i.e., shading) for the habitat
of origin and symbols for the size of the animal, the data structure in Figure 2
attempts to capture the totality of data across all variables. Donna made no other
reorganization or graphical summarization of the data during the remainder of the
investigation.
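The cases-by-variables principle can be sketched as follows (only three rows are shown, with ratings in the spirit of Table 3; the code is an illustration of the data structure, not part of the classroom investigation):

```python
from collections import Counter

# A cases-by-variables table: each row is one case (an animal), each key is
# one variable, following the structured-table recommendation.
rows = [
    {"animal": "Gila Monster", "size": "Medium", "origin": "Desert",
     "space": 1, "climate": 3, "shelter": 4, "food_water": 5},
    {"animal": "Otter", "size": "Medium", "origin": "Inland waterway",
     "space": 1, "climate": 1, "shelter": 1, "food_water": 5},
    {"animal": "Baboon", "size": "Large", "origin": "Grasslands",
     "space": 5, "climate": 5, "shelter": 5, "food_water": 5},
]

# With cases as rows, the distribution of any one variable is a single pass
# over its column.
space_dist = Counter(r["space"] for r in rows)

# Subgroup comparisons (e.g., by animal size) are simple filters.
large = [r for r in rows if r["size"] == "Large"]
```

With this structure, the comparisons Donna wanted (by size, by habitat of origin) reduce to filtering rows and re-counting, which is precisely what the rating-scale-structured table of Figure 2 made difficult.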
The decision at the beginning of the statistical analysis to structure the data table
using the numbers of the rating scale as the organizational feature of the table raises
potential questions about Donna’s conception of the rating scale, variables, data, and
the process of structuring, organizing, and graphing data. The temptation, as in the
Author Study, is to quickly infer that Donna has a deficit in statistical understanding
or a naïve, low level of statistical reasoning about data and distribution. This
oversimplifies the context of the Animal Habitat investigation and the complexity of
understanding Donna’s statistical reasoning in the context of teaching. In the
Dinosaur Investigation (see next section), Donna implements a structured data table
consistent with the recommendation of Lehrer and Schauble (2002). During the
summer workshop, Donna successfully completed two data collection and graphing
activities that employed multiple variables. From this evidence, we know she is
capable of correctly structuring a data table and graphing complex data.
For evidence of how Donna is reasoning about data and distribution, we turn to
her comments during an interview:
And the one thing we can do by looking at this (Figure 2) is say, well, how do you
think the zoo did? Well, they thought they did pretty well because they have a whole
lot more cards in the fours and fives. I wanted them to come up with did they do
better in one habitat than another. Better with large animals or small animals? And
that did not come out exactly the way that I wanted. (Interview, 10/21/00)
[Figure 2 is a large table crossing the four habitat needs (Space, Climate, Shelter, Food & Water) with the levels 1–5 of the rating scale; each cell lists the animals whose zoo habitat received that rating, with card shading indicating wild habitat of origin (Forest, Grasslands, Desert, Inland Waterway, Rainforest) and a letter (s, m, l) indicating animal size.]

Figure 2. Animal habitat evaluation data.
Table 3. Structured data table for the investigation of animal habitat at the local children’s zoo

Animal Name         Size     Habitat Origin     Space    Climate   Shelter   Food & Water
                                                Rating   Rating    Rating    Rating
Gila Monster        Medium   Desert               1        3         4         5
Otter               Medium   Inland waterway      1        1         1         5
NG Singing Dog      Medium   Grasslands           2        4         5         4
Blood Python        Medium   Rain forest          2        5         5         5
Dwarf Crocodile     Medium   Rain forest          2        5         3         4
Tamarin             Small    Rain forest          2        2         3         3
Tree Kangaroo       Medium   Rain forest          3        5         3         3
Meerkat             Small    Desert               3        5         3         3
Bald Eagle          Medium   Forest               3        5         2         3
Standing Gecko      Small    Rain forest          3        5         4         4
Amur Leopard        Large    Grasslands           3        5         4         5
Zebra Mice          Small    Desert               4        5         2         2
Speckled Bear       Large    Grasslands           4        2         3         5
Boa                 Medium   Rain forest          4        5         5         5
Bactrian Camel      Large    Desert               4        4         4         5
Baboon              Large    Grasslands           5        5         5         5
Poison Arrow Frog   Small    Rain forest          5        4         5         5
Iguana              Medium   Rain forest          5        4         5         5
Looking closely at Figure 2, if we ignore the shading and the animal size indicator and focus on one habitat quality, like space, the data summarization is that of a bar graph. The numbers of the rating scale indicate membership in an evaluative
category. Furthermore, by crossing the numbers of the rating scale with habitat
components (space, climate, shelter, food and water), as in a contingency table,
Figure 2 simultaneously conveys the distributions of four bar graphs. This is a very
creative means of organizing and summarizing the data so that the main
investigation question can be addressed. Donna’s desire to have students “come up
with did they [zoo] do better in one habitat than another” is met because the table
allows performance comparison across all four components of habitat. At the same
time, this approach is not effective for the wild habitat and animal size variables, which would require a restructuring or regraphing of the data table to make those comparisons.
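The sense in which Figure 2 conveys four bar-graph distributions at once can be sketched as follows (the ratings below are hypothetical, not the class data): crossing habitat component with rating level yields one count-based distribution per component, allowing the across-component comparison Donna's question called for.

```python
from collections import Counter

# Hypothetical ratings per habitat component, one entry per animal.
ratings = {
    "space":   [1, 1, 2, 3, 4, 5],
    "climate": [3, 1, 5, 5, 4, 5],
    "shelter": [4, 1, 5, 3, 4, 5],
}

# Crossing component with rating level, as in a contingency table: each row
# of counts is the distribution one bar graph would display.
cross = {comp: Counter(vals) for comp, vals in ratings.items()}
for comp, dist in cross.items():
    row = [dist.get(level, 0) for level in range(1, 6)]
    print(f"{comp:8s}", row)
```

Reading across a row shows one component's distribution; reading down a rating level compares components, which is how the class judged that the zoo "did pretty well."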
In the Animal Habitat investigation, considerable class time was devoted to
library research, development, and student understanding of the rating scale, data
collection, and application of the rating scale. The timing of the visit by the zoo
director had a major impact on the data organization and analysis. In fact, Donna did
not perceive herself or the students as done with the analysis at the time of the zoo
director visit. Donna states:
I had hoped to bring more closure to it (Animal Habitat investigation) but (1) we ran
out of time due to end of the quarter testing and Red Ribbon anti-drug lesson, (2) it
seemed anticlimactic after our visit from the zoo director, (3) I couldn’t decide how
to revisit the rating scale (i.e., data table) and make it meaningful to the students. My
instincts said it was time to stop. (Interview, 10/21/00)
In a journal entry (10/12/00), Donna comments on the data table of Figure 2 and
the overall investigation:
We just weren’t able to draw many conclusions from our rating scale. Maybe too
much information? I’m sure the process could have been improved but I really feel
we’ve gained from the muddling through this. The children and I have been exposed
to new ways of doing research, asking questions, and thinking.
The “too much information” comment refers to the color coding (shading) for the wild habitat variable and the symbols indicating animal size that made the data table busier and harder to interpret. Complementing this perspective, Donna later
commented on the other variables of the investigation included but not analyzed in
the data table:
Maybe the numbers didn’t show it, but it opened up discussion. As we discussed it,
they (the students) realized that it was easier to do a good job with the camel who
lives in cold and hot climates, just like our state, and therefore it could be outside and
it was okay, better than an animal from the rain forest that had to live inside,
especially a large animal from the rain forest, those kind of things. And so we did get
some information out of the table. (Interview 10/21/00)
For Donna’s class, this is a very different type of discussion and process than
what usually happened before and after a zoo trip.
Within the Animal Habitat investigation, Donna’s reasoning about data and
distribution is mixed and contextually driven. Donna exhibits exemplary teaching of
data and variable creation to suit the purpose of the investigation, while the product
of the investigation (Figure 2) is more complex. Under the time pressure imposed by
the visit of the zoo director, Donna uses basic counts within categories (level of
rating scale) to summarize the data. Indicating a very flexible understanding of data
organization and summarization, Donna augments the count data with a creative use
of a pseudo-contingency table format to display four distributions for comparative
purposes in one table. The comparison of the components of habitat to evaluate zoo
performance was at a rudimentary level, “because they (one component) have a
whole lot more cards in the fours and fives.” Decisions at the beginning of the
investigation, like those to evaluate zoo performance relative to animal size and wild
habitat of origin, were initially appealing and interesting. Under time constraints,
however, these variables were secondary, their analysis forgone for the sake of the
main investigation question, yet they were the source of confusion as Donna tried to
preserve this information in the data table. In addition to the Animal Habitat
investigation being an exemplar of authentic teaching and learning, it also illustrates
the degree to which a teacher’s reasoning about data and distribution is inextricably
connected to the pedagogical decisions the teacher makes.
The Dinosaur Investigation
The Dinosaur unit was one Donna had also conducted numerous times in the
past with her third-grade classes. Similarly, the statistical investigation component
was a natural extension of her previous mode of teaching the topic. Her written
plans included the following as the goals of this investigation:
To apply research skills to learn about the characteristics of dinosaurs. To create a
graph of one of those characteristics in a cooperative group situation. Students will
then use that graph and the graphs of other groups to classify and draw conclusions
about additional characteristics of dinosaurs. (Teacher planning document)
In many ways, the Dinosaur Investigation resembles the type of data analysis
activity a teacher might find in a textbook. In fact, there is a similar published data
analysis activity on dinosaurs in which a table of North American dinosaur data is
provided for students to construct various types of graphs (Young, 1990).
Donna’s variation on the dinosaur theme had students collect data through
research at the library. Students selected and were responsible for a specific dinosaur
to research. The teacher maintained a list of dinosaurs with information known to be
available in the school library, and the students used the list in the selection process.
Donna prepared the students for library research by structuring their note taking
around the dinosaur’s height, weight, and diet; what era they lived in; and where
they lived. These variables were purposely selected by the teacher to teach specific
ideas and concepts about the dinosaurs. Specifically, she wanted students to learn
that (a) the sizes of dinosaurs changed between eras, (b) the size of dinosaurs is
related to diet, and (c) dinosaurs were widely dispersed around the world during all
of the eras studied. As Donna stated before one class period, “they need to learn
dinosaurs are not always these huge meat-eating creatures that sometimes they think
they are” (researcher field notes).
The information found during the library research became the quantitative data
for this investigation. Donna helped the class compile and organize the dinosaur
information from all students into a structured data table. Students transferred
information from notes to note cards, and attached the cards to a huge data table
rolled out on the floor of the classroom. Table 4 reproduces the original 19-foot-long
data table. The data table was used to construct multiple graphs to address the
dinosaur content learning goals of the investigation. Figures 3 and 4 are
reproductions of two graphs developed by the whole class.
In addition to creating graphs, students mapped locations of different dinosaurs
on a world map, wrote dinosaur reports, and created dioramas as ways of
representing what they had learned. As a culminating event, all Dinosaur
Investigation products were displayed throughout the classroom. During a visit by
fourth graders, the third-grade students were expected to explain their work.
Statistical Reasoning during the Dinosaur Investigation
Up front, Donna defined specific concepts about dinosaurs that she wanted
students to learn as a result of the investigation. She took a more structured approach
to the statistical aspects of the study than she had used in the previous investigations.
Here, Donna identified the variables of the study while students collected and
organized the data in a structured data table (Table 4). The data table was correctly
constructed with dinosaurs as cases and the characteristics of the dinosaurs (e.g.,
heights, weights, eras, diet) as variables. Finally, specific learning outcomes about
dinosaurs were connected to statistical analyses through the interpretation of
multiple graphs.
The content learning goals Donna had for students necessitated an analysis that
examined relationships between variables. In this situation and context, the analyses
were graphical, with each graph representing the data from two or more variables.
Donna used the graphs in Figures 3 and 4 to convey answers to the investigation
questions on the relationships between dinosaur size and the era in which the
dinosaurs lived, and between dinosaur size and diet, respectively. Figure 3 has the
named dinosaurs listed on the x-axis with weight in tons on the y-axis. The dinosaur
names on the x-axis are sorted by the categories of the Era variable. The Eras are
color coded (hatching) so that data points between adjacent dinosaurs within an Era
are connected by a hatched line. The graph in Figure 4 is identical to that of Figure 3
except that the categorical comparison variable is switched from Era Lived to Diet.
The order in which the dinosaur names are listed on the x-axis remains unchanged
from Figure 3, so the dinosaurs are no longer sorted relative to the categorical
variable of diet.
As in the previous two investigations, initial examination of these two graphs
raises potential questions about Donna’s knowledge of data, distribution, and
graphing. The analysis of her reasoning, evidenced by products of this investigation,
needs to be tempered by a more in-depth consideration of the context of the case.
This investigation marks the first and only investigation using continuous variables
that Donna conducts with children. Donna made the pedagogical decision to give
students ownership of individual dinosaurs, similar to the data collection methods
she employed in the other investigations. When a colleague suggested the use of line
graphs to represent the data, Donna saw a means for students to see their individual
contribution to the overall graph and proceeded to implement the colleague’s
suggestion.
The graph in Figure 3 does address variability in the data, as can be easily seen
in the jaggedness of the line, or the lack thereof. Since the individual dinosaurs were
sorted by Era, the comparison of groups is possible through the line graph.
Furthermore, line graphs are precursors to bar graphs in the development of
statistical reasoning about data (Bright & Friel, 1998) and are not inappropriate for
this situation and context. Unfortunately, Donna loses sight of the need to sort the
data relative to the categories of the comparison variable in Figure 4. This causes
confusion for the class and ends the investigation.
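The missing step here is mechanically simple once the data are organized as cases and variables. As a hedged sketch (the table fragment and function below are illustrative, not taken from the study's materials), re-sorting the cases by the categorical comparison variable before graphing keeps each category's dinosaurs contiguous on the x-axis:

```python
# A minimal, illustrative case-by-variable data table in the spirit of
# Table 4: dinosaurs as cases; era, diet, and weight as variables.
# The weights are rough figures chosen for illustration only.
dinosaurs = [
    {"name": "Apatosaurus", "era": "Jurassic", "diet": "Plant", "tons": 35},
    {"name": "Tyrannosaurus", "era": "Cretaceous", "diet": "Meat", "tons": 7},
    {"name": "Stegosaurus", "era": "Jurassic", "diet": "Plant", "tons": 2},
    {"name": "Triceratops", "era": "Cretaceous", "diet": "Plant", "tons": 6},
]

def sorted_for_comparison(cases, category):
    """Sort cases by a categorical comparison variable, so that a graph
    drawn from the sorted table keeps each category's cases contiguous
    on the x-axis."""
    return sorted(cases, key=lambda case: case[category])

# Sorting by "era" supports the era comparison of Figure 3; re-sorting
# by "diet" is the extra step needed before drawing the diet graph.
by_era = sorted_for_comparison(dinosaurs, "era")
by_diet = sorted_for_comparison(dinosaurs, "diet")
```

Done by hand on a 19-foot table, the same re-sort means physically reordering the note cards for each new comparison variable, which is easy to overlook under time pressure.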
PRIMARY TEACHERS' STATISTICAL REASONING
345
Table 4. Class data table for Dinosaur Investigation
(Columns: Prehistoric Animal; When? (era lived); Where? (locations found); Size?
(height, length, weight); Diet? (meat or plant); and Researcher (the student
responsible). The cases were Pteranodon, Plesiosaurs, Ichthyosaurus,
Archaeopteryx, Apatosaurus, Compsognathus, Brachiosaurus, Stegosaurus,
Allosaurus, Deinonychus, Ankylosaurus, Triceratops, Tyrannosaurus (T-Rex),
Pachycephalosaurus, Parasaurolophus, Corythosaurus, Glyptodon, Smilodon,
Eohippus (Hyracotherium), and the Mammoths. The individual cell entries of the
original 19-foot table do not survive legible reproduction here.)
Figure 3. Artifact of Dinosaur Investigation comparing size of dinosaurs across eras.
Figure 4. Artifact of Dinosaur Investigation comparing size of dinosaurs by diet.
How the Dinosaur Investigation concluded can be explained by several
potentially contributing factors. First, basic manipulation of data, like sorting,
is often overlooked in statistics education, especially when analyses are conducted
by hand. Donna had no prior experience with sorting a data table, and we have no
evidence that the task occurred to her. Second, as in the other investigations, Donna
felt pressed for time and found herself using “cracks in the day” to try to make even
more progress. She often used the students’ snack time to discuss the graphs of the
investigation. Third, Donna, being a highly organized and effective teacher, wanted
to facilitate the investigation as expeditiously as possible. In her preparations, she
constructed templates for graphing dinosaur data prior to class time, not recognizing
the need for sorting data. The class completed both graphs in the same session.
Without computers with graphing software, it was not possible to quickly
reconfigure the graphing task. Donna found herself in a situation where she expected
a successful resolution to the investigation; but instead, she was unable to help
students draw a reasonable conclusion from Figure 4. She became unsure what to do
next, given time constraints.
Given the outcome of the Dinosaur Investigation, it is arguable that Donna’s
reasoning about data and distribution is at a very low level, compounded by
difficulty distinguishing among different types of graphs and knowing when a
particular type of graph is more applicable. Donna did not try to reconfigure or
redo the dinosaur graphs herself “on the fly” with students. She was unwilling to
risk creating more graphs that could be equally confusing or unproductive, or to use
a lot more class time for this project. She also did not pose the problem she saw with
the graphs to the students. She could have made it a class task to take the data table
and create various types of graphs for this data. Again, time was a contributing
factor in the decision. Instead, she ended the investigation; but for the first and only
time during the entire sequence of investigations, she directed specific questions to
the authors about the graphical techniques she used, her implementation, and her
statistical content knowledge. What is particularly interesting about the Dinosaur
Investigation, however, was that in the self-contained context of the Gummy Bears
in Space (Scheaffer et al., 1996) activity during the summer workshop, Donna
successfully created complex graphs that combined categorical and continuous
variables. When reminded of this and the similarity of the graphing tasks, Donna
cringed and exclaimed, “this is why I need a statistician in the closet” (Interview,
12/1/00).
DISCUSSION
Donna’s statistical reasoning about data and distribution was examined in the
context of how she applied this knowledge in the action of conducting applied
statistical investigations to help her third-grade students learn about other topics in
the curriculum. What is interesting and perplexing is that Donna exhibits strong
statistical reasoning skills in one contextual setting, but that same knowledge or skill
does not necessarily transfer to all of her teaching work with children. For example,
she creates important variables connected to the purpose of the investigation for the
Animal Habitat investigation to evaluate the performance of the zoo, but does not do
this well in the Steven Kellogg Author Study. Similarly, she develops a well-structured data table in the Dinosaur Investigation, but fails to do this in the Author
Study and Animal Habitat investigations. In another example, Donna constructs
complex multivariable graphs during the summer workshop, but fails to adequately
graph the data of the Dinosaur Investigation. Finally, she teaches the concept of
distribution and its importance in making predictions during some classroom
investigations like Getting to Know You, and Cold Remedies (see Table 1), but fails
to include this type of discussion in the Author Study and offers only rudimentary
coverage in the Animal Habitat investigation. Stating this in another way, Donna
exhibits both exemplary and naïve, or basic, statistical reasoning about data and
distribution, depending on the context of the investigation. The pattern emerging in
her reasoning is that the more open-ended and unstructured the investigation, the
more Donna relies on the basics of statistical reasoning about data, namely, tallies
and counts. Conversely, the more the investigation resembles the activities and
context of her prior teaching practice, the more comfortable she becomes in teaching
sophisticated concepts like distribution.
Donna was selected to represent a best-case scenario to illustrate how
competent, experienced teachers can incorporate the process of statistical
investigation into their teaching practice. We had every expectation that her
statistical knowledge and graphing ability would be perfectly suited to the task
asked of her. When faced with the challenge of implementing open-ended statistical
investigations into content ideas of her curricula, Donna seemingly took the
following approach. First, she connected the curriculum topic with the investigation
to make “space,” covered standards, and ensured a positive learning experience.
Second, she mapped what she saw to be a similar type of data collection and
graphing activity from her prior teaching experience onto the investigation problem.
This pedagogical strategy was more for the ease of data collection, efficiency, and
student ownership, than carefully considering the nuances of a specific
investigation, looking ahead to data analysis and interpretation, and connecting this
back to the purpose of the investigation through the design (CBMS, 2001). Donna’s
extensive prior experience with statistics in the context of textbook problems and
activities loosely connected to her curriculum appears to have influenced her ability
to apply the process of statistical investigation in a context intended to teach ideas
central to the curriculum. It is the juxtaposition of her background and experience
with the statistical knowledge needed to implement purposeful statistical
investigation connected to curriculum topics that gives rise to a number of
implications regarding the statistics education of teachers.
IMPLICATIONS
To support planning and implementation of investigations connected to K–6
curriculum topics, teacher learning opportunities need to capitalize on occasions
inside the statistical investigation process. Some basic ideas and concepts teachers
need to learn are identified in this study. Examples include (a) how to define and
create variables when none are inherent or obvious to an investigation, (b) how to do
basic data manipulation like sorting, (c) how to gain the perspective to check and
determine whether results of the analysis address the intended purpose of the
investigation, and (d) how to discern when and what types of graphs to use in
different situations. Others are enumerated in the section on data analysis and
statistics in the CBMS (2001) document. As this study illustrates, the context in
which the statistical concepts like data and distribution arise and are applied matters.
Teachers need opportunities to construct understanding and recognize use of
statistical concepts like data and distribution as they appear holistically in the
context of conducting purposeful applied statistical investigation with children.
This suggestion raises a question to consider in the data-driven, activity-based
trend and recommendations for teaching statistics (Cobb, 1993). The findings of this
study lead one to ask whether that formula for learning statistics might be valuable
for some teachers but detrimental to others wishing to use statistical investigation
for teaching other content. Using statistical investigation as a tool for this purpose
requires teachers to be able to reason in ways that require recognizing when and
how context matters across all tasks of an investigation and making preplanned and
spur-of-the-moment teaching decisions accordingly. People who apply statistical
reasoning in real-world problems must be able to frame the problem and use their
statistical knowledge in the framed context to solve it. Learning statistics through
predeveloped or canned activities does not necessarily require the teacher to
recognize the structure of the problem and to know how or when statistical
knowledge and reasoning comes into play when student learning about a curriculum
topic hinges on the outcome of the statistical investigation. As Lehrer and Schauble
(2000) state, “When students work with data sets handed to them, they may lose the
sense that data result from a constructive process” (p. 2). It is this constructive
process that teachers must appreciate and understand themselves, in deep and
sophisticated ways, in order to make decisions that guide and help children
appreciate and understand the same.
Another implication is that teachers need to feel that they can and will learn
more about statistics through the act of teaching statistical processes and content,
and that it is acceptable to be simultaneously a learner and a teacher. Playing this
dual role of teacher and learner is not without risk (Heaton, 2000) and needs to be
supported and encouraged by statistics educators in their work with teachers. It is
impossible to learn all the statistical content one needs to learn prior to teaching.
Taking on the role of learner while teaching requires both confidence and a
willingness to cope with uncertainty. One way for teachers to learn this is to see this
disposition toward teaching, as well as learning, openly modeled by the statistics
educators with whom they work.
Finally, a study such as this one could become an important tool in teacher
education in the area of statistics, complementing the use of images of real
classrooms and interactions and products of student work as a means of constructing
knowledge for teaching in other areas of teacher education (Lampert & Ball, 2001;
Merseth, 1996). Researchers have used studies like this to represent and help others
understand the complexity of teaching and teacher knowledge while constructing
their own knowledge for teaching. They offer a blend of statistical knowledge and
practice such that teachers can see not only examples of statistical knowledge
informing pedagogical decision making but also how particular pedagogical
decisions can both positively and negatively affect data collection, summarization,
and interpretation.
The development of more vignettes focused on statistical concepts and the
process of statistical investigation would enable teachers to see statistical concepts
as they appear in context and the ways statistical knowledge is used, or could be
used, by teachers in investigative work with children. Furthermore, the development
of a collection of examples of practice—situated in real classrooms around specific
statistical concepts arising or deliberately taught while doing statistical
investigations—offers a direction for creating usable knowledge for teaching from
research. Additionally, using such examples from practice would continue to
illustrate to teachers and teacher educators a key finding from this research: that in
learning statistical knowledge for teaching, the context matters; and teachers need to
learn where, when, why, and how it matters.
REFERENCES
Ball, D. L. (2002). Mathematical proficiency for all students: Toward a strategic research and
development program in mathematics education. RAND Report. Washington, DC: Office of
Education Research and Improvement, U.S. Department of Education.
Ball, D. L., Lubienski, S. T., & Mewborn, D. S. (2001). Mathematics. In V. Richardson (Ed.), Handbook
of research on teaching (4th ed., pp. 433–456). Washington, DC: American Educational Research
Association.
Bright, G. W., & Friel, S. N. (1998). Graphical Representations: Helping students interpret data. In S. P.
Lajoie (Ed.), Reflections on statistics: Learning, teaching, and assessment in grades K–12. Mahwah,
NJ: Erlbaum.
Cobb, G. W. (1993). Reconsidering statistics education: A National Science Foundation conference.
Journal of Statistics Education [Online], 1(1).
Cobb, P. (1999). Individual and collective mathematical development: The case of statistical data
analysis, Mathematical Thinking and Learning, 1(1), 5–43.
Conference Board of the Mathematical Sciences (2001, March). The mathematical education of teachers
(draft report). Washington, DC: Author.
Confrey, J., & Makar, K. (2001, August). Secondary teachers’ inquiry into data. Proceedings of the
Second International Research Forum on Statistical Reasoning, Thinking, and Literacy, University of
New England, Armidale, NSW, Australia.
Creswell, J. W. (1998). Qualitative inquiry and research design: Choosing among five traditions.
Thousand Oaks, CA: Sage Publications.
Fennema, E., & Nelson, B. S. (1997). Mathematics teachers in transition. Hillsdale, NJ: Erlbaum.
Friel, S. N., & Bright, G. W. (1998). Teach-Stat: A model for professional development in data analysis
and statistics for teachers K–6. In S. P. Lajoie (Ed.), Reflections on statistics: Learning, teaching, and
assessment in grades K–12. Mahwah, NJ: Erlbaum.
Friel, S. N., & Bright, G. W. (1997). A framework for assessing knowledge and learning in statistics (K–
8). In I. Gal & J. B. Garfield (Eds.), The assessment challenge in statistics education (pp. 55–63).
Amsterdam: IOS Press.
Amsterdam: IOS Press.
Graham, A. (1987). Statistical investigations in the secondary school. Cambridge, UK: Cambridge
University Press.
Heaton, R. (2000). Teaching mathematics to the new standards: Relearning the dance. New York:
Teachers College Press.
Heaton, R., & Mickelson, W. (2002). The learning and teaching of statistical investigation in teaching and
teacher education. Journal of Mathematics Teacher Education, 5, 35–59.
Jaworski, B. (1998). The centrality of the researcher: Rigor in a constructivist inquiry into mathematics
teaching. In A. Teppo (Ed.), Qualitative research methods in mathematics education (pp. 112–127).
Reston, VA: National Council of Teachers of Mathematics (NCTM).
Lajoie, S. P. (1998). Reflections on a statistics agenda for K–12. In S.P. Lajoie (Ed.), Reflections on
statistics: Learning, teaching, and assessment in grades K–12. Mahwah, NJ: Erlbaum.
Lajoie, S. P., & Romberg, T. A. (1998). Identifying an agenda for statistics instruction and assessment in
K–12. In S. P. Lajoie (Ed.), Reflections on statistics: Learning, teaching, and assessment in grades
K–12. Mahwah, NJ: Erlbaum.
Lampert, M., & Ball, D. L. (2001). Teaching, multimedia, and mathematics: Investigations of real
practice. New York: Teachers College Press.
Lehrer, R., & Schauble, L. (2002). Investigating real data in the classroom: Expanding children’s
understanding of math and science. New York: Teachers College Press.
Lehrer, R., & Schauble, L. (2000). Inventing data structures for representational purposes: Elementary
grade students’ classification models, Mathematical Thinking and Learning, 2(1 & 2), 51–74.
Merseth, K. (1996). Cases and case methods in education. In J. Sikula (Ed.), Handbook of research on
teacher education (2nd ed., pp. 722–744). New York: Macmillan.
National Council of Teachers of Mathematics (2000). Principles & standards for school mathematics.
Reston, VA: Author.
National Council of Teachers of Mathematics (1989). Curriculum and evaluation standards for school
mathematics. Reston, VA: Author.
Newmann, F. M., & Wehlage, G. G. (1997). Successful school restructuring: A report to the public and
educators. Madison: Center on Organization and Restructuring of Schools, University of Wisconsin.
Russell, S. J., & Friel, S. N. (1989). Collecting and analyzing real data in the elementary school
classroom. In P. R. Trafton & A. P. Shulte (Eds.), New directions for elementary school mathematics
(pp. 134–148). Reston, VA: National Council of Teachers of Mathematics.
Scheaffer, R. L. (1988). Statistics in the schools: The past, present and future of the quantitative literacy
project. Proceedings of the American Statistical Association from the Section on Statistical Education
(pp. 71–78).
Scheaffer, R. L., Gnanadesikan, M., Watkins, A., & Witmer, J. A. (1996). Activity-based statistics. New
York: Springer.
Scheaffer, R. L., Watkins, A. E., & Landwehr, J. M. (1998). What every high-school graduate should
know about statistics. In S. P. Lajoie (Ed.), Reflections on statistics: Learning, teaching, and
assessment in grades K–12. Mahwah, NJ: Erlbaum.
Schifter, D. (1996). What’s happening in math class? (Vols. 1–2). New York: Teachers College Press.
Shaughnessy, J. M. (1992). Research in probability and statistics: Reflections and directions. In D. A.
Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 465–494). New
York: Macmillan.
Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher,
15(2), 4–14.
Utts, J. M. (1999). Seeing Through Statistics (2nd Ed). Pacific Grove, CA: Duxbury Press.
Watson, J. M. (2001). Profiling teachers’ competence and confidence to teach particular mathematics
topics: The case of chance and data. Journal of Mathematics Teacher Education, 4(4), 305–337.
Watson, J. M. (2000). Preservice mathematics teachers’ understanding of sampling: Intuition or
mathematics. Mathematics Teacher Education Research Journal, 12(2), 147–169.
Young, S. L. (1990). North American dinosaur data sheet; Graph the dinosaurs’ lengths; Meat-eaters and
Plant-eaters. Arithmetic Teacher, 38(1), 23–33.
Chapter 15
SECONDARY TEACHERS’ STATISTICAL
REASONING IN COMPARING TWO GROUPS
Katie Makar and Jere Confrey
University of Texas at Austin, USA, and Washington University in St. Louis, USA
OVERVIEW
The importance of distributions in understanding statistics has been well articulated
in this book by other researchers (for example, Bakker & Gravemeijer, Chapter 7;
Ben-Zvi, Chapter 6). The task of comparing two distributions provides further
insight into this area of research, particularly reasoning about variation, and
motivates other aspects of statistical reasoning. The research study described here
was conducted at the end of a 6-month professional development sequence designed
to assist secondary teachers in making sense of their students’ results on a
state-mandated academic test. In the United States, schools are currently under
tremendous pressure to increase student test scores on state-developed academic
tests.
This chapter focuses on the statistical reasoning of four secondary teachers during
interviews conducted at the end of the professional development sequence. The
teachers conducted investigations using the software Fathom™ in addressing the
research question: “How do you decide whether two groups are different?”
Qualitative analysis examines the responses during these interviews, in which the
teachers were asked to describe the relative performance of two groups of students
in a school on their statewide mathematics test. Pre- and posttest quantitative
analysis of statistical content knowledge provides triangulation (Stake, 1994), giving
further insight into the teachers’ understanding.
1 This research was funded by the National Science Foundation (NSF) under ESR-9816023.
The opinions expressed in this chapter do not necessarily reflect the views of NSF.
353
D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 353–373.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
354
KATIE MAKAR AND JERE CONFREY
WHY STUDY TEACHERS’ REASONING
ABOUT COMPARING TWO GROUPS?
Statistics and data analysis are becoming increasingly important in our society
for a literate citizenry. As such, many schools have begun to incorporate statistics
and data analysis into their curriculum, beginning as early as Kindergarten (TERC,
1998). Although many schools are increasing their emphasis on statistics, very few
are taking sufficient steps to help teachers master the statistics they are expected to
teach. Professional development typically provided to teachers by their schools
gives mathematics teachers little opportunity to improve their statistical content
knowledge beyond evaluation of central tendency and simple interpretation of
graphs and tables, while university statistics courses are rarely aimed at content
teachers feel is relevant. Furthermore, U.S. teachers have little experience with data
analysis and inferential statistics, yet in a time when teachers are under increasing
pressure to improve student scores on state-mandated tests, teachers are required to
make instructional decisions based on large quantities of data about their students’
performance. Given that teachers are both the target and the vehicle of reform
(Cohen & Ball, 1990), it is vital that we consider teachers’ facility in statistical
reasoning as well as possible vehicles for helping teachers improve their conceptual
understanding of the statistics they are expected to teach. Enhanced understanding of
teachers’ statistical reasoning will help professional development leaders better
design conceptual trajectories for advancing teacher reasoning in statistics, which
should ultimately improve student understanding in probability and statistics.
Investigations involving comparing groups provide a motivational vehicle to
learn statistics (see, for example, Konold & Pollatsek, 2002): They are steeped in
context, necessitate a focus on both central tendency and distribution (for various
aspects of distributions, see Chapter 7 this volume), and provide momentum for the
conceptual development of hypothesis testing. Furthermore, tasks involving group
comparisons are rich enough to be accessible to a broad array of learners at varying
ages and levels of statistical understanding. Comparing distributions can be an
interesting arena for researchers to gain insight into teachers’ statistical reasoning;
in particular, it gave us an opportunity to understand teachers’ reasoning about
variation in a more sophisticated context.
Several curriculum projects make use of group comparisons as an avenue to
teach statistical reasoning. At the elementary level, comparing two groups can be
used to introduce the concepts of data and graphing, providing students with
important early experiences in viewing and reasoning with distributions. For
example, a first-grade curriculum (TERC, 1998) introduces primary students to
distributions by having them compare and qualitatively describe the distribution of
their classmates’ ages to that of their classmates’ siblings. Middle school students
are able to build on earlier experiences with data and start to focus on descriptions of
distributions: measures of center and spread, shapes of distributions, as well as gaps
and outliers. For example, a sixth-grade curriculum puts these skills into a
meaningful context for students by having students compare “typical” heights of
males and females in their class, examining measures of center, describing the
shapes of two distributions, and looking at gaps and outliers (Lappan, Fey,
Fitzgerald, Friel, & Phillips, 1998). For older students, more open-ended designs
and conventional descriptions of statistical variation can be introduced, which will
help students build a foundation for inferential statistics or to inform debate of issues
in light of available data.
At a wide variety of grade levels and settings, comparing groups has the
potential for giving students authentic contexts to use data to answer meaningful
questions, thus motivating the power of data in decision making. However, in order
for teachers to provide these kinds of tasks for their students, they need to develop
their own statistical understanding. Heaton and Mickelson (Chapter 14, this volume)
described the experience of an elementary teacher’s struggle to develop her own
statistical reasoning as she worked to merge statistical investigations into the
existing school curriculum. This chapter will examine statistical reasoning in
secondary teachers as they build their statistical content knowledge through
investigations of student assessment data, in particular the role of variation in
considering what it means to compare two groups. (For additional discussions of the
teachers’ reasoning with data, see Confrey & Makar, 2002; Makar & Confrey,
2002.)
PREVIOUS RESEARCH ON COMPARING TWO GROUPS
Within the world of statistics, much concern is placed on making comparisons,
either direct or implied. Whether the difference is between brands of peanut butter,
or housing prices compared to last year, comparisons form the very fabric of
research and of principled arguments (Abelson, 1995):
The idea of comparison is crucial. To make a point that is at all meaningful, statistical
presentations must refer to differences between observation and expectation, or
differences among observations. Observed differences lead to why questions, which
in turn trigger a search for explanatory factors … When we expect a difference and
don’t find any, we may ask, “Why is there not a difference?” (p. 3)
Lehrer and Schauble (2000), in their work with children’s graphical
construction, indicate that young students “are often disconcerted when they find a
discrepancy between the expected value of a measure and its observed value” (p.
104). Watson and Moritz (1999) argue that comparisons of data sets provide a
meaningful backdrop for students to gain a deeper understanding of the arithmetic
mean as well as strong intuitive approaches to compare groups through balancing
and visual strategies, “hopefully avoiding the tendency to ‘apply a formula’ without
first obtaining an intuitive feeling for the data sets involved” (p. 166).
The task of comparing groups appears in the literature as an impetus for students
to begin to consider data as a distribution instead of focusing on individuals, and as
a way to motivate students to take into account measures of variation as well as
center (Konold & Higgins, 2002). Lehrer and Schauble (2000) found that as older
students solved problems in which they compared two distributions, they began to
look at both centrality and dispersion. In their study, comparing groups served as an
impetus for students to gain an appreciation for measures beyond center. For
example, they report on a group of fifth graders who, when experimenting with
different diets for hornworms, found that the hornworms in the two treatment groups
showed differences not only in their typical lengths but also in the variability of
their lengths. This caused the students to speculate and discuss reasons why the
lengths in one group varied more, showing that “considerations of variability
inspired the generation of explanations that linked observed patterns to mechanisms
that might account for them” (p. 129).
Examining the context of a problem is critical for understanding group
comparisons. Confrey & Makar (2002) discuss the role of context in statistical
learning while examining the process of teachers’ inquiry into data. In one activity
they describe, teachers examined several pairs of graphs void of context and
reasoned about comparisons between graphs in each pair at a very superficial level
in a discussion that lasted only about 5 minutes. However, when the same graphs
were examined again in light of a context relevant to the teachers (quiz scores), a
much more in-depth analysis took place in a discussion lasting 40 minutes. This
discussion was the first time in their study that the teachers articulated variation in a
distribution as being useful. When the teachers could compare distributions in a
personally meaningful context, they began to gain a more robust understanding of
distribution. Similarly, Cobb (1999) found that by comparing the distributions in the
context of judging the relative lifespan of two types of batteries, students were
compelled to consider what it meant for one battery to be preferred over another—
does one consider overall performance, or consistency? Here, students negotiated a
purposeful reason to consider variation in the context of what constitutes a “better”
battery.
Comparing two groups also becomes a powerful tool in light of its use toward a
consideration of statistical inference. Watson & Moritz (1999) argue specifically
that comparing two groups provides the groundwork “to the more sophisticated
comparing of data sets which takes place when t-tests and ANOVAs are introduced
later” (p. 166). Without first building an intuitive foundation, inferential reasoning
can become recipe-like, encouraging black-and-white deterministic rather than
probabilistic reasoning. “The accept-reject dichotomy has a seductive appeal in the
context of making categorical statements” (Abelson, 1995, p. 38). Although formal
methods of inference are not usually a topic in school-level statistics content, an
ability to look “beyond the data” (Friel, Curcio, & Bright, 2001) is a desired skill.
Basic conceptual development of statistical inference can also help learners
understand one of the most difficult but foundational concepts in university-level
statistics: sampling distributions (delMas, Garfield, & Chance, 1999).
SECONDARY TEACHERS’ STATISTICAL REASONING
RESEARCH DESIGN AND METHODOLOGY
The research described in this chapter was part of an NSF-funded research
project developed and carried out by a research team at the Systemic Research
Collaborative for Education in Math, Science, and Technology at the University of
Texas at Austin. Although this chapter focuses on the results of interviews taken at
the end of the study, the experience of the participants in the research project is key
to understanding their background knowledge and experience in statistical
reasoning. It should be noted that research on these teachers’ statistical reasoning
was not the purpose of the workshop, which was to examine the effects of the
professional development sequence within a larger systemic reform project
(Confrey, in preparation). The authors saw an opportunity, after the workshops were
planned, to examine the teachers’ statistical reasoning through a set of clinical
interviews. This chapter is the result.
The 6-month professional development research project took place in two
phases: 18 contact hours of full-day and after-school meetings, followed by a 2-week summer institute. The project was conceived as a mathematical parallel of the
National Writing Project, where teachers focus on their own writing rather than how
to teach writing. A mission of the National Writing Project (2002), and a belief that
was fundamental to our study, is that if teachers are given the opportunity to focus
on their own learning of the content that they teach—to do writing, or mathematics,
in an authentic context—they will better understand the learning process and hence
teach with greater sensitivity to students’ conceptual development (Lieberman &
Wood, 2003). Our professional development sequence was designed under the
assumption that if mathematics teachers are immersed in content beyond the level
that they teach, developed through their own investigations as statisticians
within a context that they find compelling and useful, then they will teach statistics
more authentically and their increased content knowledge will translate into
improved practice.
During the professional development sequence, teachers learned a core of
statistical content: descriptive statistics and graphing, correlation and regression,
sampling distributions, the Central Limit Theorem, confidence intervals, and basic
concepts of statistical inference. These concepts were not developed formally, as
they would be in a university course; rather, teachers were given extensive
experience with sampling distributions through simulations in order to (a) help them
understand concepts of sampling variation that we thought were critical to their
work with data and (b) give them access to powerful statistical ideas. Statistical
concepts were introduced only as they were needed to make sense of the data; many
of the teachers already had at least a working knowledge of descriptive statistics and
graphing, as indicated by their statistics pretest.
During the workshops and summer institute, teachers conducted increasingly
independent investigations focused on the analysis of their students’ high-stakes
state assessment data. For the teachers, this was a compelling context in which to
learn statistics. In Texas, there is much emphasis on the Texas Assessment of
Academic Skills (TAAS, www.tea.state.tx.us), the high-stakes state assessment
where students and schools are held accountable for their performance on the battery
of tests. Teachers felt they would be empowered if they were able to interpret TAAS
data instead of having to rely on experts to tell them what the data meant and what
actions the school needed to take in order to raise test scores. Because many of the
“lessons” we wanted them to gain from working with data involved sampling
variation, we felt it critical to give them enough experience to develop an intuition
about this type of variation.
Many of the investigations were supported by the use of the statistical learning-based software Fathom (Finzer, 2000), to examine data directly as well as to create
simulations to test conjectures. The software allowed teachers to fluidly and
informally investigate relationships in the data because of the ease with which
Fathom creates graphs through a drag-and-drop process. Most statistical software
tends to act as a “black box,” supporting a data-in, results-out mind-set that can
encourage misconceptions in early learners who expect to find definitive answers in
the data. Fathom instead insists that users build simulations in the
same way that one would construct a sampling distribution: by creating a
randomization process, defining and collecting measures from each randomization,
and then iteratively collecting these measures. The levels of abstraction that make
sampling distributions and hypothesis testing so difficult for university students
(delMas et al., 1999) are not avoided in Fathom, but made transparent through
creating visual structures that parallel these processes, allowing users to better
visualize the concepts underlying the abstract nature of a sampling distribution.
Fathom was also a powerful tool for analysis and supported the use of authentic
data, even reading data directly from websites, thus giving teachers greater access
to the many data sets that are available on the Internet.
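The construction that Fathom makes visible (create a randomization process, define a measure, collect it, iterate) can be sketched in a few lines of ordinary code. The following Python sketch is not part of the study, and the scores are simulated stand-ins; it builds a sampling distribution of sample means and shows how much narrower it is than the distribution of individual scores:

```python
import random
import statistics

# Hypothetical population of test scores (a stand-in for a Fathom data set)
random.seed(1)
population = [random.gauss(70, 13) for _ in range(1000)]

def sampling_distribution(pop, sample_size, num_samples):
    """Build a sampling distribution of the mean by iterating three steps:
    (1) randomize: draw a random sample, (2) measure: compute its mean,
    (3) collect: accumulate the measures."""
    return [statistics.mean(random.sample(pop, sample_size))
            for _ in range(num_samples)]

means = sampling_distribution(population, sample_size=25, num_samples=500)

# The spread of sample means (the standard error) is far smaller than the
# spread of individual scores, illustrating the effect of sample size.
print(round(statistics.stdev(population), 1))
print(round(statistics.stdev(means), 1))
```

Making each of the three steps an explicit line of code parallels the visual structures Fathom requires users to build, rather than hiding the process behind a single command.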
During the workshops with the teachers, we often used sampling distributions to
illustrate and investigate statistical concepts—properties of the normal distribution,
the Central Limit Theorem, the effect of sample size on sampling variability, the
null hypothesis, p-values, and hypothesis testing. In addition, these statistical
concepts were applied during investigations of relationships in the data. It is
important to note that we did not focus explicitly on group comparisons during the
professional development workshops; we did not formally develop a list of
procedures for comparing groups, nor had the teachers seen a task similar to the one
we asked them to perform for the interview. During the workshops, the teachers did
engage in two structured activities in which comparing two groups was central. The
first activity took place in the early stages of the professional development program,
when teachers were first learning the software and investigating descriptive
statistics. In this activity (Erickson, 2001, p. 206), a sample of student scores on the
Scholastic Aptitude Test (SAT; a national test many U.S. students are required to
take as part of their college application) and the grade point averages of males and
females were examined and informally compared. The second activity, Orbital
Express (Erickson, 2001, p. 276), took place in the final week of the summer
institute when investigating the concept of a null hypothesis and working with more
advanced features of the software. In this activity, teachers dropped two types of
wadded paper and attempted to hit a target 10 feet below. The distance each wad fell
from the target was then entered into one column in Fathom, and the type of paper
thrown was entered in a second column. Using the scramble attribute feature of the
software, the values in the second column (type of paper) were randomized,
simulating a null hypothesis, and the difference in the median of each group was
calculated. This process was repeated 100 times to create a “null hypothesis”
distribution, showing the amount of variation in the differences between the medians
of the groups that might be expected just by chance. The difference found in the
original sample was then examined in light of this “null hypothesis” distribution.
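The scramble procedure just described is, in effect, a randomization (permutation) test on the difference of medians. A minimal Python sketch of the same logic, using made-up drop distances since the teachers' actual Orbital Express data are not reported:

```python
import random
import statistics

random.seed(2)
# Hypothetical drop distances (inches from the target) for two paper types
flat_paper = [4.0, 6.5, 5.0, 7.0, 3.5, 6.0, 5.5, 8.0]
wadded_paper = [3.0, 4.5, 2.5, 5.0, 3.5, 4.0, 2.0, 4.5]

observed_diff = statistics.median(flat_paper) - statistics.median(wadded_paper)

# Scramble the type labels 100 times to simulate the null hypothesis:
# if paper type does not matter, any assignment of labels is equally likely.
pooled = flat_paper + wadded_paper
null_diffs = []
for _ in range(100):
    random.shuffle(pooled)
    group_a, group_b = pooled[:8], pooled[8:]
    null_diffs.append(statistics.median(group_a) - statistics.median(group_b))

# Compare the observed difference with the "null hypothesis" distribution
extreme = sum(abs(d) >= abs(observed_diff) for d in null_diffs)
print(f"observed difference: {observed_diff}, as extreme under null: {extreme}/100")
```

If few of the 100 scrambled differences are as large as the observed one, the observed difference is unlikely to be due to chance alone, which is exactly the conclusion the teachers read off the distribution of scrambled medians.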
SUBJECTS AND DATA COLLECTION
This chapter focuses primarily on four secondary mathematics teachers from
Texas who took part in the professional development program just described. Two
of these participants joined the project later and were part of an abbreviated repeat of
the first phase of the professional development sequence. One of the four subjects
was a preservice teacher while the other three were experienced, credentialed
secondary mathematics teachers who taught 13- to 16-year-old students. Two of the
teachers had obtained a university degree in mathematics, the preservice teacher was
working on her mathematics degree, and the remaining teacher had a degree in the
social sciences. The two men and two women consisted of one Hispanic and three
non-Hispanic whites. All but the preservice teacher had taken a traditional
introductory statistics course 5–15 years previously during their university
coursework. These were the only four teachers who took part in Phase II of the
project (the summer institute), due partly to a change in the administration at the
school and scheduling conflicts with teachers teaching summer school.
Data collected on the subjects included a pre-post test of statistical content
knowledge, which was analyzed using a t-test in a repeated measures design. In
addition, all of the sessions with the teachers were videotaped. Interviews were
conducted at the end of the study, in which participants were asked to compare the
performances of two groups; these interviews comprise the main source of data for
this chapter. The interviews were videotaped, and major portions were transcribed
and then analyzed using the qualitative methodology of grounded theory (Strauss &
Corbin, 1998). Under this methodology, the transcripts were first subjected to open
coding in the software NVivo (QSR, 1999) to capture the phenomenon observed in
the teachers’ own words and actions and to allow potential categories to emerge
from the data that would describe strategies, mind-set, and other insights into how
the teachers were comparing groups. Secondly, initial categories were organized into
hierarchical trees and subjected to axial coding to begin to tie common themes into
larger categories. Finally, the data were analyzed with selective coding to further
investigate various dimensions of the categories and better describe the phenomenon
observed.
Figure 1. A dot plot of student test scores, with means plotted, created in Fathom by one
teacher.
INTERVIEW TASK
In the interviews, which took place during the last two days of the summer
institute, subjects were given a raw data set of student TAAS scores from a
hypothetical low-performing high school and asked to use Fathom to compare the
performance of males and females in the school. Although the data did not come
from a single school, they were authentic student scores compiled from several
schools in Texas. Figure 1 shows a
graph similar to ones that each of the teachers initially created in Fathom from the
given data. The MTLI on the horizontal axis of this graph is the Mathematics “Texas
Learning Index” on TAAS; an MTLI score of 70 is considered passing. In the context
in which the state high-stakes test exists, it is not just the means that are relevant to
consider. Schools are held accountable for the proportion of students who pass the
TAAS test, so it is also important to consider the proportion passing for each group.
In a second task during the interview, subjects were asked to investigate the low-performing status of the school, based on analysis of the performance of the state-defined ethnic subgroups within the school, and to make a campus-based program
recommendation to the principal for the following year. The analysis in this chapter
focuses on the first interview task.
RESULTS
In analyzing these teachers’ reasoning about comparing two groups, we assumed
that the professional development sequence had an impact on their content
knowledge. Rather than examine teachers’ reasoning about comparing two groups
with teachers who had little experience with data or diverse backgrounds in
statistical content knowledge, we chose to examine teachers who had developed
their statistical understanding through rich experiences as investigators. We
recognize that the reasoning ability of this group, therefore, is not typical of
secondary teachers, but rather indicates what could occur after a relatively short
period of professional focus on building conceptual understanding and experience with
powerful statistical ideas. The overarching purpose of the professional development
was to give them rich experiences as investigators with school data. We did discuss
many concepts in inferential statistics, but the majority of these more advanced
concepts (e.g., t-tests, confidence intervals, null hypothesis, p-values) were
experienced through simulations on a conceptual, not formal level.
To measure whether the content that was taught had an impact on teachers’
understanding and to assess the level of statistical content knowledge at the time of
the interviews, a pre-post test of content knowledge was given to teachers. The
result of the analysis is given in Table 1. The data summary shows significant
growth (α = 0.05) in their overall content knowledge as well as for two individual
areas (Sampling distributions and Inference), even though the number of teachers in
the study was small (n = 4).
Table 1. Results of pre-post test of statistical content knowledge using a t-test and repeated
measures design, n = 4

Topic                               Pretest Mean      Posttest Mean     Difference     t      p-value
                                    Percent Correct   Percent Correct
Descriptive Statistics                   61%               79%              18%       2.0      0.14
Graphical Representation                 75%               83%               8%       0.5      0.63
Sampling Distribution                     8%               75%              67%       4.9     < 0.01
Inference and Hypothesis Testing          6%               59%              53%      18.0     < 0.01
Overall                                  35%               71%              36%       6.8     < 0.01
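For readers unfamiliar with the repeated measures analysis behind Table 1, the t statistic is the mean of the pre-post differences divided by its standard error, with n - 1 degrees of freedom. A Python sketch with hypothetical teacher scores (the study's individual scores are not reported, so these numbers are illustrative only, not a reproduction of the table):

```python
import math
import statistics

# Hypothetical pre/post overall percent-correct scores for n = 4 teachers;
# the study's individual scores are not reported, so these are made up.
pre  = [30, 40, 35, 35]
post = [65, 75, 70, 74]

def paired_t(before, after):
    """Paired t statistic: mean of the individual differences divided by
    its standard error, as in a repeated measures design with df = n - 1."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se

t = paired_t(pre, post)
print(round(t, 2))
```

Because the test is computed on each teacher's own gain rather than on the two group means, it can detect growth even with a sample as small as four, which is how the study obtained significant results with n = 4.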
While quantitative methods could be used to measure content knowledge, it was
necessary to use qualitative methods to better understand teachers’ statistical
reasoning about comparing two groups. During the qualitative analysis, 20 initial
categories were organized into four final categories—conjectures, context, variation,
and conclusions—by collapsing and generalizing other categories. Finally, the
researchers developed a preliminary framework for examining statistical reasoning
(Makar & Confrey, 2002). This chapter focuses on elements that are specific to
teachers’ reasoning about comparing two distributions. Of special interest are
teachers’ conceptions of variation, which bring with them several issues that are
unique to the task of comparing groups. In examining these teachers’ descriptions of
the two distributions, it will be interesting to note how they choose to compare the
characteristics of each distribution. For example, do they see these measures as
absolute, or do they recognize the possibility of inherent error? That is, do they view
any small differences in these measures quantitatively as absolute differences, or do
they indicate a tolerance for variation in these measures, so that if these students
were tested again, they might expect to see slightly different results?
EXAMPLES OF TEACHERS’ REASONING
ABOUT COMPARING DISTRIBUTIONS
In this section, we discuss the data from interviews with the four subjects from
the second phase of the study: Larry, Leesa, Natalie, and Toby. These four teachers
were the only four participating in Phase II of the study, the 2-week summer
institute.
The first transcript we examine is from Larry, who has taught middle school math
for 6 years. He has an undergraduate major in mathematics and is certified to teach
mathematics at the middle and high school levels. Larry’s initial portrayal of his
comparison of the two distributions began with a visual evaluation of the similarity
of their dispersion, then a numerical description of the means and standard deviation
of each of the two distributions. He finished with a comparison of these measures:
Figure 2. Larry’s summary table in Fathom. The three rows correspond to the values of the
mean, count, and standard deviation for the females, males, and then total group.
Larry:	I’m just first dropping them, both of them in a graph (Figure 1), the math scores of the males and females. Um, both of them seem to be fairly equally distributed, maybe. I’m going to try and find the means of each one (mumbles). I’ll just graph them, then. Hmm. So they’re fairly close … I’m pulling down a summary table (Figure 2) so I can actually find a number for each one. The, uh, so I actually get a real number. So it’s giving me the count and mean of each one. Also, here I can find out, uh (mumbles), I can find the standard deviation of each one to see how close they are from the mean.
KM:	And how, how will that help you?
Larry:	Well, if, even if I didn’t see the graph, I can look at the females are even a little tighter, around a higher mean.
KM:	OK.
Larry:	On both sides. As opposed to the men, also—that are a little more spread around, around a lower average.
Larry later considered the difference of the means more directly, estimating the
difference from the figure:
Larry:
Even though they’re going to be very close, I, I think, I, I mean, there’s not
a great difference between the men and the women. But the women look
like they scored maybe one or two points higher.
Larry here acknowledged that the difference between the means was very close,
but did not interpret the difference as anything other than a 1- or 2-point difference.
At the end of the first part of the interview, Larry informally compared the extreme
values of the two distributions, as well as their means and proportion passing, to
summarize his analysis:
KM:	Just describe for me, if you were going to compare those two groups, the performances of those two groups. Describe the similarities and differences.
Larry:	OK. The females have a larger range, because the lowest score and the highest score are—the lowest score of the females is lower than the lowest score of the males, and the highest score of the females is higher than the highest score of the males. Uh, so the, the range is higher. Yet, still the, the mean score is higher than the average score of each of—, the females is higher than the average score of the males.
Larry’s comparisons consisted of independent descriptions of each distribution
along with direct comparisons of center and dispersion. While he considered the
variability of each distribution, he did not indicate a sense of the variation between
the measures of the two distributions—that is, he compared the means and
dispersions of the two distributions qualitatively or in absolute terms. He concluded
that the mean of the females was higher than that of the males by observing an
estimated 1- or 2-point difference in the means. While he asserted that these were
close, Larry indicated no particular inclination to investigate whether the difference
in the means was significant.
Figure 3. Leesa’s initial dot plot of Gender vs. MTLI (TAAS math scores).
Leesa has taught middle school mathematics for 7 years, has undergraduate
majors in the social sciences, and is the chair of her mathematics department. Her
initial description included comparisons of the shape, range, maximums, and means
of each distribution (Figure 3; note that Leesa’s data set, and those of the other two
teachers who follow, are slightly different from Larry’s):
Leesa:	OK, um, let’s see. This looks skewed to the left [pointing to the top distribution]. Well, they both look skewed to the left. Uh, the range of the males looks like it goes from about nine[ty]—well, they’re about the same. There’s a bigger range in the female performance because of this, this one student right here who has got a 32.
KM:	OK.
Leesa:	Um. A high score is a 92 on female and on male it’s a 91. Um, and then I can also, I can also go and find the mean. And then, [pause] the edit formula and plot the mean on both of those [Leesa selects the graph, chooses “edit formula” from the context menu, and plots the means on the graph in Fathom]. So for the females it looks like their line is about 72.6, no, 73 [Leesa moves the cursor close to the mean line for the females and reads the location of the cursor in the corner of the Fathom screen]. And then for the males, it looks like about 72.
KM:	OK.
Leesa:	So the average female score is just a little bit higher than, than the average male.
Leesa seemed to view the measures she stated as descriptions of each
distribution separately, although she made some comparison in these measures,
indicating some tolerance for variation in her qualitative description of the range of
the distributions as “about the same.” She did not hold on to this view, however,
when she moved from qualitatively to quantitatively comparing the distributions.
For example, while she found that the mean for females was higher than for males,
she did not indicate whether she interpreted this 1-point difference as their centers
being about the same. In the interview, Leesa went on to compare the proportion of
each group that passed [63% of the females passed compared to 68% of the males].
She noted that alternative measures were giving her seemingly conflicting
information about the relative performance of the two groups, stating, “More boys
passed than girls when you look at percentages, but, and the mean score of the girls
is higher.”
When asked to sum up whether she saw any difference between the performance
of males and females at the school, Leesa considered using a sampling distribution
this time to provide evidence that the difference between the two groups was not
significant. However, her attempt to do so included a laundry list of methods we had
used in the summer institute to examine variability, including a reference to “what
we did this morning,” which was a procedure in Fathom called stacking that
probably would not have been useful in this situation:
KM:	So can you, you say whether one performed better than the other?
Leesa:	No.
KM:	What evidence could you give me to, that there wasn’t any difference, what would you say?
Leesa:	Um, I can do that test hypothesis thing. Um, I could do one of those, um, like what we did this morning, the sample and see if there was any—How many students were there? 231?
KM:	Uh-huh.
Leesa:	I could do a smaller sample and just kind of test and see if, see what the means look like each time … OK, then when you do standard deviation—is that really going to help me here? Because, let’s plot it and see what it looks like [Leesa plots a marker on her graph in Fathom, one standard deviation above the means].
KM:	OK, why do you think that might give you something, or are you just going to see—
Leesa:	Um. I just want to see if this, if this mean, if this mean [pointing to the females in Figure 3]—
KM:	Uh-huh?
Leesa:	—falls within one standard deviation of the top mean [the males].
KM:	Do you think it will?
Leesa:	Yes. (pause) So it’s not like it’s a huge difference, I guess.
KM:	So what does checking where the standard deviation, what does that tell you? What does that measure? Try and think out loud.
Leesa:	Um, OK. Standard deviation means that most of the scores are going to fall within there [the interval within one standard deviation of the mean]. So, I don’t really see how that—OK, I understand what we were doing yesterday when we had the standard deviation and then, you know, when we had, uh, when we looked to see if that would, if that was really weird. And if it fell outside the standard deviations, when we looked at z-scores and they were really high, if it fell way out here, then we know that was something, not typical.
KM:	OK.
Leesa:	OK, but since this, these are so close together, and it falls within, you know, that that’s pretty typical and, it might go either way.
Unlike Larry, Leesa indicated a tolerance for variation between the measures she
used to compare the two groups. Even though the means of the groups were
different, she acknowledged that the difference was not enough for her to decide
whether one group performed better than the other. She struggled, however, with
providing evidence that the difference was not meaningful. Her explanation
contained a hybrid of concepts related to the distribution of scores and that of a
sampling distribution, together with a list of procedures she might try.
Natalie, a preservice teacher and mathematics major with no previous statistical
coursework, immediately took a less deterministic stance in her comparison of the
performances of males and females on the TAAS test at the hypothetical school.
Natalie initially created a dot plot of the data (similar to the one in Figure 3), then
changed it to a histogram. She then created a summary table in Fathom to calculate
the means and standard deviations of the MTLI score for each gender:
Natalie: It looks like the mean for the females is a couple of points higher than the
mean for the males [pointing to the summary table], but whether or not
that’s significant, I don’t know yet … I don’t think they’re very different. It
just happens to come up a little bit higher, but the standard deviation is 13
points, so 2-point difference isn’t all that much … The, the range looks
about the same to me, I mean, there’s a few extra down there in the females,
but I don’t think that’s very significant. They look pretty similar … I don’t
think they’re, they’re very different.
Natalie immediately considered whether the difference she was seeing in the
means was significant and went on to conclude that the 2-point difference in the
means of the two groups was probably not significant, relative to the distribution of
scores. She compared the 2-point difference in means to the standard deviation
rather than to their standard error, since she did not consider the size of the group in
her interpretation of significance. It’s possible that she was considering not
statistical significance, but a more informal notion of a meaningful difference
relative to the distribution of scores.
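The distinction at issue here, judging a 2-point difference against the standard deviation of scores rather than against the standard error of the difference in means, can be made concrete with a short calculation. This Python sketch uses the rough figures from the transcripts (a standard deviation of about 13 and a 2-point difference) and assumes the roughly 230 students split into two groups of 115; the group sizes are our assumption, not reported values:

```python
import math

sd = 13.0          # within-group standard deviation of scores (from transcript)
diff = 2.0         # observed difference in group means (from transcript)
n_per_group = 115  # assumed: ~230 students split evenly between two groups

# Relative to the spread of individual scores, 2 points is small...
print(round(diff / sd, 2))

# ...but the standard error of a difference in means shrinks with n, so the
# same 2 points must be re-judged against the sampling variation of the means.
se_diff = math.sqrt(sd**2 / n_per_group + sd**2 / n_per_group)
print(round(diff / se_diff, 2))
```

With these numbers the ratio to the standard error is still modest, so the teachers' informal conclusion of "no meaningful difference" holds either way; the point is that the two comparisons answer different questions, and only the second one takes group size into account.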
The final interview was with Toby, an experienced high school teacher who has
been teaching for over 10 years. Toby’s initial comparison between the two groups
(creating a graph similar to Figure 3) was based on a visual interpretation, before
considering a numerical comparison:
KM:	Describe to me what you see, compare those two groups.
Toby:	Well, just by looking at that I would say that the, the men scored better than the women. Um, then I would probably drop, um, means in there. Um, probably get an idea of what that was. Uh, 74, closer to 74, and that was 72. Not, not that much difference. They’re about the same.
KM:	The same?
Toby:	Yes.
KM:	And you’re basing that on?
Toby:	Uh, that the means are pretty close together and that, there’s about, uh, there, there are no real outliers … The females averaged higher, um, there’s one kind of low one out there, but there’s not that much, they’re a pretty close group, pretty closely grouped. If we had to go farther, we might, now I don’t know how big this set is but I used all of the data, so.
KM:	So if somebody said, you know, is there any difference between these two groups?
Toby:	Well, to get, well, we could do those things like what we’ve been doing. Uh. How, how many is this? Uh, only 230. Well, uh. And they’re all there. We can do one of those things about, you know, pick 50 of them at a time, find that average, pick 50 at a time, find that average, pick 50 at a time, and then look at that, uh, the average of those.
KM:	Uh-huh.
Toby:	OK. And, uh, that’s going to tend to squish the data together, and, towards whatever the real mean of that data is, but it would also give me a, uh, idea of, of the spread or the vari—how, how the highs and lows were.
KM:	OK.
Toby:	Of that spread.
Toby also interpreted the difference that he found in the means as being “about
the same,” indicating he, too, possessed an expectation of variation between the
measures of the two groups. Toby also recognized that a sampling distribution of
some kind would help support his assertion that the difference between the two
groups was not significant, but he had similar difficulties determining how to set up
a sampling distribution or how to incorporate the sizes of the groups.
DISCUSSION
In examining teachers’ reasoning about comparing distributions, we found that
teachers were generally comfortable working with and examining traditional
descriptive statistical measures as a means of informal comparison. An interesting
contrast occurs, however, when we consider teachers’ conceptions of variability
when reasoning about comparing two distributions. As indicated in the literature,
variability is an under-researched area of statistical thinking (Meletiou, 2000). Yet
attitude toward variability could provide an important indication of statistical mind-set (Wild & Pfannkuch, 1999). Having an understanding and tolerance of variability
encompasses a broad range of ideas. In examining the concept of variability with
only one distribution, one considers the variation of values within that distribution.
However, descriptive statistics for a single distribution are often viewed without
regard to variability of the statistical measures themselves. With one distribution,
there is little motivation to consider or investigate possible sources of variation in
368
KATIE MAKAR AND JERE CONFREY
the measures drawn. Comparing distributions creates a situation where one is
pushed to consider measures less deterministically. Depending on the measure that
dominates the comparison (often a mean), how does one interpret differences found
in measures between groups? That is, how does one determine whether any
difference between the dominant measures is meaningful or significant? Further,
how do teachers manage the distinction between these two kinds of variation? By
considering variation between distributions, we are encouraged to consider sources
of variation in these measures. In this chapter, we discuss three different ways that
teachers considered issues of variability when reasoning about comparing two
distributions: (1) how teachers interpreted variation within a group—the variability
of data; (2) how teachers interpreted variation between groups—the variability of
measures; and (3) how teachers distinguished between these two types of variation.
In the interviews, all four teachers knew that scores within each distribution of
scores would possess variability—that is, they did not expect the data in the
distribution of scores would all have the same value. Teachers’ conceptions of this
within-group variation were heard in their descriptions of shape, distribution,
outliers, standard deviation, range, “domain” (maximum and minimum values), and
“whiskers” on a box plot (not included in the preceding excerpts, but used by two of
the teachers). Additional qualitative descriptions included statements about a
distribution being “tighter” or “more spread out.” Commonly, teachers calculated
the standard deviation of each set almost immediately and somewhat automatically.
While all of the teachers clearly recognized variation within a single distribution,
they articulated a variety of meanings about variation between two distributions.
From our interaction with them in the workshops, we anticipated they would
demonstrate their view of between-group variation by acting in one of four ways: (a)
by calculating descriptive statistics for each group without making any comparisons;
(b) by comparing descriptive statistics (e.g., indicating a difference in magnitude or
that one was greater than the other); (c) by first comparing the descriptive measures
of the two distributions as described earlier, then indicating whether they considered
the difference to be meaningful by relying on informal techniques or intuition; or (d)
by investigating whether the differences they found in the measures were
statistically significant using a formal test, such as the randomization test the
teachers carried out during the Orbital Express activity (Erickson, 2001, p. 276)
using the scramble attribute feature in Fathom, which randomizes one attribute of
the data.
In addition to describing the variation within each distribution separately, the
teachers typically reported some aspect of the similarity or differences in the
measure of dispersion between the two distributions, by comparing range or
standard deviation. They may also have compared shapes or means, for example, by
noting that the mean of the females’ scores was 2 points higher than that of the
males. In some cases, teachers indicated an intuition about variation between
measures, but struggled to quantify the evidence for their observations. One reason
for our perception that teachers had difficulty in quantifying variation between
distributions may be that the participants felt they were being pushed to provide
evidence of what seemed to them to be an obvious example of two distributions that
were “about the same.” Perhaps to the teachers, the sameness could be seen visually,
SECONDARY TEACHERS’ STATISTICAL REASONING
369
and they would not feel compelled to provide evidence of this observation under less
test-like circumstances.
Two of the teachers, Leesa and Natalie, attempted to formally test whether the
difference in the means of the two distributions was significant using some form of a
standard deviation taken from the data distributions. Furthermore, Toby, as well as
Leesa, checked the size of the population to see if it was “large enough” to draw
samples from, perhaps recalling that several times during the workshop they had
created sampling distributions by drawing random samples from a state data set of
10,000 student test scores. Neither of them, however, used the size of the data set in
determining whether the difference in means between the males and females was
significant. Overall, the three who considered using a sampling distribution
struggled to understand the circumstances under which using one would be helpful,
and none was able to separate the variability in the distributions of the data sets from
that of the related sampling distribution, confirming that this is a very difficult
concept to understand in statistics, consistent with the findings of delMas, Garfield,
and Chance (1999).
Using Confrey’s (1991, 1998) concept of voice and perspective, the authors
brought to the research their own perspective of statistical reasoning surrounding the
task of comparing distributions. By listening to teacher voice we were able to gain
further insight into our own understanding of variation as we worked to understand
the teachers’ reasoning. Although the literature clearly points to sampling
distributions as a stumbling point for students in inferential statistics, we had
thought that abundant experience with simulations involving sampling distributions
within meaningful problems that would demonstrate their power would be sufficient
to help teachers overcome this difficulty. In fact, the conflicts teachers had in using
sampling distributions may have been compounded by the way in which sampling
distributions and simulations were introduced together without providing
sufficiently motivating tasks for teachers to create a need for them. We learned that
a wealth of experience with sampling distributions to solve interesting problems was
not sufficient for their understanding. We believe, given our analysis of teachers’
reasoning in this area, that sampling distribution concepts need to be developed
more slowly, allowing teachers to conceptually construct the notion of a sampling
distribution rather than have it presented as part of a “good way” to solve the
problem at hand.
Comparing distributions raises another important issue about variation—which
variation are we referring to when we compare two distributions? With a single
distribution, discussions of variation are meant to describe variation within the
distribution at hand. Having two distributions to compare provides a motivation to
compare variation between the distributions. For example, if we observe that the
performance of males and females on a test differs by 2 points, what does this 2-point difference tell us? Could this difference just be due to random variation, or
could it indicate a more meaningful underlying phenomenon? When comparing
groups and considering variation between distributions, it is important to consider
whether the data being compared is that of a sample or a population. Traditional
introductory instruction in significance testing often uses sampling distributions as a
way to generalize our findings from a sample to some larger, unknown population.
Whether data should be considered as a population or a sample is somewhat
problematic in the context of a school’s student assessment data and indicates that
these distinctions are not always clear-cut (Chance, 2002). On one hand, it makes
sense to consider a set of student test data from a school as its own population.
When comparing two groups, however, sampling distributions can inform us as to
whether the difference between groups is meaningful, hence pushing us to consider
measures beyond descriptive statistics. Simulations can be used to support a broader,
inference-like view of a difference even though we are not necessarily trying to
generalize to a larger population. In this case, we can investigate the difference in
means between male and female performance through the use of a randomization
test. That is, under the null hypothesis that there is no difference between the
performance of males and females on a test, if we were to randomize the students’
genders and then compare the means of the two groups, how likely is a difference of
2 points to occur between males and females just by chance? On the other hand, we
might want to conceptualize the two groups as samples in a larger population of all
students who pass through a school over many years to make inferences about the
school itself, even though the samples are not randomly selected, assuming one is
willing to accept these as representative samples.
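The randomization test just described can be sketched directly. The scores below are fabricated purely for illustration (they are not the study's assessment data); what matters is the logic: scramble the group labels, recompute the difference in means, and see how often a difference as large as the observed one arises by chance alone:

```python
import random
import statistics

random.seed(1)

# Hypothetical test scores, invented for illustration only.
females = [82, 75, 90, 68, 77, 85, 79, 88, 73, 81]
males   = [78, 71, 84, 66, 75, 80, 77, 83, 70, 79]

observed_diff = statistics.mean(females) - statistics.mean(males)

# Under the null hypothesis the labels are exchangeable: pool all the
# scores, shuffle them, redeal into groups of the original sizes, and
# record the resulting difference in means each time.
pooled = females + males
n_females = len(females)
trials = 5_000
count_as_extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = (statistics.mean(pooled[:n_females])
            - statistics.mean(pooled[n_females:]))
    if abs(diff) >= abs(observed_diff):
        count_as_extreme += 1

# Estimated probability of a difference this large arising by chance.
p_value = count_as_extreme / trials
print(observed_diff, p_value)
```

This mirrors the "scramble attribute" feature in Fathom mentioned earlier: randomizing one attribute (here, gender) severs any real association, so the shuffled differences show what pure chance variation looks like.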
In working with teachers, we found that capturing and influencing teachers’
statistical reasoning is much more complex than trying to understand and describe
students’ reasoning. Firstly, students are expected to be learners, but teachers
consider themselves experts. Therefore, it is very difficult for most experienced
teachers to admit what they do not know and be open to learning and discussing
their reasoning. Fortunately, statistics is a content area in which few teachers are
expected to have knowledge, making it a viable entrance for teachers to
reexperience being learners. Secondly, unless experienced teachers are enrolled in a
masters program, they are usually not an easily accessible group for the kind of
long-term study that can affect teachers’ thinking. The study described here began
with an agreement between a school principal and our research group to commit the
entire mathematics department of seven teachers to the research project, including a
2-week summer institute. By the end of the study, however, only the two strongest of
the seven original teachers remained. This raises both an important question and a
limitation of the study. First, how can one engage experienced secondary teachers in
research that hopes to both influence and study teacher learning and practice?
Second, the four teachers in the study likely had higher mathematical content
knowledge than might be considered typical. In addition, they were very committed
to improving their own practice, were highly engaged during activities and
discussion, and were more open than most to consider weaknesses in their own
understanding.
Comparing two groups provides a rich context in which to build statistical
reasoning. At a very early age in school, group comparisons can provide an impetus
to collect data and later, to view data as a distribution. At an advanced level, an
interesting problem involving comparing distributions can stimulate learners to
consider not only measures of dispersion within each group, but comparisons of
measures between groups, and hence to consider variation within the measures
themselves. Just as algebra and calculus are considered to be gatekeepers to higher
mathematics, understanding sampling distributions may be a gatekeeper to advanced
statistical reasoning. However, simply presenting sampling distributions as a
precursor to hypothesis testing may aggravate the difficulty learners have with their
underlying concepts.
Further work is needed in better understanding reasoning about sampling
distributions as well as ways to think about facilitating learners’ conceptual
development of variation within a distribution with an eye toward developing a
tolerance and expectation for variation in statistical measures. Understanding
sampling distributions is by no means a cure for the difficulty of understanding
variation of any sort, or toward loosening a deterministic view of statistics and data
analysis. It is the authors’ hope, however, that better understanding of teachers’
reasoning about comparing groups will open further discussion of building an
intuition of variation in data and statistics for teachers as well as students.
IMPLICATIONS
We ascertained that comparing distributions holds great potential for
encouraging learners to broaden their view of statistics and data. As researchers, we
found comparing distributions to be a fruitful arena for expanding teachers’
understanding of distribution and conceptions of variability as well as a motivating
reason to introduce sampling distributions. However, we found it important to
specify which kind of variation we are discussing when comparing two
distributions. Teachers’ reasoning about variation in the context of group
comparisons was examined in three areas: variation within a distribution, variation
between groups (variation of measures), and the struggle to interpret the difference
between these two types of variation. The importance of making this distinction
surprised us, and motivated us to consider both our own understanding and the way
in which we planned our conjectured learning trajectory. This study implies that
sources of variation in both data and in measures need to be discussed frequently
when working with data, and again as measures are compared between distributions,
to engender a tolerance for variation both within and between distributions.
At a more advanced level of statistical content, our study supports the findings
of delMas et al. (1999) about the difficulty in understanding sampling distributions
and implies that the teaching of sampling distributions needs to be done more
carefully. Furthermore, traditional teaching of hypothesis and significance testing
and the overreliance on computer simulations may actually promote misconceptions
rather than advance understanding of sampling distributions. In addition, discussion
about the distinctions and ambiguities between considering data as a sample or a
population need to occur in the teaching of significance testing and among the
research community.
H. G. Wells predicted decades ago that “Statistical thinking will one day be as
necessary for efficient citizenship as the ability to read and write” (quoted in Snee,
1990, p. 117). If our goal is to promote statistical reasoning in our students, we must
better understand and engender the statistical thinking and reasoning of teachers.
Snee (1990) highlights in his definition of statistical thinking in the quality control
industry the importance of a recognition that “variation is all around us, present in
everything we do” (p. 118). The concept of variation needs to be engendered early
and continuously when teaching statistical reasoning. The teaching of statistics
throughout schooling, with an emphasis on distribution and variation, may provide a
way to loosen the deterministic stance of teachers, students, and the public toward
data and statistics. More research is needed in this area.
REFERENCES
Abelson, R. (1995). Statistics as principled argument. Hillsdale, NJ: Erlbaum.
Chance, B. L. (2002). Personal communication (email: April 11, 2002).
Cobb, P. (1999). Individual and collective mathematical development: The case of statistical data
analysis. Mathematical Thinking and Learning, 1(1), 5–43.
Cohen, D. K., & Ball, D. L. (1990). Relations between policy and practice: A commentary. Educational
Evaluation and Policy Analysis, 12(3), 249–256.
Confrey, J. (1991). Learning to listen: A student’s understanding of powers of ten. In E. von Glasersfeld
(Ed.), Radical constructivism in mathematics education (pp. 111–138). Dordrecht, The Netherlands:
Kluwer Academic Publishers.
Confrey, J. (1998). Voice and perspective: Hearing epistemological innovation in students’ words. In M.
Larochelle, N. Bednarz, & J. Garrison (Eds.), Constructivism and education (pp. 104–120). New
York: Cambridge University Press.
Confrey, J. (in preparation). Systemic crossfire. Unpublished manuscript.
Confrey, J., & Makar, K. (2002). Developing secondary teachers’ statistical inquiry through immersion
in high-stakes accountability data. Paper presented at the Twenty-fourth Annual Meeting of the North
American Chapter of the International Group for the Psychology of Mathematics Education (PME-NA), Athens, GA.
delMas, R. C., Garfield, J., & Chance, B. L. (1999). A model of classroom research in action: Developing
simulation activities to improve students’ statistical reasoning. Journal of Statistics Education, 7(3).
Erickson, T. (2001). Data in depth: Exploring mathematics with Fathom. Emeryville, CA: Key
Curriculum Press.
Finzer, W. (2000). Fathom (Version 1.1). Emeryville, CA: Key Curriculum Press.
Friel, S., Curcio, F., & Bright, G. (2001). Making sense of graphs: Critical factors influencing
comprehension and instructional implications. Journal for Research in Mathematics Education, 32(2),
124–158.
Konold, C., & Higgins, T. (2002). Highlights of related research. In S. J. Russell, D. Schifter, & V.
Bastable (Eds.), Developing mathematical ideas: Working with data (pp. 165–201). Parsippany, NJ:
Seymour Publications.
Konold, C., & Pollatsek, A. (2002). Data analysis as the search for signals in noisy processes. Journal for
Research in Mathematics Education, 33(4), 259–289.
Lappan, G., Fey, J. T., Fitzgerald, W. M., Friel, S. N., & Phillips, E. D. (1998). Connected Mathematics:
Data about us. White Plains, NY: Seymour.
Lehrer, R., & Schauble, L. (2000). Modeling in mathematics and science. In R. Glaser (Ed.), Advances in
instructional psychology: Educational design and cognitive science (Vol. 5, pp. 101–159). Mahwah,
NJ: Erlbaum.
Lieberman, A., & Wood, D. R. (2003). Inside the National Writing Project: Connecting network learning
and classroom teaching. New York: Teachers College Press.
Makar, K., & Confrey, J. (2002). Comparing two distributions: Investigating secondary teachers’
statistical thinking. Paper presented at the Sixth International Conference on Teaching Statistics
(ICOTS-6), Cape Town, South Africa.
Meletiou, M. (2000). Developing students’ conceptions of variation: An untapped well in statistical
reasoning. Unpublished dissertation, University of Texas, Austin.
National Writing Project. (2002, April). National Writing Project Mission. Author. Retrieved April 28,
2002, from www.writingproject.org
QSR. (1999). NVivo (Version 1.1). Melbourne, Australia: Qualitative Solutions and Research Pty. Ltd.
Snee, R. (1990). Statistical thinking and its contribution to total quality. The American Statistician, 44(2),
116–121.
Stake, R. E. (1994). Case studies. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative
research. Thousand Oaks, CA: Sage.
Strauss, A., & Corbin, J. (1998). Basics of qualitative research: Techniques and procedures for
developing grounded theory. Thousand Oaks, CA: Sage.
TERC. (1998). Investigations in number, data, and space. White Plains, NY: Seymour.
Watson, J., & Moritz, J. (1999). The beginning of statistical inference: Comparing two data sets.
Educational Studies in Mathematics, 37, 145–168.
Wild, C., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical
Review, 67(3), 223–265.
Chapter 16
PRINCIPLES OF INSTRUCTIONAL DESIGN
FOR SUPPORTING THE DEVELOPMENT OF
STUDENTS’ STATISTICAL REASONING
Paul Cobb and Kay McClain
Vanderbilt University, USA
OVERVIEW
This chapter proposes design principles for developing statistical reasoning in
elementary school. In doing so, we will draw on a classroom design experiment that
we conducted several years ago in the United States with 12-year-old students that
focused on the analysis of univariate data. Experiments of this type involve tightly
integrated cycles of instructional design and the analysis of students’ learning that
feeds back to inform the revision of the design. However, before giving an overview
of the experiment and discussing specific principles for supporting students’
development of statistical reasoning, we need to clarify that we take a relatively
broad view of statistics. The approach that we followed in the classroom design
experiment is consistent with G. Cobb and Moore’s (1997) argument that data
analysis comprises three main aspects: data generation, exploratory data analysis
(EDA), and statistical inference. Although Cobb and Moore are primarily concerned
with the teaching and learning of statistics at the college level, we contend that the
major aspects of their argument also apply to the middle and high school levels.
EDA involves the investigation of the specific data at hand (Shaughnessy,
Garfield, & Greer, 1996). Cobb and Moore (1997) argue that EDA should be the
initial focus of statistics instruction since it is concerned with trends and patterns in
data sets and does not involve an explicit consideration of sample-population
relations. In such an approach, students therefore do not initially need to support
their conclusions with probabilistic statements of confidence. Instead, conclusions
are informal and are based on meaningful patterns identified in specific data sets.
Cobb and Moore’s (1997) proposal reflects their contention that EDA is a
necessary precursor to statistical inference. Statistical inference is probabilistic in
that the intent is to assess the likelihood that patterns identified in a sample are not
375
D. Ben-Zvi and J. Garfield (eds.),
The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 375–395.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
376
PAUL COBB AND KAY MCCLAIN
specific to that batch of data, but indicate trends in the larger population from which
the data were generated. As Cobb and Moore indicate, the key idea underpinning
statistical inference—that of sampling distribution—is relatively challenging even at
the college level.
Cobb and Moore also argue that students are not in a position to appreciate the
relevance of the third aspect of statistics, a carefully designed data generation
process, until they become familiar with data analysis. They observe that “if you
teach design before data analysis, it is harder for students to understand why design
matters” (Cobb & Moore, 1997, p. 816). However, although they recommend
introducing design after EDA, they also recognize that the crucial understandings
that students should develop center on the relationship between the legitimacy of
conclusions drawn from the data and the soundness of the process by which the data
were generated.
Cobb and Moore’s (1997) observations provide an initial framework for
instructional design within which to develop specific design principles. In the design
experiment that we conducted with the 12-year-old students, we focused primarily
on EDA and on the process of generating data. We did, however, also explore the
possibilities for statistical inference in both this experiment and in a follow-up
design experiment we conducted with some of the same students that emphasized
the analysis of bivariate data.¹ As Bakker and Gravemeijer (Chapter 7) illustrate,
productive instructional activities for supporting the development of the students’
reasoning about statistical inference include those in which students describe the
characteristics of a data set that they anticipate will be relatively stable if the data
generation process is repeated or if the size of the sample is increased. In our view,
students’ initial intuitions about the relative stability of the shape of both univariate
and bivariate data sets constitute a potential starting point for an instructional
sequence that culminates with students’ development of a relatively deep
understanding of the crucial idea of sampling distribution (Cobb, McClain, &
Gravemeijer, 2003; Saldanha & Thompson, 2001). We introduce this conjecture to
situate our discussion of design principles in a broader instructional context, since
our focus in the remainder of this chapter will be on supporting the development of
students’ reasoning about data in the contexts of EDA and data generation. To
ground the proposed design principles, we first give a short overview of the
classroom design experiment and then frame it as a paradigm case in which to tease
out design principles that address five aspects of the classroom environment that
proved critical in supporting the students’ statistical learning:
• The focus on central statistical ideas
• The instructional activities
• The classroom activity structure
• The computer-based tools the students used
• The classroom discourse
PRINCIPLES OF INSTRUCTIONAL DESIGN
377
OVERVIEW OF THE CLASSROOM DESIGN EXPERIMENT
Initial Assessments of the Students’ Reasoning
In preparing for the design experiment, we conducted interviews and whole-class performance assessments with a group of seventh graders from the same
school in which we planned to work. These assessments indicated that data analysis
for most of these students involved “doing something with the numbers” (McGatha,
Cobb, & McClain, 1999). In other words, they did not view data as measures of
aspects or features of a situation that had been generated in order to understand a
phenomenon or make a decision or judgment (e.g., the points that a player scores in
a series of basketball games as a measure of her skill at the game). In a very real
sense, rather than analyzing data, the students were simply manipulating numbers in
a relatively procedural manner. Further, when the students compared two data sets
(e.g., the points scored by two basketball players in a series of games), they typically
calculated the means without considering whether this would enable them to address
the question or issue at hand. For example, in the case of the points scored by the
two basketball players, simply calculating the means would not necessarily be a
good way to select a player for an important game because it ignores possible
differences in the range and variability of the players’ scores (i.e., the player with a
slightly lower mean could be much more consistent).
In interpreting these findings, we did not view ourselves as documenting an
inherent psychological stage in seventh graders’ reasoning about data. Instead, we
were documenting the consequences of the students’ prior instruction in statistics.
They had, for example, previously studied measures of center (i.e., mean, mode, and
median) as well as several types of statistical graphs (e.g., bar graphs, histograms,
and pie charts). Our assessments of the students’ reasoning at the beginning of the
experiment tell us something about not just the content but also the quality of their
prior instruction. The assessments indicate, for example, that classroom activities
had emphasized calculational procedures and conventions for drawing graphs rather
than the creation and manipulation of graphs to detect trends and patterns in the
data. This view of the students’ reasoning as situated with respect to prior
instruction was useful in that it enabled us to clarify the starting points for the design
experiment. For example, we concluded from the assessments that our immediate
goal was not one of merely remediating certain competencies and skills. Instead, the
challenge was to influence the students’ beliefs about what it means to do statistics
in school. In doing so, it would be essential that they actually begin to analyze data
in order to address a significant question rather than simply manipulate numbers and
draw specific types of graphs.
Concluding Assessments of the Students’ Reasoning
The students’ reasoning in these initial assessments contrasts sharply with the
ways in which they analyzed data at the end of the 10-week experiment. As an
illustration, in one instructional activity, the students compared two treatment
protocols for AIDS patients by analyzing the T-cell counts of people who had
enrolled in one of the two protocols. Their task was to assess whether a new
experimental protocol in which 46 people had enrolled was more successful in
raising T-cell counts than a standard protocol in which 186 people had enrolled. The
data the students analyzed is shown in Figure 1 as it was displayed in the second of
two computer-based tools that they used. All 29 students in the class concluded from
their analyses that the experimental treatment protocol was more effective.
Nonetheless, the subsequent whole-class discussion lasted for over an hour and
focused on both the adequacy of the reports the students had written for a chief
medical officer and the soundness of their arguments.
For example, one group of students had partitioned the two data sets at T-cell
counts of 525 by using one of the options on the computer tool as shown in Figure 1.
In the course of the discussion, it became clear that their choice of 525 was not
arbitrary. Instead, they had observed that what they referred to as the “hill” in the
experimental treatment data was above 525, whereas the “hill” in the standard
treatment data was below 525. It was also apparent from the discussion that both
they and the other students who contributed to the discussion reasoned about the
display shown in Figure 1 in terms of relative rather than absolute frequencies (i.e.,
they focused on the proportion rather than the number of the patients in each
treatment protocol whose T-cell counts were above and below 525). This was
indicated by explanations in which students argued that most of the T-cell counts in
the experimental treatment were above 525, but most of the T-cell counts in the
traditional treatment were below 525.
Figure 1. The AIDS protocol data partitioned at T-cell counts of 525 (panels: Experimental Treatment; Traditional Treatment).
This analysis was one of the most elementary that the students produced on this
instructional activity. As a point of comparison, another group of students had used
an option on the computer tool that enabled them to hide the dots that represented
the individual data values and had then used another option on the tool to partition
the two data sets into four groups, each of which contained one-fourth of the data
points (see Figure 2). In this option, 25 percent of the data in each data set are
located in each of the four intervals bounded by the vertical bars (similar to a box
plot). As one student explained, these graphs show that the experimental treatment is
more effective because the T-cell counts of 75 percent of the patients in this
treatment were above 550, whereas the T-cell counts of only 25 percent of the
patients who had enrolled in the standard treatment were above 550. This student’s
argument was representative in that he, like the other students who contributed to
the discussion, was actually reasoning about data rather than attempting to recall
procedures for manipulating numerical values.
Figure 2. The AIDS protocol data organized into four equal groups with the individual data
points hidden.
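The "four equal groups" display and the students' proportional argument can be mimicked with quartiles: split each data set at its quartile boundaries, then compare the share of each group lying above a cut point. The T-cell counts below are invented for illustration (the actual data sets had 46 and 186 patients), so only the form of the reasoning carries over:

```python
import statistics

# Invented T-cell counts, for illustration only.
experimental = [480, 510, 560, 575, 590, 605, 620, 640, 660, 700, 715, 730]
standard     = [400, 420, 445, 460, 475, 490, 505, 515, 530, 545, 560, 600]

def four_equal_groups(data):
    """Cut points that split the data into four equal groups, in the
    spirit of the box-plot-like option on the students' computer tool."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return q1, q2, q3

def proportion_above(data, cut):
    """Relative (not absolute) frequency above a cut point, so that
    groups of unequal size can be compared fairly."""
    return sum(1 for x in data if x > cut) / len(data)

print(four_equal_groups(experimental))
print(four_equal_groups(standard))

# The students' argument in miniature: what fraction of each
# treatment group falls above a chosen T-cell count?
print(proportion_above(experimental, 550))
print(proportion_above(standard, 550))
```

Using proportions rather than counts is the crux: because the two protocols enrolled very different numbers of patients, only relative frequencies make the comparison meaningful, which is exactly the "statistical perspective" discussed below.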
We have described the design experiment in some detail elsewhere and have
documented the major phases in the development of the students’ reasoning about
data (Cobb, 1999; McClain, Cobb, & Gravemeijer, 2000). In addition, Bakker and
Gravemeijer (Chapter 7) report on a series of four classroom design experiments
that they conducted in the Netherlands in which students used the same two
computer tools. For our present purposes, it therefore suffices to note that our
classroom observations were corroborated by individual interviews that we
conducted to document the students’ reasoning at the end of the experiment. The
analysis of these interviews indicates that a significant majority of the students could
readily interpret graphs of two unequal data sets organized either into equal interval
widths (an analogue of histograms) or into four equal groups (an analogue of box
plots) in terms of patterns in how the data were distributed. In this regard, Konold,
Pollatsek, Well, & Gagnon (1996) argue that a focus on the rate of occurrence (i.e.,
the proportion) of data within a range of values (e.g., above or below T-cell counts
of 525) is at the heart of what they term a statistical perspective. Because
discussions in the latter part of the experiment involved a concern for the proportion
of data within various ranges of values, the students appeared to be developing this
statistical perspective. It is also worth noting that when we began the follow-up
design experiment with some of the same students nine months later, there was no
regression in their statistical reasoning (Cobb et al., 2003). The students’ progress at
the beginning of this follow-up experiment was in fact such that they could all
interpret univariate data sets organized into equal interval widths and into four equal
groups in these relatively sophisticated ways within the first three or four class
sessions.
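The other organization the students could interpret, "equal interval widths," can be sketched the same way. A minimal sketch with invented data values; only the idea of counting data points per fixed-width interval follows the text.

```python
# "Equal interval widths" (an analogue of the histogram): count the
# data points falling in each fixed-width interval. The data values
# here are invented for illustration.
def equal_interval_counts(data, width):
    """Map each interval's lower bound to the number of data points in it."""
    counts = {}
    for x in data:
        lower = (x // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

t_cells = [310, 360, 420, 440, 480, 510, 530, 570, 590, 640]
print(equal_interval_counts(t_cells, width=100))
# {300: 2, 400: 3, 500: 4, 600: 1}
```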
This overview gives some indication of how the students’ reasoning about data
changed during the 10-week experiment. We now turn our attention to the process
of that change and the design principles inherent in the means by which it was
supported and organized.
CENTRAL STATISTICAL IDEAS
Distribution as an Overarching Statistical Idea
In their discussion of instructional design, Wiggins and McTighe (1998)
emphasize the importance of beginning the design process by identifying the “big
ideas” that are at the heart of the discipline, that have enduring value beyond the
classroom, and that offer potential for engaging students. This design principle is
particularly important in the case of elementary statistics instruction given that
curricula frequently reduce the domain to a collection of at best loosely related
concepts (e.g., mean, mode, median) together with conventions for making various
types of graphs. McGatha (2000) documents the actual process by which we
prepared for the design experiment. As she describes, our proposal of distribution as
an overarching statistical idea emerged as we attempted to synthesize the research
literature and analyzed the interviews and classroom performance assessments that
we conducted as part of our pilot work. One of the primary goals for the design
experiment was therefore that the students would come to reason about data sets as
entities that are distributed within a space of possible values (Konold et al., 1996;
Hancock, Kaput, & Goldsmith, 1992; Konold & Higgins, in press; Wilensky, 1997).
Bakker and Gravemeijer (Chapter 7, Table 1) clarify the central, organizing role of
distribution, thereby illustrating that notions such as center, spread, skewness, and
relative density can then be viewed as ways of characterizing how specific data sets
are distributed within this space of values. We would only add to their account that
various statistical graphs or inscriptions then become ways of structuring data
distributions in order to identify relevant trends or patterns. As an illustration, the
students who analyzed the AIDS treatment data by organizing the two data sets into
four equal groups used this precursor of the box plot in order to identify patterns that
were relevant in determining which of the two treatments was more effective. More
generally, in the approach that we took in the design experiment, the students’
development of increasingly sophisticated ways of reasoning about data was
inextricably bound up with their development of increasingly sophisticated ways of
inscribing data (Biehler, 1993; de Lange, van Reeuwijk, Burrill, & Romberg, 1993;
Lehrer & Romberg, 1996).
Bivariate Data Sets as Distributions
We can illustrate the importance of explicating central statistical ideas as a basic
design principle by extending this focus on distribution to the analysis of bivariate
data. Because statistical covariation involves coordinating the variation of two sets
of measures, the characteristics of directionality and strength are sometimes viewed
as being relatively transparent in two-dimensional inscriptions such as scatter plots.
However, a focus on the way that bivariate data are distributed reveals that
proficient statistical analysts’ imagery of covariation is no more two-dimensional
than their imagery of univariate distributions is one-dimensional. This is clearer in
the case of univariate data in that inscriptions such as line plots involve, for the
proficient user, a second dimension that indicates relative frequency. In the case of
bivariate data, however, scatter plots do not provide such direct perceptual support
for a third dimension corresponding to relative frequency. Instead, it appears that
proficient analysts read this third dimension from the relative density of the data
points. This analysis of the types of reasoning that are involved in viewing bivariate
data sets as distributions serves to clarify both the overall instructional goal and the
primary challenge facing the instructional designer, that of enabling students to read
this implicit third dimension into two-dimensional inscriptions such as scatter plots
and thus to see the distributional shape of the data.
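One way to make this implicit third dimension concrete is to bin the plane into cells and compute the relative frequency of the data in each cell. The points and bin sizes below are illustrative; the function is our construction, not part of the tools discussed in the chapter.

```python
# Making the "implicit third dimension" of a scatter plot explicit:
# bin the plane into grid cells and compute the relative frequency of
# data points in each cell. Points and bin sizes are illustrative.
from collections import Counter

def relative_density(points, x_bin, y_bin):
    """Map each grid cell to the proportion of points falling in it."""
    cells = Counter((int(x // x_bin), int(y // y_bin)) for x, y in points)
    n = len(points)
    return {cell: count / n for cell, count in cells.items()}

points = [(1.0, 2.1), (1.2, 2.3), (1.1, 2.0), (3.5, 4.0)]
density = relative_density(points, x_bin=1.0, y_bin=1.0)
# density[(1, 2)] == 0.75: the dense cluster carries most of the mass
```

A proficient analyst reads this density off the relative crowding of the dots; the sketch simply quantifies what that reading recovers.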
It should be clear from the illustrations we have given as well as from Bakker
and Gravemeijer’s (Chapter 7) discussion of their design experiments that a focus on
overarching ideas can lead to a far-reaching reconceptualization of the statistics
curriculum. This design principle therefore contrasts sharply with research that
focuses on standard topics in current curricula in isolation. The benefit of adhering
to the principle of identifying central statistical ideas is that it contributes to the
development of relatively coherent instructional designs. The development of the
students’ statistical reasoning in the design experiment that we conducted can in fact
be viewed as the first phase of a long-term learning trajectory that extends to the
university level and encompasses the key idea of sampling distribution.
PRINCIPLES OF INSTRUCTIONAL DESIGN
INSTRUCTIONAL ACTIVITIES
The Investigative Spirit of Data Analysis
As we have indicated, our primary focus in the design experiment was on
exploratory data analysis and the process of generating data rather than on statistical
inference. We found Biehler and Steinbring’s (1991) characterization of EDA as
detective work particularly helpful in that it emphasizes that the purpose is to search
for evidence. In contrast, statistical inference plays the role of the jury that decides
whether this evidence is sufficient to make claims about the population from which
the data were drawn. Biehler and Steinbring’s metaphor of detective makes it clear
that an exploratory or investigative orientation is central to data analysis and
constitutes an important instructional goal in its own right. From this, we concluded
as a basic design principle for elementary statistics instruction that students’ activity
in the classroom should involve the investigative spirit of data analysis from the
outset. This in turn implied that the instructional activities should all involve
analyzing data sets that students view as realistic for a purpose that they consider
legitimate.
The instructional activities that we developed in the course of the design
experiment involved either (a) analyzing a single data set in order to understand a
phenomenon, or (b) comparing two data sets in order to make a decision or
judgment. The example of the AIDS treatment activity illustrates the second of the
two types of instructional activities. In describing this activity, we also noted that the
students were required to write a report of their analyses for a chief medical officer.
This requirement supported the students’ engagement in what might be termed
genuine data analysis by orienting them to take account of a specific audience that would either understand a phenomenon or make a decision based on their analyses. In
this regard, we note that data are typically analyzed with a particular audience in
mind almost everywhere except in school (cf. Noss, Pozzi, & Hoyles, 1999).
Focusing on Significant Statistical Ideas
In addition to ensuring that the students’ activity was imbued with the
investigative spirit of data analysis, we also had to make certain that significant
statistical ideas emerged as the focus of conversations during whole-class
discussions of the students’ analyses (cf. Hancock, Kaput, & Goldsmith, 1992). The
challenge for us as instructional designers was therefore to transcend what Dewey
(1981) termed the dichotomy between process and content by systematically
supporting the emergence of key statistical ideas while simultaneously ensuring that
the analyses the students conducted involved an investigative orientation. This is a
nontrivial issue in that inquiry-based instructional approaches have sometimes been
criticized for emphasizing the process of inquiry at the expense of substantive
disciplinary ideas.
In approaching this challenge, we viewed the various data-based arguments that
the students produced as they completed the instructional activities as a primary
resource on which the teacher could draw to initiate and guide whole-class
discussions that focused on significant statistical ideas. As a basic instructional
design principle, our goal when developing specific instructional activities was
therefore to ensure that the students’ analyses constituted such a resource for the
teacher. This would enable the teacher to initiate and guide the direction of whole-class discussions that furthered her instructional agenda by capitalizing on the
diverse ways in which the students had organized and interpreted the data sets. In
the case of the AIDS instructional activity, for example, the issues that emerged as
explicit topics of conversation during the subsequent whole-class discussion
included the contrast between absolute and relative frequency, the interpretation of
data organized into four equal groups, and the use of percentages to quantify the
proportion of the data located in particular intervals (Cobb, 1999; McClain et al.,
2000).
The enactment of this design principle required extremely detailed instructional
planning, in the course of which we attempted to anticipate the range of data-based
arguments the students might produce as they completed specific instructional
activities. Our discussions of seemingly inconsequential features of task scenarios
and of the particular characteristics of data sets were therefore quite lengthy since
minor modifications to an instructional activity could significantly influence the
types of analyses the students would produce and thus the resources on which the
teacher could draw to further her instructional agenda.
As an illustration, we purposefully constructed data sets with a significantly
different number of data points when we developed the AIDS activity, so that the
contrast between absolute and relative frequency might become explicit. This in turn
required a task scenario in which the inequality in the size of the data sets would
seem reasonable to the students and in which they would view the issue under
investigation to be worthy of their engagement. Although the AIDS activity proved
to be productive, on several occasions our conjectures about either the level of the
students’ engagement in an activity or the types of analyses they would produce
turned out to be ill founded. In these situations, our immediate task was to analyze
the classroom session in order to understand why the instructional activity had
proven to be inadequate and thus revise our conjectures and develop a new
instructional activity. In our view, this cyclic process of testing and revising
conjectures about the seemingly minor features of instructional activities is essential
if we are to develop relatively long-term instructional sequences in which teachers
can support students’ development of significant statistical ideas by drawing on their
inquiry-oriented reasoning as a primary resource.
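The deliberate choice of unequal data-set sizes can be illustrated numerically. The counts below are invented to show the effect: two groups can tie in absolute terms while differing sharply in proportional terms, which is precisely the contrast the AIDS data sets were constructed to surface.

```python
# Why unequal data-set sizes make the absolute vs. relative frequency
# contrast explicit. With these invented values, the two groups have
# identical counts above the cutoff but very different proportions.
def above(data, cutoff):
    """Return (count, proportion) of values above the cutoff."""
    hits = sum(1 for x in data if x > cutoff)
    return hits, hits / len(data)

larger = list(range(100, 700, 10))    # 60 values
smaller = list(range(400, 700, 10))   # 30 values

n_large, p_large = above(larger, 500)
n_small, p_small = above(smaller, 500)
assert n_large == n_small             # identical absolute counts ...
assert p_small > p_large              # ... but very different proportions
```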
THE CLASSROOM ACTIVITY STRUCTURE
Talking through the Data Generation Process
As we have indicated, one of our concerns at the beginning of the design
experiment was that the students would view data not merely as numbers, but as
measures of an aspect of a situation that were relevant to the question under
investigation. To this end, the teacher introduced each instructional activity by
talking through the data generation process with the students. These conversations
often involved protracted discussions during which the teacher and students together
framed the particular phenomenon under investigation (e.g., AIDS), clarified its
significance (e.g., the importance of developing more effective treatments),
delineated relevant aspects of the situation that should be measured (e.g., T-cell
counts), and considered how they might be measured (e.g., taking blood samples).
The teacher then introduced the data the students were to analyze as being generated
by this process. The resulting structure of classroom activities, which often spanned
two or more class sessions, was therefore (a) a whole-class discussion of the data
generation process, (b) an individual or small-group activity in which the students
usually worked at computers to analyze data, and (c) a whole-class discussion of the
students’ analyses.
In developing this classroom activity structure, we conjectured that as a result of
participating in discussions of the data generation process, data sets would come to
have a history for the students such that they reflected the interests and purposes for
which they were generated (cf. Latour, 1987; Lehrer & Romberg, 1996; Roth,
1997). This conjecture proved to be well founded. For example, we have clear
indications that within a week of the beginning of the design experiment, doing
statistics in the project classroom actually involved, for the students, analyzing data
(Cobb, 1999; McClain et al., 2000). In addition, changes in the way that the students
contributed to discussions of the data generation process as the design experiment
progressed indicate that there was a gradual transfer of responsibility from the
teacher to the students.
Initially, the teacher had to take an extremely proactive role. However, later in
the experiment the students increasingly initiated shifts in these discussions, in the
course of which they raised concerns about sampling processes as well as the control
of extraneous variables. We have documented the process by which the students
learned about data generation and the means by which that learning was supported
elsewhere (Cobb & Tzou, 2000). For our current purposes, it suffices to note that the
issues the students raised in the latter part of the experiment indicate that most if not
all had come to realize that the legitimacy of the conclusions drawn from data
depends crucially on the data generation process.
We should clarify that the teacher did not attempt to teach the students how to
generate sound data directly. Instead, she guided the development of a classroom
culture in which a premium was placed on the development of data-based
arguments. It was against this background that the students gradually became able to
anticipate the implications of the data generation process for the conclusions that
they would be able to draw from data.
Data Collection and Data Generation
Given that our focus in this chapter is on design principles, it is important to note
that design decisions relating to data generation are frequently reduced to the
question of whether students should collect the data that they analyze. We decided
that the students would for the most part not collect data during the design
experiment, for two reasons. First, we had a limited number of classroom sessions
available in which to conduct the design experiment; and second, we wanted to
ensure that the data sets the students analyzed had particular characteristics so that
the teacher could guide the emergence of issues that would further her instructional
agenda. However, an interpretation of the design experiment as merely a case of
students coming to reason meaningfully about data that they have not generated
themselves misses the larger point. As a design principle for elementary statistics
instruction, we contend on the basis of our findings that it is important for students
to talk through the data generation process whether or not they actually collect data.
Our rationale for this claim becomes apparent when we note that data collection
is but one phase in the data generation process, one that involves making
measurements. The science education literature is relevant in this regard since it
indicates that students who are involved in collecting their own data often do not
understand the fundamental reasons for doing so and are primarily concerned with
following methodological procedures and getting “the right data.” In our view, such
cases are predictable consequences of instructional designs that fail to engage
students in the phases of the data generation process that precede data collection.
These preceding phases involve clarifying the significance of the phenomenon under
investigation, delineating relevant aspects of the phenomenon that should be
measured, and considering how they might be measured. A primary purpose for
engaging students in these phases is to enable them to remain cognizant of the
purposes underpinning their inquiries and, eventually, to appreciate the influence of
data generation on the legitimacy of the conclusions they can draw from the data
they collect. In an approach of this type, the series of methodological decisions that
make the collection of data possible are not assumed to be transparent to students,
but instead become an explicit focus of discussion in the course of which students
engage in all phases of the data generation process.
TOOL USE
As we have noted, the use of computer-based tools to create and manipulate
graphical representations of data is central to exploratory data analysis (EDA). In the
design experiment, the students used two computer tools that were explicitly
designed to support the development of their statistical reasoning. We described the
second of these tools when we discussed students’ analyses of the AIDS treatment
data. Bakker and Gravemeijer (Chapter 7) illustrate the range of options available
for structuring data on both this tool and the first tool, the interface of which is
shown in Figure 3. As Bakker and Gravemeijer also clarify, students could use this
first tool to order, partition, and otherwise organize sets of up to 40 data points in a
relatively immediate way. When data are entered, each data point is inscribed as a
horizontal bar. Figure 3 shows data on the life spans of ten batteries of each of two
different brands that were generated to investigate which of the two brands is
superior in this respect.
Figure 3. The first computer Minitool.
Compatibility with Students’ Current Reasoning
A design principle that guided the development of the two computer tools was
that they should fit with students’ reasoning at a particular point in the instructional
sequence (cf. Gravemeijer, 1994). It was apparent from our classroom observations
that the tools did fit with the students’ reasoning since they could use them to
investigate trends and patterns in data with only a brief introduction. These
observations indicate that when they were first introduced in the design experiment,
the ways in which data were inscribed in the tools were transparent to the students.
In the case of the first tool, we have noted that one of our concerns at the beginning
of the experiment was that the students would actually analyze data rather than
merely manipulate numbers. It was for this reason that we decided to inscribe
individual data values as horizontal bars. In addition, the initial data sets that the
students analyzed when this tool was introduced within the first week of the
experiment were selected so that the measurements made when generating the data
had a sense of linearity and thus lent themselves to this type of inscription (e.g., the
braking distances of cars, the life spans of batteries). As we have indicated, the
choice of this inscription together with the approach of talking through the data
generation process proved to be effective in that the teacher was able to initiate a
shift in classroom discourse such that all the students actually began to reason about
data as they completed the second instructional activity involving the first tool.
Supporting the Development of Students’ Reasoning
A second design principle that guided the development of the two computer
tools was that the students would come to reason about data in increasingly
sophisticated ways as they used the tools and participated in the subsequent whole-class discussions of their analyses. We therefore viewed the design of the tools that
the students would use as a primary means of supporting the reorganization of their
statistical reasoning (cf. Dörfler, 1993; Kaput, 1991; Meira, 1998; Pea, 1993). In the
case of the first tool, the students dragged the vertical value bar along the axis to
either partition data sets or find the value of specific data points. In addition, they
used the range tool to isolate a particular interval and compare the number of data
points of each data set that were in that interval. In Figure 3, the range tool has been
used to bound the 10 longest lasting batteries. It was as the students used the
computer tool in these ways that they began to reason about (a) the maximum and
minimum values and the range of data sets, (b) the number of data points above or
below a particular value or within a specified interval, and (c) the median and its
relation to the mean. Against the background of these developments, the teacher
introduced the second tool in which data points were inscribed as dots in an axis plot
(see Figure 1).
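The value-bar and range-tool operations just described can be sketched as functions. The battery life spans (in hours) are invented, and the function names are ours; only the two operations follow the chapter's description of the first Minitool.

```python
# The two operations students performed with the first Minitool, as
# functions: the value bar partitions a data set at a chosen value,
# and the range tool isolates an interval. Battery life spans (hours)
# are invented for illustration.
def value_bar(data, value):
    """Partition: (count below the value, count at or above it)."""
    below = sum(1 for x in data if x < value)
    return below, len(data) - below

def range_tool(data, lo, hi):
    """Isolate the data points inside the interval [lo, hi]."""
    return [x for x in data if lo <= x <= hi]

brand_a = [58, 74, 81, 85, 92, 97, 103, 110, 118, 125]
brand_b = [82, 84, 88, 90, 95, 99, 101, 104, 107, 112]

print(value_bar(brand_a, 80))  # (2, 8): some batteries fall below 80 hours
print(value_bar(brand_b, 80))  # (0, 10): all last at least 80 hours
```

Partitioning at 80 hours reproduces the form of the student's "consistent battery" argument discussed below: one brand has data points below the cut, the other has none.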
Sequencing the Use of Tools
Our intention in designing the second tool was to build on the ways of reasoning
about data that the students had developed as they used the first tool. As Bakker and
Gravemeijer note, the dots at the end of the bars in the first tool have, in effect, been
collapsed down onto the axis in the second tool. The teacher in fact introduced this
new way of inscribing data first by showing a data set inscribed as horizontal bars,
and then by removing the bars to leave only the dots, and finally by transposing the
dots onto the horizontal axis. As we had conjectured, the students were able to use
the second tool to analyze data with little additional guidance, and it was apparent
that the axis plot inscription signified a set of data values rather than merely a
collection of dots spaced along a line. However, this development cannot be
explained solely by the teacher’s careful introduction of the new tool. Instead, we
have to take account of a further aspect of the students’ activity as they used the first
tool in order to explain why the second tool fit with their reasoning.
We can tease out this aspect of the students’ learning by focusing on their
reasoning as they used the first tool to compare data sets in terms of the number of
data points either within a particular interval or above or below a particular value.
To illustrate, one student explained that he had analyzed the battery data by using
the value bar to partition the data at 80 hours as shown in Figure 3. He then argued
that some of the batteries of one brand were below 80 hours, whereas all those of the
other brand lasted more than 80 hours. He judged this latter brand to be superior
because, as he put it, he wanted a consistent battery. The crucial point to note is that
in making arguments of this type, the students focused on the location of the dots at
the end of the bars with respect to the axis. In other words, a subtle but important
shift occurred as the students used the first tool. Originally, the individual data
values were represented by the lengths of the bars. However, in the very process of
using the tool, these values came to be signified by the endpoints of the bars.
As a result of this development, the second tool fit with the students’ reasoning
when it was introduced; they could readily understand the teacher’s explanation of
collapsing the dots at the end of the bars down onto the axis. Further, because the
options in this new tool all involved partitioning data sets in various ways, the
students could use it immediately because they had routinely partitioned data sets
when they used range and value bar options on the first tool. This in turn made it
possible for the second tool to serve as a means of supporting the development of
their statistical reasoning. As our discussion of the AIDS treatment activity
illustrates, students came to view data sets as holistic distributions that have shape
rather than as amorphous collections of individual data points, to reason about these
shapes in terms of relative rather than absolute frequencies, and to structure data sets
in increasingly sophisticated ways.
It is almost impossible to deduce this subtle but important aspect of the students’
learning by inspecting the physical characteristics of the first tool. As a third
principle for the design of tools, we did not attempt to build the statistical ideas we
wanted students to learn into the two computer tools and then hope that they might
come to see them in some mysterious and unexplained way. Instead, when we
designed the tools, we focused squarely on how the students might actually use them
and what they might learn as they did so. Although this principle is relatively
general, it is particularly important in the case of statistical data analysis given the
central role of computer-based analysis tools and graphical representations in the
discipline. The value of this principle is that it orients the designer to consider how
the students’ use of a proposed tool will change the nature of their activity as they
analyze data and thus the types of reasoning that they might develop. In the case of
the design experiment, a focus on data sets as holistic distributions rather than as
collections of individual data points might not have become routine had the design
of the tools been significantly different.
CLASSROOM DISCOURSE
The frequent references we have made to the whole-class discussions in which
the students shared and critiqued their analyses indicate the value we attribute to
this discourse as a means of supporting the students’ learning. To this point, we have
emphasized that these discussions should focus on significant statistical ideas that
advance the teacher’s instructional agenda. In further clarifying the importance of
the whole-class discussions, we consider norms or standards for what counts as an
acceptable data-based argument and then return to our goal of ensuring that
significant statistical ideas emerge as topics of conversation.
Norms for Statistical Argumentation
Bakker and Gravemeijer (Chapter 7) report that the establishment of productive
classroom norms is as important in supporting students’ learning as the use of
suitable computer tools, the careful planning of instructional activities, and the skills
of the teacher in managing whole-class discussions. We can illustrate the
significance of a key classroom norm—that of what counts as an acceptable data-based argument—by returning to the students’ analyses of the battery data.
The first student who explained her reasoning said that she had focused on the
10 highest data values (i.e., those bounded by the range tool as shown in Figure 3).
She went on to note that 7 of the 10 longest lasting batteries were of one brand and
concluded that this brand was better. However, during the ensuing discussion, it
became apparent that her decision to focus on the 10 rather than, say, the 14 longest
lasting batteries was relatively arbitrary. In contrast, the next student who presented
an analysis explained that he had partitioned the data at 80 hours because he wanted
a consistent battery that lasted at least 80 hours. In doing so, he clarified why his
approach to organizing the data was relevant to the question at hand—that of
deciding which of the two brands was superior.
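The arbitrariness of the first student's cutoff can be made concrete by varying it. The battery data below are invented to show how the count shifts with the choice of k; only the reasoning pattern, not the numbers, comes from the text.

```python
# The first student counted how many of the 10 longest lasting batteries
# came from one brand. Varying the cutoff k shows why that choice was
# arbitrary: the count (and its apparent strength) shifts with k.
# All data values are invented for illustration.
def top_k_count(brand_a, brand_b, k):
    """Of the k longest lasting batteries overall, how many are brand A?"""
    ranked = sorted([(x, "A") for x in brand_a] + [(x, "B") for x in brand_b],
                    reverse=True)
    return sum(1 for _, brand in ranked[:k] if brand == "A")

brand_a = [40, 50, 90, 95, 100, 105, 110, 115, 120, 125]
brand_b = [70, 75, 80, 82, 84, 86, 88, 112, 117, 122]

print(top_k_count(brand_a, brand_b, 10))  # 7 of the top 10 are brand A
print(top_k_count(brand_a, brand_b, 14))  # 8 of the top 14 are brand A
```

Here brand A's share falls from 7/10 to 8/14 as the cutoff widens, which is exactly the kind of instability that made the second student's justified cutoff (a reason for partitioning at 80 hours) more acceptable to the class.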
As the classroom discussion continued, the obligation that the students should
give a justification of this type became increasingly explicit. For example, a third
student compared the two analyses by commenting that although 7 of the 10 longest
lasting batteries were of one brand, the 2 lowest batteries were also of this brand,
and “if you were using the batteries for something important, you could end up with
one of those bad batteries.” Because of exchanges like this, the teacher and students
established relatively early in the design experiment that to be acceptable, an
argument had to justify why the method of structuring the data was relevant to the
question under investigation. In our view, the establishment of this norm of
argumentation constitutes an important design principle for statistics instruction. On
the one hand, it serves to delegitimize analyses in which students simply produce a
collection of statistics (e.g., mean, median, range) rather than attempt to identify
trends and patterns in the data that are relevant to the issue they are investigating.
On the other hand, it serves as a means of inducting students into an important
disciplinary norm—namely, that the appropriateness of the statistics used when
conducting an analysis has to be justified with respect to the question being
addressed.
Focusing on Significant Statistical Ideas
Returning to the previously stated goal of ensuring that classroom discussions
focus on significant statistical ideas, it is helpful if we outline the approach the
teacher took when planning for the whole-class discussions. In the latter part of the
design experiment, we organized instructional activities so that the students
conducted their analyses and wrote their reports in one classroom session, and then
the teacher conducted the whole-class discussion with them in the following
classroom session. The teacher found this arrangement productive because she could
review the students’ reports prior to the whole-class discussion to gain a sense of the
various ways in which students had reasoned about the data. This in turn enabled her
to develop conjectures about statistically significant issues that might emerge as
topics of conversation. Her intent in planning for discussions in this way was to
capitalize on the students’ reasoning by identifying data analyses that, when
compared and contrasted, might give rise to substantive statistical conversations
(McClain, 2002). In the case of the AIDS treatment data, for example, the teacher
selected a sequence of four analyses for discussion, so that the issues of reasoning
proportionally about data and of interpreting data organized into four equal groups
might come to the fore.
Our purpose in describing this planning process is to emphasize that although
the design of instructional activities and tools is important, the expertise of a
knowledgeable teacher in guiding productive discussions by capitalizing on
students’ reasoning is also critical. Earlier in this chapter, we noted that the
challenge of transcending what Dewey (1981) termed the dichotomy between
process and content is especially pressing in the case of statistical data analysis,
given that an investigative orientation is integral to the discipline. Thus, in contrast
to attempts to make curricula teacher-proof, our final design principle attributes a
central role to the teacher. We in fact find it useful to view the teacher as a designer
who is responsible for organizing substantive classroom discussions that can serve
as primary means of supporting students’ induction into the values, beliefs, and
ways of knowing of the discipline. The final design principle is therefore that our
task in developing instructional activities and tools is to take account of the
mediating role of the teacher rather than to view ourselves as supporting the
students’ statistical learning directly. The challenge is then to make it possible for
the teacher to organize productive learning experiences for students by capitalizing
on the diverse ways in which they use tools to complete specific instructional
activities.
PAUL COBB AND KAY MCCLAIN
DISCUSSION
In this chapter, we have framed a classroom design experiment as a paradigm
case in which to propose a number of design principles for supporting the
development of students’ statistical reasoning. These principles involve formulating
and testing conjectures about:
1. Central statistical ideas, such as distribution, that can serve to orient the
development of an instructional design
2. The characteristics of instructional activities that
a) Make it possible for students’ classroom activity to be imbued with the
investigative spirit of data analysis
b) Enable teachers to achieve their instructional agendas by building on the
range of data-based arguments that students produce
3. Classroom activity structures that support the development of students’
reasoning about data generation as well as data analysis
4. The characteristics of data analysis tools that
a) Fit with students’ reasoning when they are first introduced in an
instructional sequence
b) Serve as a primary means of supporting students’ development of
increasingly sophisticated forms of statistical reasoning
5. The characteristics of classroom discourse in which
a) Statistical arguments explain why the way in which the data have been
organized gives rise to insights into the phenomenon under investigation
b) Students engage in sustained exchanges that focus on significant
statistical ideas
Because we have discussed the principles in separate sections of the chapter,
they might appear to be somewhat independent. We therefore need to stress that
they are in fact highly interrelated. For example, the instructional activities as they
were actually realized in the classroom depended on:
•	The overall goal for doing statistics (i.e., to identify patterns in data that are relevant to the question or issue at hand)
•	The structure of classroom activities (e.g., talking through the data generation process)
•	The computer tools that the students used to conduct their analyses
•	The nature of the classroom discourse (e.g., engaging in discussions in which significant statistical issues emerge as topics of conversation)
It is relatively easy to imagine how the instructional activities might have been
realized very differently in a classroom where the overall goal is to apply prescribed
methods to data, or where there are no whole-class discussions and the teacher
simply grades students’ analyses.
PRINCIPLES OF INSTRUCTIONAL DESIGN
Given the interdependencies, it is reasonable to view the various principles we
have discussed as serving to orient the design of productive classroom activity
systems. The intent of instructional design from this perspective is to provide
teachers with the resources necessary to guide the development of their classrooms
as activity systems in which students develop significant statistical ideas as they
participate in them and contribute to their evolution. They are, in short, systems
designed to produce the learning of significant statistical ideas.
The comprehensive nature of a classroom activity system indicates that the
approach we take to instructional design extends far beyond the traditional focus on
curriculum while simultaneously acknowledging the vital, mediating role of the
teacher. Because this perspective might seem unorthodox, we close by illustrating
that it is in fact highly consistent with current research in the learning sciences.
Bransford, Brown, and Cocking (2000) synthesize this research in the highly
influential book, How People Learn, and propose a framework that consists of four
overlapping lenses for examining learning environments. The first of these lenses
focuses on the extent to which learning environments are knowledge centered in the
sense of being based on a careful analysis of what we want people to know and be
able to do as a result of instruction. In this regard, we discussed the importance of
organizing instruction around overarching statistical ideas such as distribution, of
ensuring that classroom discussions focus on significant statistical ideas, and of
designing tools as a means of supporting the development of students’ statistical
reasoning. The second lens is learner centered and examines the extent to which a
learning environment builds on the strengths, interests, and preconceptions of
learners. We illustrated this focus when we discussed (a) the initial data generation
discussions and the importance of cultivating students’ interests in the issue under
investigation, (b) the approach of designing tools that fit with students’ current
statistical reasoning, and (c) the process of planning whole-class discussions by
building on students’ analyses.
The third lens of the How People Learn Framework is assessment centered and
examines the extent to which students’ thinking is made visible, so that teachers can
adjust instruction to their students’ reasoning and students have multiple
opportunities to test and revise their ideas. This lens was evident when we discussed
the value of whole-class discussions in which students shared their analyses and
received feedback, and when we indicated how the reports the students wrote
enabled the teacher to assess their statistical reasoning. The final lens in the
Framework is community centered and examines the extent to which the classroom
is an environment in which students not only feel safe to ask questions but also can
learn to work collaboratively. Our discussion of the AIDS and batteries instructional activities served to illustrate these general features of the classroom, and we also stressed the importance of the discipline-specific norm of what counts as an acceptable data-based argument.
The broad compatibility between the instructional design principles we have
proposed for elementary statistics instruction and the How People Learn Framework
gives the principles some credibility. In addition, the grounding of the Framework in
an extensive, multidisciplinary research base adds weight to our claim that it is
productive for our purposes as statistics educators to view classrooms as activity
systems that are designed to support students’ learning of significant statistical
ideas. As a result, although the set of principles that we have proposed might appear
unduly wide-ranging, we contend that approaches considering only the design of
instructional activities and computer tools are in fact overly narrow.
NOTES
1. The second author served as the teacher in both this and the prior design experiment that focused on the analysis of univariate data and was assisted by the first author.
2. This notion of an implicit third dimension in bivariate data was first brought to our attention by Patrick Thompson (personal communication, August 1998).
REFERENCES
Biehler, R. (1993). Software tools and mathematics education: The case of statistics. In C. Keitel & K.
Ruthven (Eds.), Learning from computers: Mathematics education and technology (pp. 68–100).
Berlin: Springer.
Biehler, R., & Steinbring, H. (1991). Entdeckende Statistik, Stengel-und-Blätter, Boxplots: Konzepte, Begründungen und Erfahrungen eines Unterrichtsversuches [Explorations in statistics, stem-and-leaf, boxplots: Concepts, justifications, and experience in a teaching experiment]. Der Mathematikunterricht, 37(6), 5–32.
Bransford, J., Brown, A. L., & Cocking, R. R. (Eds.) (2000). How people learn: Brain, mind, experience,
and school. Washington, DC: National Academy Press.
Cobb, G. W., & Moore, D. S. (1997). Mathematics, statistics, and teaching. American Mathematical
Monthly, 104, 801–823.
Cobb, P. (1999). Individual and collective mathematical learning: The case of statistical data analysis.
Mathematical Thinking and Learning, 1, 5–44.
Cobb, P., McClain, K., & Gravemeijer, K. P. E. (2003). Learning about statistical covariation. Cognition
and Instruction, 21, 1–78.
Cobb, P., & Tzou, C. (2000). Learning about data creation. Paper presented at the annual meeting of the
American Educational Research Association, New Orleans.
de Lange, J., van Reeuwijk, M., Burrill, G., & Romberg, T. (1993). Learning and testing mathematics in
context. The case: Data visualization. Madison: University of Wisconsin, National Center for
Research in Mathematical Sciences Education.
Dewey, J. (1981). Experience and nature. In J. A. Boydston (Ed.), John Dewey: The later works, 1925–
1953 (Vol. 1). Carbondale: Southern Illinois University Press.
Dörfler, W. (1993). Computer use and views of the mind. In C. Keitel & K. Ruthven (Eds.), Learning from computers: Mathematics education and technology (pp. 159–186). Berlin: Springer-Verlag.
Gravemeijer, K. P. E. (1994). Developing realistic mathematics education. Utrecht, The Netherlands: CD-β Press.
Hancock, C., Kaput, J. J., & Goldsmith, L. T. (1992). Authentic inquiry with data: Critical barriers to
classroom implementation. Educational Psychologist, 27, 337–364.
Kaput, J. J. (1991). Notations and representations as mediators of constructive processes. In E. von
Glasersfeld (Ed.), Constructivism in mathematics education (pp. 53–74). Dordrecht, The Netherlands:
Kluwer.
Konold, C., & Higgins, T. (in press). Working with data. In S. J. Russell, D. Schifter, & V. Bastable (Eds.), Developing mathematical ideas: Collecting, representing, and analyzing data. Parsippany, NJ: Seymour.
Konold, C., Pollatsek, A., Well, A., & Gagnon, A. (1996, July). Students analyzing data: Research of critical barriers. Paper presented at the Roundtable Conference of the International Association for Statistics Education, Granada, Spain.
Latour, B. (1987). Science in action. Cambridge, MA: Harvard University Press.
Lehrer, R., & Romberg, T. (1996). Exploring children’s data modeling. Cognition and Instruction, 14,
69–108.
McClain, K. (2002). Teacher’s and students’ understanding: The role of tools and inscriptions in
supporting effective communication. Journal of the Learning Sciences, 11, 216–241.
McClain, K., Cobb, P., & Gravemeijer, K. (2000). Supporting students’ ways of reasoning about data. In
M. Burke (Ed.), Learning mathematics for a new century (2001 Yearbook of the National Council of
Teachers of Mathematics, pp. 174–187). Reston, VA: National Council of Teachers of Mathematics.
McGatha, M. (2000). Instructional design in the context of classroom-based research: Documenting the
learning of a research team as it engaged in a mathematics design experiment. Unpublished
dissertation, Vanderbilt University, Nashville, TN.
McGatha, M., Cobb, P., & McClain, K. (1999, April). An analysis of students’ initial statistical
understandings. Paper presented at the annual meeting of the American Educational Research
Association, Montreal.
Meira, L. (1998). Making sense of instructional devices: The emergence of transparency in mathematical
activity. Journal for Research in Mathematics Education, 29, 121–142.
Noss, R., Pozzi, S., & Hoyles, C. (1999). Touching epistemologies: Statistics in practice. Educational
Studies in Mathematics, 40, 25–51.
Pea, R. D. (1993). Practices of distributed intelligence and designs for education. In G. Salomon (Ed.),
Distributed cognitions (pp. 47–87). New York: Cambridge University Press.
Roth, W. M. (1997). Where is the context in contextual word problems? Mathematical practices and
products in grade 8 students’ answers to story problems. Cognition and Instruction, 14, 487–527.
Saldanha, L. A., & Thompson, P. W. (2001). Students’ reasoning about sampling distributions and
statistical inference. In R. Speiser & C. Maher (Eds.), Proceedings of the Twenty Third Annual
Meeting of the North American Chapter of the International Group for the Psychology of
Mathematics Education (vol. 1, pp. 449–454), Snowbird, Utah. ERIC Clearinghouse for Science,
Mathematics, and Environmental Education, Columbus, OH.
Shaughnessy, J. M., Garfield, J., & Greer, B. (1996). Data handling. In A. J. Bishop, K. Clements, C.
Keitel, J. Kilpatrick, & C. Laborde (Eds.), International handbook of mathematics education (part 1,
pp. 205–237). Dordrecht, The Netherlands: Kluwer.
Wiggins, G., & McTighe, J. (1998). Understanding by design. Alexandria, VA: Association for Supervision and Curriculum Development.
Wilensky, U. (1997). What is normal anyway? Therapy for epistemological anxiety. Educational Studies
in Mathematics, 33, 171–202.
Chapter 17
RESEARCH ON STATISTICAL LITERACY,
REASONING, AND THINKING: ISSUES,
CHALLENGES, AND IMPLICATIONS
Joan Garfield1 and Dani Ben-Zvi2
University of Minnesota, USA1, and University of Haifa, Israel2
INTRODUCTION
The collection of studies in this book represents cutting-edge research on statistical
literacy, reasoning, and thinking in the emerging area of statistics education. This
chapter describes some of the main issues and challenges, as well as implications for
teaching and assessing students, raised by these studies. Because statistics education
is a new field that is establishing its own place in educational research, this chapter begins
with some comments on statistics education as an emerging research area, and then
concentrates on various issues related to research on statistical literacy, reasoning,
and thinking. Some of the topics discussed are the need to focus research,
instruction, and assessment on the big ideas of statistics; the role of technology in
developing statistical reasoning; addressing the diversity of learners (e.g., students at
different educational levels as well as their teachers); and research methodologies
for studying statistical reasoning. Finally, we consider implications for teaching and
assessing students and suggest future research directions.
STATISTICS EDUCATION AS AN EMERGING RESEARCH AREA
Statistics and statistics education are relatively new disciplines. Statistics has
only recently been introduced into school curricula (e.g., NCTM, 2000) and is a new
academic major at the college level (Bryce, 2002). In the United States, the NCTM
standards (2000) recommend that instructional programs from pre-kindergarten
through grade 12 focus more on statistical reasoning. The goals of their suggested
statistics curriculum include
•	Enable all students to formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them.
•	Select and use appropriate statistical methods to analyze data.
•	Develop and evaluate inferences and predictions that are based on data.
•	Understand and apply basic concepts of probability.

D. Ben-Zvi and J. Garfield (eds.), The Challenge of Developing Statistical Literacy, Reasoning and Thinking, 397–409. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.
At the university level, statistics is taught at undergraduate as well as graduate
levels across many disciplines. The students taking statistics at these levels may be
preparing to be future “users” or “producers” of statistics in different fields of
application (e.g., sciences, technology, industry, and medicine), or future
statisticians or statistics teachers. Over the last 20 years there has been a steady
increase in the number of statistics courses taught, to fulfill the growing demand for
students and professionals who can use and understand statistical information.
Although the amount of statistics instruction at all levels is growing at a fast
pace, the research to support statistics instruction is proceeding at a much slower
rate. The research literature in statistics education is not well known; therefore, it is
not often valued or utilized by statisticians, schools, or the immense number of other
fields that use statistics (Jolliffe, 1998). In fact, researchers in this area argue that the
field still needs to define what research in statistics education is—not only to
achieve academic recognition, but to convince others of its validity as a research
discipline (Batanero, Garfield, Ottaviani, & Truran, 2000).
Unlike other research areas, the research studies on teaching and learning
statistics have been conducted in, and influenced by, several different disciplines,
each with its own perspectives, literatures, methodology, and research questions. For
example, much of the early research was conducted by psychologists, often focusing
on conceptions of chance and randomness (e.g., Piaget & Inhelder, 1975; Fischbein,
1975; and Kahneman, Slovic, & Tversky, 1982). Psychologists’ dominant effort was
to identify, through observations or paper-and-pencil tests, ways in which people
make judgments of chance. Many researchers (for example, Kahneman et al., 1982;
Konold, 1989) identified widespread errors in reasoning, finding that people tend to
use nonstatistical heuristics to make judgments or decisions regarding chance
events. By the end of the 1980s, there was strong evidence that many adults are
unable to deal competently with a range of questions that require probabilistic
thinking.
In the 1980s and 1990s, many researchers in mathematics education, motivated
by the inclusion of statistics and probability in the elementary and secondary
mathematics curriculum, began to explore students’ understanding of ideas related
to statistics and data analysis (e.g., Russell & Mokros, 1996; Mokros & Russell,
1995; Rubin, Bruce, & Tenney, 1991; Shaughnessy, 1992). These researchers found
the mathematics education theoretical frameworks and methodologies relevant for
research in statistics education (see, for example, Kelly & Lesh, 2000). During this
same time period, several educational psychologists explored students’ attitudes and
anxiety about statistics in an attempt to predict success in statistics courses (e.g.,
Wisenbaker & Scott, 1997; Schau & Mattern, 1997), while cognitive psychologists
examined ways to help students and adults correctly use statistical reasoning (e.g.,
Fong, Krantz, & Nisbett, 1986; Nisbett, 1993; Sedlmeier, 1999). A more recent
group of researchers is emerging from the growing number of statisticians who are
focusing their scholarship on educational issues (e.g., Chance, 2002; Lee, Zeleke, &
Wachtel, 2002; Wild, Triggs, & Pfannkuch, 1997). Some of these researchers
looked at particular classroom interventions and their impact on learning outcomes
or developed models for teaching and for experts’ statistical thinking.
What has been missing in the past few decades is a coordination of the research
across the different disciplines described earlier, and a convergence of methods and
important research questions. Without this coherence, it is hard to move the field
forward and to build on the results by linking research to teaching. One example of
an effort to coordinate the research across the different disciplines is the book edited
by Lajoie (1998), which addresses issues of statistical content, learner needs,
instructional methods, and assessment goals. It was the outcome of the coordinated
work of statisticians, mathematics educators, and psychologists, who focused on
formulating a research agenda for K–12 statistics education.
The International Research Forums on Statistical Reasoning, Thinking, and
Literacy (SRTL-1, Israel; SRTL-2, Australia; and SRTL-3, USA) have been another
important effort to achieve this goal, by bringing together an international group of
researchers from across these disciplines to share their findings, discuss their
methods, and generate important issues and research questions. Another goal of
these research forums has been to make explicit connections to teaching practice,
something that researchers are often criticized for failing to address.
RESEARCH ON STATISTICAL LITERACY, REASONING, AND THINKING
Although statistics is now viewed as a unique discipline, statistical content is
most often taught in the mathematics curriculum (K–12) and in departments of
mathematics (tertiary level). This has led to exhortations by leading statisticians,
such as Moore (1998), about the differences between statistics and mathematics (see
Chapter 4). These arguments challenge statisticians and statistics educators to
carefully define the unique characteristics of statistics, and in particular, the
distinctions between statistical literacy, reasoning, and thinking. We provided
summaries of these arguments and related research in the early chapters of this book
(see Chapters 1 through 4).
The authors of chapters in this book represent the growing network of
researchers from SRTL-1 and SRTL-2 who are interested in statistical literacy,
reasoning, and thinking, and who have been trained in the different disciplines (e.g.,
mathematics education, cognitive and educational psychology, and statistics). Many
of the chapters describe collaborative studies, some including researchers from
different disciplines (e.g., Chapters 2, 13, and 14). It may seem strange, given the
quantitative nature of statistics, that most of the studies in this book include analyses
of qualitative data, particularly videotaped observations or interviews. We have
found that sharing these videos and their associated transcripts allows us to better
present and discuss the important aspects of our work, as well as to solicit useful
feedback from colleagues. Further discussion of methodological issues is provided
later in this chapter.
The topics of the research studies presented in this book reflect the shift in
emphasis in statistics instruction, from statistical techniques, formulas, and
procedures to developing statistical reasoning and thinking. The chapters on
individual aspects of reasoning focus on some core ideas of statistics, often referred
to as the “big ideas.” Increasing attention is being paid in the educational research
community to the need to clearly define and focus both research and instruction, and
therefore, assessment, on the big ideas of a discipline (Bransford, Brown, &
Cocking, 2000; Wiggins, 1998). We offer a list and description of the big ideas of
statistics in the following section.
FOCUSING ON THE BIG IDEAS OF STATISTICS
The topics of the chapters in this book (e.g., data, distribution, averages, etc.)
focus on some of the big ideas in statistics that students encounter in their
educational experiences in elementary, secondary, or tertiary classes. Although
many statistics educators and researchers today agree that there should be a greater
focus on the big ideas of statistics, little has been written about what these ideas are.
Friel (in press) offers a list similar to the one we provide here:
•	Data—the need for data; how data represent characteristics or values in the real world; how data are obtained; different types of data, such as numbers, words, and so forth.
•	Distribution—a representation of quantitative data that can be examined and described in terms of shape, center, and spread, as well as unique features such as gaps, clusters, outliers, and so on.
•	Trend—a signal or pattern we are interested in. It could be a mean for one group, the difference of means for comparing two groups, a straight line for bivariate data, or a pattern over time for time-series data.
•	Variability—the variation or noise around a signal for a data set, such as measurement error. Variability may also be of interest in that it helps describe and explain a data set, reflecting natural variation in measurements such as head sizes of adult men.
•	Models—an ideal that is sometimes useful in understanding, explaining, or making predictions from data. A model is useful if it “fits” the data well. Some examples of models are the normal curve, a straight line, or a binomial random variable with probability of 0.5.
•	Association—a particular kind of relationship between two variables; information on one variable helps us understand, explain, or predict values of the other variable. Association may be observed between quantitative or categorical variables. This also includes being able to distinguish correlation from causality.
•	Samples and sampling—the process of taking samples and comparing samples to a larger group. The sampling process is important in obtaining a representative sample. Samples are also used to generate theory, such as simulating sampling distributions to illustrate the Central Limit Theorem.
•	Inference—ways of estimating and drawing conclusions about larger groups based on samples. Utts (2003) elaborates that this includes being able to differentiate between practical and statistical significance as well as knowing the difference between finding “no effect” versus finding “no significant effect.”
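The kind of simulation mentioned under samples and sampling, in which repeated sampling illustrates the Central Limit Theorem, can be sketched in a few lines of code. The following is a minimal illustration only, not a rendering of any tool discussed in this book; the skewed population and the sample sizes are arbitrary choices for demonstration.

```python
import random
import statistics

random.seed(0)

# A deliberately skewed "population," far from normal in shape.
population = [random.expovariate(1.0) for _ in range(10_000)]

def sample_means(n, reps=2_000):
    """Draw `reps` random samples of size n and return the mean of each."""
    return [statistics.mean(random.sample(population, n)) for _ in range(reps)]

# As n grows, the distribution of sample means centers on the population
# mean and its spread shrinks (roughly by a factor of 1/sqrt(n)).
for n in (2, 10, 50):
    means = sample_means(n)
    print(f"n={n:2d}: mean of sample means = {statistics.mean(means):.2f}, "
          f"SD of sample means = {statistics.stdev(means):.2f}")
```

Plotting histograms of `means` for increasing n would show the shape becoming increasingly symmetric and bell-like, which is the visual point that tools such as Sampling SIM make interactively.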
When we examine much of statistics instruction, it is not always clear how these
big ideas are supposed to be presented and developed. In most statistics classroom
instruction, the emphasis is on individual concepts and skills, and the big ideas are
obscured by the focus on procedures and computations. After one topic has been
studied, there is little mention of it again, and students fail to see that the big ideas actually provide a foundation for course content and underlie statistical reasoning. For example, students may focus on how to compute different
measures of center or variability without fully understanding the ideas of center and
spread and their relationships to other big ideas, such as data and distribution. Later
in their studies, students may fail to connect the idea of center and spread of
sampling distributions with the ideas of center and spread in descriptive statistics.
Or, when studying association, students may lose track of how center and spread of
each variable are embedded in looking at bivariate relationships.
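The last point can be made concrete: the correlation coefficient is built directly from the center and spread of each variable, since each value is standardized with its variable’s mean and standard deviation before the products are averaged. The sketch below uses made-up paired data purely for illustration.

```python
import statistics

# Hypothetical paired data (e.g., hours studied and exam score).
x = [2, 4, 5, 7, 9, 11]
y = [55, 60, 68, 70, 80, 85]

def correlation(xs, ys):
    """Pearson r, computed from each variable's own center (mean) and
    spread (standard deviation): standardize, then average the products."""
    mx, sx = statistics.mean(xs), statistics.stdev(xs)
    my, sy = statistics.mean(ys), statistics.stdev(ys)
    zx = [(v - mx) / sx for v in xs]
    zy = [(v - my) / sy for v in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

print(f"r = {correlation(x, y):.3f}")
```

Writing r this way keeps the univariate ideas of center and spread visible inside the bivariate one, rather than hiding them in an opaque formula.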
Major challenges that teachers face include not only finding ways to go beyond
the individual concepts and skills, but also leading students to develop an understanding of the big ideas and the interrelations among them. Such an approach will enable teachers to make the big ideas explicit and visible throughout the curriculum. For
example, Cobb (1999) suggests that focusing on distribution as a multifaceted end
goal of instruction in seventh grade might bring more coherence to the middle
school statistics curriculum and empower students’ statistical reasoning. Bakker and
Gravemeijer (Chapter 7) propose to focus instruction on the informal aspects of
shape. Other illustrations of the need to focus on the big ideas of statistics and how
to do it can be found in various chapters of this book: data (Chapter 6), center
(Chapter 8), variability (Chapter 9), covariation (Chapter 10), and sampling
(Chapters 12 and 13). It has been suggested that the use of technology-assisted
learning environments can support—in many ways—students’ construction of
meanings for the big ideas of statistics (e.g., Garfield & Burrill, 1997).
THE ROLE OF TECHNOLOGY IN DEVELOPING STATISTICAL REASONING
Many of the chapters in this book mention the use of technology in developing
statistical reasoning. This is not surprising, given how the discipline of statistics has
depended on technology and how technology has been driving change in the field of
statistics. Although there are many technological tools available, including graphing
calculators, computers, and the World Wide Web, there is still a lack of research on
how to best use these tools and how they affect student learning.
The interaction of technology with efforts to redefine both content and
instruction in statistics in the K–12 curriculum provides a variety of strategies for
teaching statistics and, at the same time, offers new ways of doing statistics
(Garfield & Burrill, 1997). Today, computers, software, and the Internet are
essential tools for instruction in statistics (Friel, in press).
Ben-Zvi (2000) describes how technological tools may be used to help students
actively construct knowledge, by “doing” and “seeing” statistics, as well as to give
students opportunities to reflect on observed phenomena. He views computers as
cognitive tools that help transcend the limitations of the human mind. Therefore,
technology is not just an amplifier of students’ statistical power, but rather a
reorganizer of students’ physical and mental work. The following types of software,
which are described in this book, are good examples of such tools:
•	Commercial statistical packages for analyzing data and constructing visual representations of data, such as spreadsheets (Excel©, Chapter 6) or data analysis programs (Statgraphics©, Chapter 11), that offer a variety of simultaneous representations that are easily manipulated and modified, as well as simulation of different distributions.
•	Educational data analysis tools (Fathom©, Chapter 15), which are intended to help students develop an understanding of data and data exploration. They support in-depth inquiry in statistics and data analysis through powerful statistical and plotting capabilities that give the user greater overall control in structuring and representing data (Friel, in press). Fathom also allows plotting functions and creating animated simulations, and it has a “dragging” facility that dynamically updates data representations. This helps reveal the invariant phenomenon and the relationships among representations.
•	Web- or computer-based applets, which were developed to demonstrate and visualize statistical concepts. Applets are typically small, web-based computer programs that visually illustrate a statistical concept by letting the user manipulate and change various parameters. The Minitools (Chapters 7 and 16), a particular type of applet, were designed to support an explicit “learning trajectory” to develop an understanding of a particular graph and its link to the data on which it is based.
•	Stand-alone simulation software, such as Sampling SIM (Chapter 13), which was developed to provide a simulation of sampling distributions, with many capabilities allowing students to see the connections between individual samples, distributions of sample means, confidence intervals, and p-values.
The last three tools on this list (Fathom, Minitools, and Sampling SIM) were
designed based on ideas about what students need to see and do in order to develop
a conceptual understanding of abstract statistical concepts as well as develop the
kinds of attitudes and reasoning required for analyzing data. Although these three
tools were developed to improve student learning, Bakker (2002) distinguished
between route-type software—small applets and applications, such as the Minitools, that fit in a particular learning trajectory; and landscape-type software—larger applications, such as Fathom and TinkerPlots, that provide an open landscape in which teachers and students may freely explore data.
The increasing use of the Internet and computer-mediated communication (CMC) in
education has also influenced statistics education. Although not the focus of
chapters in this book, there are numerous Internet uses in statistics classes that
support the development of students’ statistical reasoning. For example, data sources
in downloadable formats are available on the Web to support active learning of
exploratory data analysis. They are electronically available from data-set archives,
government and official agencies, textbook data, etc. An additional example is the
use of CMC tools, such as online forums, e-mail, and so forth, to create electronic
communities that support students’ learning in face-to-face or distance settings.
It is important to note that despite its role in helping students learn and do
statistics, technology is not available in all parts of the world, and not even in all
classrooms in the more affluent countries. The research studies in this book address
different instructional settings with and without the use of technology, as well as
diverse types of students who are learning statistics at all levels.
DIVERSITY OF STUDENTS AND TEACHERS
With the growing emphasis on statistical literacy, reasoning, and thinking,
statistics education research must address the diversity of students in statistics
courses by considering issues of continuity (when to teach what), pedagogy (how to
approach the content and develop desired learning outcomes), priority (prioritizing
and sequencing of topics), and diversity (students’ educational preparation and
background, grade and level). For example, little attention has been given to the
issue of when and how a new statistical idea or concept can be presented to students,
or to the question of sequencing statistical ideas and concepts along the educational
life span of a student.
The individual research studies in this book partially address such issues, but as
a group reflect the diversity of students (and teachers) who learn and know statistics.
The widest “student” population is addressed in research about statistical literacy,
which includes school students through adults. Gal (Chapter 3) underscores the
importance of statistical literacy education for all present and future citizens to
enable them to function effectively in an information-laden society. The goal of
statistical literacy research is to identify the components of literacy and to find ways to
equip all citizens with basic literacy skills, such as being able to critically read a
newspaper or evaluate media reports.
The students observed by Ben-Zvi (Chapter 6) as well as Bakker and
Gravemeijer (Chapter 7) were high-ability students. The forms of reasoning
exhibited by some of these students are to some extent unique to the specific settings
and circumstances. However, these studies describe some important teaching and
learning issues and how the reasoning might develop in other types of students.
They also suggest meaningful and engaging activities, such as making prediction
graphs without having data and using software tools that support specific statistical
ways of reasoning. The instructional suggestions in some chapters require
establishing certain socio-mathematical (statistical) norms and practices (Cobb &
McClain, Chapter 16), use of suitable computer tools, carefully planned instructional
activities, and skills of the teacher to orchestrate class discussions.
Ben-Zvi (Chapter 6) and Mickelson and Heaton (Chapter 14) describe the
teachers in their studies as above average in pedagogical and statistical knowledge
and skills. It is likely that the role of “average” elementary and middle school
teachers, normally not trained in statistics instruction, would be quite different.
Teachers need careful guidance to teach such a new and complex subject. Hence,
more studies are needed that explore how to equip school teachers at all levels with
JOAN GARFIELD AND DANI BEN-ZVI
appropriate content knowledge and pedagogical knowledge, and to determine what
kind of guidance they need to successfully teach these topics.
RESEARCH METHODOLOGIES TO STUDY STATISTICAL REASONING
The chapters in this book reveal a variety of research methods used to study
statistical literacy, reasoning, and thinking. Ben-Zvi (Chapter 6), and Mickelson and
Heaton (Chapter 14) use a case study approach in natural classroom settings to study
one or two cases in great detail. Batanero, Tauber, and Sánchez (Chapter 11) use a
semiotic approach to analyze students’ responses to open-ended questions on an
exam. Chance, delMas, and Garfield (Chapter 13) use collaborative classroom
research to develop software and build a model of statistical reasoning. Their
research is implemented in their own classes and with their students, using an
iterative cycle to study the impact of an activity on students’ reasoning as they
develop their model. Their method of classroom research is similar to the classroom
teaching experiment used in the studies by Bakker and Gravemeijer (Chapter 7) and
Cobb and McClain (Chapter 16), who refer to this method as a design experiment
(described by Lesh, 2002). Watson (Chapter 12) uses a longitudinal approach to
study children’s development of reasoning about samples.
As mentioned earlier, videotaped classroom observations and teacher or student
interviews were included in most studies as a way to gather qualitative data. We
have found, in analyzing these videos, that observing students' verbal actions as well
as their physical gestures helps us better understand students' reasoning and the
socio-cultural processes of learning. Other sources of qualitative data were students’
responses to open-ended questions, field notes of teachers and researchers, and
samples of students’ work (e.g., graphs constructed, statistics projects).
Makar and Confrey (Chapter 15) combine qualitative data with quantitative data
on teachers’ statistical reasoning. Pre- and posttests of statistical content knowledge
provided the main source of quantitative data for their study, while videotaped
interviews were transcribed and then analyzed using grounded theory (Strauss &
Corbin, 1998). A few other studies also include some quantitative data in the context
of student assessment, for example, Reading and Shaughnessy (Chapter 9), Moritz
(Chapter 10), Batanero et al. (Chapter 11), and Watson (Chapter 12).
It may seem surprising that few statistical summaries are actually included in
these studies, given that the subject being studied by students or teachers is statistics.
And it may seem surprising that the research studies in this book are not traditional
designed experiments, involving control groups compared to groups that have
received an experimental treatment (the gold standard of experimental design).
However, statistics education tends to follow the tradition of mathematics and
science education, in using mostly qualitative methods to develop an understanding
of the nature of students’ thinking and reasoning, and to explore how these develop
(see Kelly & Lesh, 2000). Perhaps after more of this baseline information is
gathered and analyzed, the field will later include some small, experimental studies
that allow for comparisons of particular activities, instructional methods, curricular
trajectories, types of technological tools, or assessments.
Before we reach this stage of research, we need to further study the long-lasting
effects of instruction on students’ reasoning, and to continue the exploration of
models of conceptual change and development. These models will be based on
careful examination and analyses of how reasoning changes, either over an
extensive period of time (as in longitudinal studies) or during periods of significant
transition (as in some clinical interviews or classroom episodes).
IMPLICATIONS FOR TEACHING AND ASSESSING STUDENTS
In the three chapters that focus on the topics of statistical thinking (Chapter 2),
statistical literacy (Chapter 3), and statistical reasoning (Chapter 4), each author, or
pair of authors, recommends that instruction be designed to explicitly lead students
to develop these particular learning outcomes. For example, Pfannkuch and Wild
(Chapter 2) discuss the areas to emphasize for developing statistical thinking, Gal
(Chapter 3) describes the knowledge bases and dispositions needed for statistical
literacy, and delMas (Chapter 4) describes the kinds of experiences with data that
should lead to statistical reasoning.
One important goal of this book is to provide suggestions for how teachers may
build on the research studies described to improve student learning of statistics.
Although most teachers do not typically read the research related to learning their
subject matter content, we encourage teachers of statistics at the elementary,
secondary, and tertiary level to refer to chapters in this book for a concise summary
of research on the different areas of reasoning. These studies provide ideas not only
about the types of difficulties students have when learning particular topics, so that
teachers may be aware of where errors and misconceptions might occur, but also
about what to look for in their informal and formal assessments of students’ learning. In
addition, these studies provide valuable information regarding the type of statistical
reasoning that can be expected at different age levels. The models of cognitive
development in statistical reasoning documented in Chapter 5 enable teachers to
trace students’ individual and collective development in statistical reasoning during
instruction. Because the cognitive models offer a coherent picture of students’
statistical reasoning, they can provide a knowledge base for teachers in designing
and implementing instruction.
These research studies include details on the complexity of the different
statistical topics, explaining why they are so difficult for students to learn. As
several authors stress, it is important for teachers to move beyond a focus on skills
and computations, and the role of teacher as the one who delivers the content.
Instead, the role of teacher suggested by the authors of these chapters is one of
providing a carefully designed learning environment, appropriate technological
tools, and access to real and interesting data sets. The teacher should orchestrate
class work and discussion, establish socio-statistical norms (see Cobb and McClain,
Chapter 16), and provide timely and nondirective interventions, acting as a
representative of the discipline in the classroom (e.g., Voigt, 1995). The teacher
should be aware not only of the complexities and difficulty of the concepts, but of
the desired learning goals—such as what good statistical literacy, reasoning, and
thinking look like—so that assessments can be examined and compared to these
goals. Teachers need to be comfortable with both the content and tools, and with
the process of data analysis.
The chapters in this book stress that students need to be exposed to the big ideas
and their associated reasoning in a variety of settings, through a course or over
several years of instruction. The authors make many suggestions about how
technology can be used to help students develop their reasoning, and suggest that
students be prodded to explain what they see and learn when using these tools as a
way to develop their reasoning.
Many of the authors present some types of learning activities and data sets that
teachers can use in their classes at different school levels. They suggest that
regardless of the activity used, teachers can find ways to observe their students
carefully to see how their reasoning is affected by the activities. Teachers should
also avoid assuming that students have learned the material merely because they
have completed an activity on that topic. Finally, teachers are encouraged to apply
the research tools in their own classes, and to use the information gathered to continually
revise and improve their activities, materials, and methods. We believe that it is
better to learn a few concepts in depth, rather than trying to cover every topic. If this
can be done in a systematic way, then more topics might be covered over a span of
grades, rather than in a single grade level.
We agree with the many educators who have called for classroom instruction to
be aligned with appropriate methods of assessment, which are used as a way to
make reasoning visible to teachers as well as to students. Assessment should be used
for formative as well as summative purposes, and it should be aligned with learning
goals. In most cases, a type of performance assessment seems to best capture the full
extent of students’ statistical reasoning and thinking (Gal & Garfield, 1997; Garfield
& Gal, 1999).
We suggest that it is often helpful to start by considering the types of assessment
that are appropriate to measure the desired learning outcomes, and to work
backward, thinking about instruction and activities that will lead to these goals (see
Wiggins, 1998). Then assessment data gathered from students can be used to
evaluate the extent to which these important learning goals (e.g., developing
statistical reasoning) have been achieved.
FUTURE DIRECTIONS AND CHALLENGES
Given the importance of the learning outcomes described in this book (statistical
literacy, reasoning, and thinking), it is crucial that people working in this area use the
same language and definitions when discussing these terms. Similarly, some
standard goals for each outcome should be agreed upon and used in developing
educational materials and curricula, designing assessments, preparing teachers’
courses, and conducting future research.
Because the field of statistics education research is so new, there is a need for
more research in all of the areas represented in this book. Studies need to be
conducted in different educational settings, with different-aged students worldwide,
and involving different educational materials and technological tools. As we
continue to learn more about how different types of reasoning in statistics develop,
we need to continue to explore cognitive developmental models, seeing how these
apply to the different settings. There is also a need to validate these models, and to
investigate how they may be used to promote reasoning, thinking, and literacy
through carefully designed instruction.
There is a great need for assessment instruments and materials that may be used
to assess statistical literacy, reasoning, and thinking. A set of accessible, high-quality instruments could be used in future evaluation and research projects to allow
more comparison of students who study with different curricula or in different
educational settings.
SUMMARY
This book focuses on one aspect of the “infancy” of the field of statistics
education research by attempting to grapple with the definitions, distinctions, and
development of statistical literacy, reasoning, and thinking. As this field grows, the
research studies in this volume should help provide a strong foundation as well as a
common research literature. This is an exciting time, given the newness of the
research area and the energy and enthusiasm of the contributing researchers and
educators who are helping to shape the discipline as well as the future teaching and
learning of statistics. We point out that there is room for more participants to help
define and construct the research agenda and contribute to results. We hope to see
many new faces at future gatherings of the international research community,
whether at SRTL-4, or 5, or other venues such as the International Conference on
Teaching Statistics (ICOTS), International Congress on Mathematical Education
(ICME), and the International Group for the Psychology of Mathematics Education
(PME).
REFERENCES
Bakker, A. (2002). Route-type and landscape-type software for learning statistical data analysis. In B.
Phillips (Chief Ed.), Developing a Statistically Literate Society: Proceedings of the Sixth
International Conference on Teaching Statistics, Voorburg, The Netherlands (CD-ROM).
Batanero, C., Garfield, J., Ottaviani, M. G., & Truran, J. (2000). Research in statistical education: Some
priority questions. Statistical Education Research Newsletter, 1(2), 2–6.
Ben-Zvi, D. (2000). Toward understanding the role of technological tools in statistical learning.
Mathematical Thinking and Learning, 2(1&2), 127–155.
Bransford, J., Brown, A. L., & Cocking, R. R. (Eds.). (2000). How people learn: Brain, mind, experience,
and school. Washington, DC: National Academy Press.
Bryce, G. B. (2002). Undergraduate statistics education: An introduction and review of selected literature.
Journal of Statistics Education, 10(2). Retrieved June 23, 2003, from
http://www.amstat.org/publications/jse/v10n2/bryce.html.
Chance, B. L. (2002). Components of statistical thinking and implications for instruction and assessment.
Journal of Statistics Education, 10(3). Retrieved April 7, 2003, from
http://www.amstat.org/publications/jse/.
Cobb, P. (1999). Individual and collective mathematical development: The case of statistical data
analysis. Mathematical Thinking and Learning, 1(1), 5–43.
Fischbein, E. (1975). The intuitive sources of probabilistic thinking in children. Dordrecht, The
Netherlands: D. Reidel.
Fong, G. T., Krantz, D. H., & Nisbett, R. E. (1986). The effects of statistical training on thinking about
everyday problems. Cognitive Psychology, 18, 253–292.
Friel, S. N. (in press). The research frontier: Where technology interacts with the teaching and learning of
data analysis and statistics. In M. K. Heid & G. W. Blume (Eds.), Research on technology and the
teaching and learning of mathematics: Syntheses and perspectives, vol. 1. Greenwich, CT:
Information Age Publishing.
Gal, I., & Garfield, J. B. (Eds.). (1997). The assessment challenge in statistics education. Voorburg, The
Netherlands: International Statistical Institute.
Garfield, J. B., & Burrill, G. (Eds.). (1997). Research on the role of technology in teaching and learning
statistics. In Proceedings of the 1996 IASE Round Table Conference, Granada, Spain. Voorburg, The
Netherlands: International Statistical Institute.
Garfield, J., & Gal, I. (1999). Teaching and assessing statistical reasoning. In L. V. Stiff (Ed.),
Developing mathematical reasoning in grades K–12 (NCTM 1999 Yearbook), pp. 207–219. Reston,
VA: National Council of Teachers of Mathematics.
Joliffe, F. (1998). What is research in statistical education? In L. Pereira-Mendoza, L. Seu Kea, T. Wee
Kee, & W. K. Wong (Eds.), Proceedings of the Fifth International Conference on Teaching Statistics,
pp. 801–806. Singapore: International Statistical Institute.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty: Heuristics and
biases. New York: Cambridge University Press.
Konold, C. (1989). Informal conceptions of probability. Cognition and Instruction, 6, 59–98.
Kelly, A. E., & Lesh, R. A. (Eds.). (2000). Handbook of research design in mathematics and science
education. Mahwah, NJ: Erlbaum.
Lajoie, S. P. (Ed.). (1998). Reflections on statistics: Learning, teaching, and assessment in grades K–12.
Mahwah, NJ: Erlbaum.
Lee, C., Zeleke, A., & Wachtel, H. (2002). Where do students get lost? The concept of variation. In B.
Phillips (Chief Ed.), Developing a Statistically Literate Society: Proceedings of the Sixth
International Conference on Teaching Statistics, Voorburg, The Netherlands (CD-ROM).
Lesh, R. (2002). Research design in mathematics education: Focusing on design experiments. In L.
English (Ed.), International Handbook of Research Design in Mathematics Education. Hillsdale, NJ:
Erlbaum.
Mokros, J., & Russell, S. J. (1995). Children’s concepts of average and representativeness. Journal for
Research in Mathematics Education, 26, 20–39.
Moore, D. (1998). Statistics among the liberal arts. Journal of the American Statistical Association, 93,
1253–1259.
National Council of Teachers of Mathematics (NCTM). (2000). Principles and standards for school
mathematics. Reston, VA: NCTM.
Nisbett, R. (1993). Rules for reasoning. Hillsdale, NJ: Erlbaum.
Piaget, J., & Inhelder, B. (1975). The origin of the idea of chance in children. London: Routledge &
Kegan Paul.
Rubin, A., Bruce, B., & Tenney, Y. (1991). Learning about sampling: Trouble at the core of statistics. In
D. Vere-Jones (Ed.), Proceedings of the Third International Conference on Teaching Statistics, vol. 1
(pp. 314–319). Voorburg, The Netherlands: International Statistical Institute.
Russell, S. J., & Mokros, J. (1996). What do children understand about average? Teaching Children
Mathematics, 2, 360–364.
Schau, C., & Mattern, N. (1997). Assessing students’ connected understanding of statistical relationships.
In I. Gal & J. B. Garfield (Eds.), The Assessment Challenge in Statistics Education (pp. 91–104).
Amsterdam, Netherlands: IOS Press.
Sedlmeier, P. (1999). Improving statistical reasoning: Theoretical models and practical implications.
Hillsdale, NJ: Erlbaum.
Shaughnessy, J. M. (1992). Research in probability and statistics: Reflections and directions. In D.
Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 465–494). New
York: Macmillan.
Strauss, A., & Corbin, J. (1998). Basics of qualitative research: Techniques and procedures for
developing grounded theory. Thousand Oaks, CA: Sage.
Utts, J. (2003). What educated citizens should know about statistics and probability. The American
Statistician, 57(2), 74–79.
Voigt, J. (1995). Thematic patterns of interaction and socio-mathematical norms. In P. Cobb & H.
Bauersfeld (Eds.), Emergence of mathematical meaning: Interaction in classroom cultures (pp. 163–
201). Hillsdale, NJ: Erlbaum.
Wiggins, G. (1998). Understanding by design. Alexandria, VA: ASCD Press.
Wisenbaker, J., & Scott, J. (1997). Modeling aspects of students’ attitudes and achievement in
introductory statistics courses. Paper presented at AERA Annual Meeting, Chicago.
Wild, C., Triggs, C., & Pfannkuch, M. (1997). Assessment on a budget: Using traditional methods
imaginatively. In I. Gal & J. B. Garfield (Eds.), The Assessment Challenge in Statistics Education.
Amsterdam, Netherlands: IOS Press.
Author Index
A
ABC Research Group, 42
Abelson, R., 355, 356
Ahlgren, C., 36
Ainley, J., 132, 233
Alloy, L. B., 234
Amabile, T. M., 234, 299
American Association for the Advancement of Science (AAAS) 1989, 170
American Association for the Advancement of Science (AAAS) 1995, 47, 58
American Association for the Advancement of Science (Project 2061), 5
Amit, M., 110
Anderson, C. A., 81, 299
Arcavi, A., 110, 140, 141, 147, 232, 274
Arnold, G., 35
Australian Education Council (1991), 5, 47, 71, 110, 111, 122, 229, 230, 277, 278, 283, 292
Australian Education Council (1994), 5, 97, 111, 122, 229
B
Bailar, B., 17
Bakker, A., 12, 128, 153, 196, 353, 376, 380, 381, 382, 387, 388, 390, 401, 402, 403, 404
Ball, D. L., 328, 329, 350, 354
Ballenger, C., 194
Barabba, V., 41
Barbella, P., 123
Baron, J., 6, 70, 80, 81
Barron, B. J., 278, 296
Bartholomew, D., 30
Batanero, C., 13, 111, 234, 235, 258, 259, 398, 404
Beaton, A. E., 72
Begg, A., 36
Behrens, J. T., 296
Bell, A., 228, 232, 234, 235
Beninger, J. R., 231, 249
Ben-Zvi, D., 12, 34, 110, 123, 125, 126, 134, 140, 141, 142, 147, 232, 274, 353, 401, 403, 404
Berenson, S. B., 228, 232
Bereska, C., 122
Bethell, S. C., 233, 249
Bichler, E., 110, 170, 177, 178, 179
Bidell, T. R., 98
Biehler, R., 34, 36, 121, 123, 125, 149, 169, 170, 171, 174, 181, 187, 196, 275, 382, 383
Biggs, J. B., 98, 99, 100, 101, 102, 104, 108, 109, 206, 238, 279, 280, 309
Bolster, C. H., 122
Bolster, L. C., 122
Bowen, D., 49
Box, J. F., 26, 27
Boyce, S. J., 296, 314
Boyd, C., 122
Boyle, V., 122
Bransford, J., 393, 400
Brasell, H. M., 234, 249
Brekke, G., 232, 234, 235
Bright, G. W., 55, 60, 103, 110, 155, 181,
182, 196, 328, 331, 335, 344, 356
Britz, G., 32, 42
Brousseau, G., 132
Brown, A. L., 393, 400
Bruce, B., 206, 207, 278, 398
Bryce, G. B., 397
Burke, M., 297
Burrill, G., 36, 123, 155, 204, 382, 401
Burrill, J., 204
Byrne, R. M. J., 80, 86, 88
C
Callingham, R., 35, 99, 100, 102, 108,
109, 205, 207
Campbell, K. J., 100, 109, 175
Carlson, M., 232
Carnevale, A. P., 48
Carpenter, T. P., 101, 112, 191, 194
Carter, M., 35
Case, R., 98, 99, 101, 309
Chance, B. L., 7, 13, 43, 84, 85, 92, 99,
111, 113, 259, 299, 300, 302, 356, 358,
369, 370, 371, 398, 404
Chater, N., 80
Chazen, D., 122, 233, 249
Chiang, C. P., 101, 112
Ciancetta, M., 205
Clark, S. B., 258
Clemen, R., 52, 61, 73
Clement, J., 232, 305
Cleveland, W. S., 128, 169, 195
Cline Cohen, P., 22
Cobb, G., 5, 30, 38, 58, 59, 61, 62,
64, 72, 84, 86, 87, 88, 91, 92, 121, 169,
194, 196, 375, 376
Cobb, P., 14, 101, 111, 123, 140, 147,
148, 149, 152, 153, 161, 164, 167, 170,
178, 181, 194, 205, 234, 235, 350, 356,
376, 377, 380, 381, 384, 385, 400, 403,
404, 405
Cocking, R. R., 53, 393, 401
Coe, E., 232
Cognition and Technology Group at
Vanderbilt, 278
Cohen, D., 354
Cohen, I., 23, 24, 25
Cohen, S., 296
Collis, K., 35, 98, 99, 100, 101, 102, 104,
108, 109, 205, 206, 238, 279, 280, 309
Conference Board of Mathematical
Science, 329, 349, 350
Confrey, J., 13, 328, 354, 356, 357, 361,
369, 404
Cooke, D. J., 61
Corbin, J., 147, 152, 359, 404
Cortina, J., 180
Corwin, R. B., 279
Coulombe, W. N., 228, 232
Coulter, R., 169
Cousins, J. B., 111, 234, 235, 238, 248,
251
Cox, J. A., 60
Coyne, G., 296
Creswell, J.W., 329
Crismond, D., 309
Crocker, J., 234, 235
Cross, T. L., 258
Crossen, C., 58, 66, 74
Cummings, G., 296
Curcio, F. R., 60, 102, 103, 155, 234,
248, 250, 356
Curry, D., 63, 74
D
Dantal, B., 275
Davenport, E. C., 296
Davey, G., 99, 100, 108, 109
David, F., 21
Davis, P., 21, 138
de Lange, J., 122-123, 155
del Mas, R. C., 7, 11, 13, 85, 92, 99, 111,
113, 259, 296, 297, 299, 300, 302, 356,
358, 369, 371, 382, 404, 405
Deming, W. E., 30
Department for Education and
Employment, 122, 229, 230
Department of Education and Science and
the Welch Office (DFE), 97, 110, 111
Derry, S. J., 290
Devlin, K., 82
Dewey, J., 383, 391
Doane, D. P., 296
Doerr, H. M., 110
Doganaksoy, N., 38
Donahue, P. L., 175
Donnelly, J. F., 234, 235
Dorfler, W., 388
Dreyfus, T., 125
E
Earley, M. A., 296
Edelson, D. C., 149
Edwards, K., 69
Eisenhart, M., 72
Emerling, D., 32, 42
English, L. D., 83, 84, 91
Erickson, J. R., 81
Erickson, T., 196, 358, 368
Estepa, A., 111, 234, 235
European Commission, 47, 50, 74
Evans, J. St. B. T., 80, 81, 86
Evensen, D. H., 122
F
Falk, R., 30, 36
Feldman, A., 169
Fennema, E., 101, 112, 328
Fey, J. T., 122, 355
Fienberg, S., 26
Fillenbaum, S., 60
Finkel, E., 72
Finzer, B., 122, 358
Fischbein, E., 33, 101, 398
Fischer, K. W., 98, 99
Fischhoff, B., 81
Fitzgerald, W. M., 122, 355
Fitzpatrick, M., 122
Fong, G. T., 398
Frankenstein, M., 65, 69
Freire, P., 65
Freudenthal, H., 149
Freund, R. J., 177
Frick, R. W., 174, 196
Friedlander, A., 34, 123, 125, 126
Friel, S. N., 55, 60, 64, 103, 110, 122,
123, 155, 181, 182, 193, 196, 279, 328,
331, 335, 344, 355, 356, 400, 401
G
Gabriele, A. J., 57, 58, 75
Gagnon, A., 34, 35, 178, 181, 381
Gail, M., 27, 28
Gainer, L. J., 48
Gal, I., 11, 35, 36, 41, 49, 51, 52, 58, 59,
60, 61, 63, 66, 67, 69, 70, 71, 72, 75, 102,
142, 181, 206, 261, 262, 278, 279, 403,
405, 406
Galotti, K. M., 79, 80, 81, 85
Garfield, J., 6, 7, 13, 34, 58, 59, 61, 63,
66, 72, 85, 92, 99, 102, 111, 113, 125,
142, 259, 260, 278, 299, 300, 302, 356,
358, 369, 371, 375, 398, 401, 404, 406
Gaudard, M., 33
Gazit, A., 101
Gertzog, W. A., 299
Gigerenzer, G., 42
Gilovich, T., 80, 92
Ginsburg, L., 70, 75
Glencross, M. J., 296
Gnanadesikan, M., 329, 348
Godino, J. D., 111, 234, 235, 259
Goldman, S. R., 278, 296
Goldsmith, L., 34, 35, 147, 181, 381, 383
Gonzalez, E. J., 72
Gordon, F., 169
Gordon, M., 122
Gordon, S., 169
Gould, S. J., 171
Graham, A., 121
Gravemeijer, K., 12, 112, 147, 149, 161,
164, 170, 194, 205, 234, 235, 353, 376,
380, 381, 382, 384, 385, 387, 388, 390,
401, 403, 404
Green, D. R., 101, 111, 204, 234, 235
Green, K. E., 69
Greenwood, M., 21
Greer, B., 61, 102, 122, 125, 278
Gregory, R., 52, 61, 73
Griffin, D., 80, 92
Grosslight, L., 297
H
Hacking, I., 21, 22, 175, 195
Hadas, N., 125
Hahn, G., 38, 39
Hancock, C., 34, 35, 147, 181, 381, 383
Hare, L., 30, 32, 42
Harvill, J. L., 296
Hawkins, A., 38, 39, 40
Heaton, R. M., 13, 327, 328, 350, 355,
404
Hersh, R., 21
Hershkowitz, R., 125
Hewson, P. W., 299
Higgins, T. L., 121, 122, 147, 164, 170,
178, 194, 355, 381
Hill, A. B., 28
Hmelo, C. E., 122
Hoaglin, D., 121
Hodgson, T. R., 296
Hoerl, R., 30, 32, 38, 42
Hofbauer, P. S., 104
Hogg, B., 6, 92
Holland, J. H., 297
Holyoak, K. J., 297
Hooke, R., 58
Hopfensperger, P., 204
Houang, R. T., 71
Hoyles, C., 170, 178, 383
Hromi, J., 30, 32
Hsu, E., 232
Huberman, A. M., 237, 281
Huck, S., 258
Hudicourt-Barnes, J., 195
Huff, D., 58, 60
Hunt, D. N., 126
I
Inhelder, B., 235, 258, 398
J
Jacobs, S., 232
Jacobs, V., 206, 207, 209, 278
Janvier, C., 228, 233
Jaworski, B., 331
Jenkins, E. W., 48, 55
Jennings, D. L., 234, 299
Johnson, N., 20, 36
Johnson, Y., 104
Johnson-Laird, P. N., 79, 80, 81
Joiner, B., 20, 30, 33
Joliffe, F., 398
Jones, G. A., 11, 99, 100, 101, 102, 103, 104, 109, 112, 113, 206, 217, 302
Jones, M., 102, 103, 290, 302
Joram, E., 57, 58, 75
Jungeblut, A., 55, 56
K
Kader, G. D., 121
Kahneman, D., 28, 29, 30, 33, 42, 48, 80, 92, 206, 278, 279, 398
Kaput, J., 34, 35, 147, 181, 381, 383, 388
Kelly, A. E., 91, 93, 398, 404
Kelly, B., 205, 207, 222
Kelly, D. L., 72
Kelly, P., 35
Kendall, M. G., 21, 22
Kepner, J., 123
Kettenring, J., 32
Khalil, K., 159, 162, 165, 178
Kirsch, I., 52, 55, 56
Knight, G., 35
Kolata, G., 58
Kolstad, A., 55
Konold, C., 12, 30, 35, 36, 61, 92, 110, 121, 122, 128, 147, 148, 159, 162, 164, 165, 169, 170, 177, 178, 181, 194, 206, 233, 234, 354, 381
Kosonen, P., 58
Kotz, S., 20, 36
Krabbendam, H., 232
Kramarsky, B., 232, 233, 235, 236, 249, 250, 252, 253
Kranendonk, H., 204
Krantz, D. H., 398
Krishnan, T., 26
L
Laborde, C., 52
Lajoie, S. P., 58, 72, 182, 398
Lakoff, G., 82
Lampert, M., 124, 350
Landwehr, J. M., 36, 56, 63, 73, 204, 279,
328
Lane, D. M., 296
Lang, J., 296
Langrall, C. W., 11, 99, 100, 101, 102,
103, 104, 109, 112, 113, 206, 217, 302
Lappan, G., 122, 355
Larson, S., 232
Latour, B., 385
Lawler, E. E., 49
Lecoutre, M. P., 206
Lee, C., 205, 398
Lefoka, P. J., 101
Lehrer, R., 110, 148, 191, 192, 194, 327,
328, 339, 350, 355, 382, 385
Leinhardt, G., 58, 59, 63, 232
Leiser, D., 234
Lemke, J. L., 155
Lepper, M., 299
Lesh, R. A., 91, 93, 110, 398, 404
Levin, J. R., 290
Levins, L., 100
Lichtenstein, S., 81
Lieberman, A., 357
Lightner, J., 23
Lima, S., 170
Lipson, A., 36, 206
Loef, M., 101, 112
Lohmeier, J., 36, 206
Lord, C., 299
Lovett, M., 85
Lovitt, C., 122
Lowe, I., 122
Lubienski, S. T., 328
M
MacCoun, R. J., 73
Maher, C. A., 138
Makar, K., 13, 328, 355, 356, 361, 404
Mallows, C., 17, 38
Marion, S. F., 72
Martin, M. O., 72
Mathieson, K., 296
Mattern, N., 398
Mayr, S., 178
Mazzeo, J., 175
McCabe, G. P., 128, 205, 206, 211, 297
McClain, K., 14, 147, 149, 161, 164, 167,
170, 194, 205, 234, 235, 376, 377, 380,
381, 384, 385, 391, 403, 404, 405
McCleod, D. B., 68, 69, 111
McGatha, M., 377, 381
McKean, K., 28, 29
McKnight, C. C., 235
McTighe, J., 381
Medin, D. L., 187
Meeker, W., 39
Meira, L. R., 126, 155, 388
Meletiou, M., 203, 205, 367
Mellers, B. A., 61
Meltzer, A. S., 48
Mendez, H., 259
Merseth, K., 350
Mervis, C. B., 187
Mestre, J. P., 53
Metz, K. E., 278, 292
Mevarech, Z. A., 111, 233, 235, 236, 249,
250, 252, 253
Mewborn, D. S., 328
Meyer, J., 234, 258
Mickelson, W., 13, 327, 355, 404
Miles, M. G., 237, 281
Mills, J. D., 296
Ministry of Education, 5, 35, 122, 229,
230
Mogill, T., 101, 206, 217
Mokros, J. R., 64, 92, 110, 122, 123, 148,
170, 177, 178, 193, 204, 278, 398
Monk, G. S., 128
Mooney, E. S., 11, 99, 100, 102, 103,
104, 109, 111, 112, 113, 302
Moore, D., 4, 5, 17, 20, 37, 38, 39, 40,
47, 58, 59, 62, 63, 64, 65, 66, 70, 71, 72,
74, 84, 86, 87, 88, 91, 92, 111, 121, 122,
128, 172, 202, 203, 205, 206, 211, 228,
279, 297, 375, 376, 398
Morgan, C., 297
Moritz, J., 13, 35, 37, 64, 72, 99, 100,
101, 108, 109, 110, 111, 113, 155, 171,
178, 179, 181, 194, 205, 206, 207, 209,
211, 221, 233, 234, 235, 237, 238, 239,
249, 252, 253, 279, 280, 283, 286, 287,
288, 290, 291, 292, 355, 356, 404
Moschkovich, J. D., 132, 134, 140
Mosenthal, P. B., 52, 55-56
Mosteller, F., 121
Mullis, I. V. S., 72
N
National Council of Teachers of
Mathematics (1989), 97, 111, 122, 169,
203
National Council of Teachers of
Mathematics (2000), 4, 5, 47, 58, 63, 71,
92, 97, 110, 111, 112, 122, 149, 169, 170,
203, 229, 230, 231, 249, 251, 277, 327,
328, 336, 397
National Research Council, 169
National Writing Project, 357
Nelson, B. S., 328
Nemirovsky, R., 231, 232, 251
Newmann, F. W., 328
Newstead, S. E., 80, 86, 88
Newton, H. J., 296
Nicholls, J., 101
AUTHOR INDEX
Nicholson, J., 205
Nickerson, R. S., 297
Nisbet, S., 102, 103
Nisbett, R., 81, 297, 398
Noddings, N., 138
Norman, C., 30
Noss, R., 170, 178, 383
Nunez, R. E., 82
O
Oaksford, M., 80
Ogonowski, M., 195
Okamoto, W., 98, 99
Olecka, A., 101
Orcutt, J. D., 60, 65, 74
Organization for Economic Cooperation
and Development (OECD) and Human
Resource Development Canada, 55, 73
Orr, D. B., 279
Osana, H. P., 290
Ottaviani, M. G., 398
Over, D. E., 81
Ozruso, G., 126
P
Packer, A., 48, 49
Parker, M., 58, 59, 63
Paulos, J. A., 58, 63, 65
Pea, R. D., 388
Pearsall, J., 201
Pegg, J., 99, 100, 108, 109, 110, 205
Penner, D., 191
Pereira-Mendoza, L., 72
Perkins, D., 122, 297
Perlwitz, M., 101
Perry, B., 99, 100, 102, 103, 104, 109,
112, 113, 302
Perry, M., 121
Peterson, P. L., 101, 112
Petrosino, A. J., 148
Pfannkuch, M., 6, 11, 18, 30, 32, 37, 38,
40, 43, 50, 59, 62, 73, 84, 85, 87, 88, 110,
124, 170, 201, 203, 291, 367, 398, 405
Phillips, E. D., 122, 355
Phillips, L. D., 81
Piaget, J., 98, 101, 231, 234, 235, 258,
398
Pinker, S., 234
Plackett, R. L., 182
Pligge, M., 192
Polaki, M. V., 101
Pollatsek, A., 12, 35, 36, 110, 128, 148,
159, 162, 164, 165, 170, 178, 181, 206,
296, 314, 354, 381
Porter, T. M., 22, 23, 24, 25, 26, 27, 28,
173, 195
Posner, G. J., 299
Pozzi, L., 170, 178, 383
Provost, L., 30
Putt, I. J., 99, 100, 102, 103, 104, 109,
112, 113, 302
Putz, A., 194
Pyzdek, T., 31
Q
QSR, 359
Quetelet, M. A., 183
R
Reading, C., 12, 37, 100, 110, 149, 171,
205, 206, 209, 211, 221, 404
Reber, A. S., 99
Resnick, L., 41, 57, 58, 75, 101, 124, 128
Resnick, T., 125, 126
Rich, W., 36
Robinson, A., 159, 162, 165, 178
Robyn, D. L., 231, 249
Romberg, T., 110, 123, 155, 382, 385
Rosch, E., 187
Rosebery, A., 195
Ross, J., 111, 234, 235, 238, 248, 251
Ross, L., 81, 234, 299
Roth, W. M., 385
Rothschild, K., 181
Rowe, M. B., 234, 249
Rubick, A., 43
Rubin, A., 122, 193, 206, 207, 278, 398
Rumsey, D. J., 7
Russell, S., 64, 92, 110, 123, 148, 170,
177, 178, 204, 278, 398
Rutherford, J. F., 49
S
Sabini, J. P., 80
Saldanha, L. A., 111, 180, 296, 376
Salsburg, D., 25
Sanchez, V., 13, 258, 262, 404
Schaffer, M. M., 187
Schaeffer, R. L., 123, 328, 329
Schau, C., 6, 70, 75, 92, 398
Schauble, L., 110, 148, 191, 192, 194,
327, 328, 339, 350, 355
Scheaffer, R., 20, 56, 63, 122, 204, 329,
348
Schifter, D., 328
Schmidt, W. H., 71
Schmitt, M. J., 63, 74
Schnarch, D., 101
Schoenfeld, A. H., 124, 126, 129
Scholz, R. W., 101
Schorr, R. Y., 110
Schuyten, G., 269
Schwartz, A., 61
Schwartz, D. L., 278, 296
Schwartz, J. L., 297
Schwartzman, S., 178
Schwarz, B., 125
Schwarz, C. J., 296
Schwarz, N., 81
Scott, J., 398
Seber, G. A. F., 206
Sedlmeier, P., 92, 297, 398
Sfard, A., 83, 84, 86, 87, 153
Shade, J., 32, 42
Shamos, M. H., 48, 58, 72
Shaughnessy, J. M., 12, 34, 37, 58, 61,
92, 100, 101, 102, 110, 125, 149, 171,
203, 204, 205, 206, 207, 208, 209, 211,
221, 222, 278, 328, 375, 404
Shewart, W., 30
Shinar, D., 234
Shulman, L. S., 328
Siegel, A., 297
Simmons, R., 309
Simon, M. A., 112, 296
Sloman, S. A., 81, 82
Slovic, P., 48, 92, 398
Smith, C., 297
Smith, G., 169
Smith, T. A., 72
Snee, R., 17, 30, 31, 32, 42, 371, 372
Snir, J., 297
Stake, R. E., 353
Stanovich, K. E., 81, 82
Starkings, S., 73
Statistics Canada and Organization for
Economic Co-operation and
Development (OECD), 52, 53, 55, 69, 73
Steen, L. A., 58
Stein, S., 74, 232
Steinbring, H., 34, 123, 155
Stigler, S., 22, 24, 183, 185, 195
Strauss, A., 147, 152, 359, 404
Strauss, S., 110, 170, 177, 178, 179
Streefland, L., 112
Street, B. V., 71
Strike, K. A., 299
Strom, D., 192
Sutherland, J., 296
Swan, M., 232, 234, 235
Swatton, P., 234, 235, 238, 248
Swift, J., 73, 279
T
Tabachnik, N., 234
Tabach, M., 125, 126
Tanur, J., 26
Tarr, J. E., 101
Tauber, L., 12, 258, 262, 404
Taylor, R. M., 234, 235
Tenney, Y., 207, 278, 398
TERC, 122, 354
Thagard, P. R., 397
Thomason, N., 296
Thompson, P. W., 84, 91, 111, 180, 296,
376
Thornley, G., 35
Thornton, C. A., 11, 99, 100, 101, 102,
103, 104, 109, 112, 113, 206, 217, 302
Tilling, L., 231
Todd, P. M., 42
Torok, R., 110, 206, 210, 211, 222
Tracy, R. L., 296
Trigatti, B., 101
Triggs, C., 73, 398
Tripp, J. S., 110
Trumble, B., 202
Truran, J., 398
Tufte, E. R., 25, 26, 60, 231
Tukey, J., 121, 169, 171, 195
Turner, J. B., 60, 65, 74
Tversky, A., 28, 29, 30, 33, 42, 48, 80,
92, 207, 278, 279, 398
Tzou, C., 385
U
Ullman, N., 40
UNESCO (1990), 47
UNESCO (2000), 63, 69, 73
Unger, C., 122, 309
Utts, J. M., 335, 400
V
Vallecillos, A., 259, 269
Valverde, G. A., 71
Van Reeuwikj, M., 382
Vaughn, L. A., 81
Velleman, P., 122, 296
Venezky, R. L., 71
Verhage, H., 122
Voelkl, K. E., 175
Voigt, J., 141, 405
Vye, N. J., 296
Vygotsky, L. S., 140
W
Wachtel, H., 398
Wackerly, D., 296
Wagner, D. A., 71, 181, 206, 278
Wainer, H., 60
Waldron, W., 63, 74
Wallman, K. K., 47, 48, 68, 279
Wallsten, T. S., 60
Wanta, W., 52, 66
Wares, A., 102, 103, 110, 302
Warren, B., 194
Wason, P. C., 79, 80, 81
Watkins, A. E., 56, 63, 73, 204, 279, 328,
329, 348
Watson, J., 13, 35, 36, 37, 48, 49, 64, 72,
99, 100, 101, 102, 108, 109, 110, 111,
113, 149, 171, 178, 179, 181, 194, 205,
206, 207, 209, 210, 211, 221, 233, 235,
237, 238, 249, 252, 253, 279, 281, 282,
283, 286, 287, 288, 290, 291, 292, 328,
355, 356, 404
Wavering, J., 111, 232, 251
Wehlage, G. G., 328
Welford, A. G., 234, 235
Well, A., 35, 36, 128, 159, 162, 165, 170,
178, 181, 206, 296, 314, 381
West, M. M., 297
West, R. F., 81, 82
Wheatley, G., 101
Whitenack, J. W., 147, 152
Whittinghill, D., 6, 92
Wiggins, G., 281, 400, 406
Wild, C. J., 6, 11, 18, 20, 32, 38, 40, 50,
59, 62, 73, 84, 85, 87, 88, 110, 124, 170,
201, 202, 203, 206, 291, 367, 398, 405
Wilensky, U., 170, 193, 259, 381
Wiley, D. E., 71
Wilkinson, L., 156
Wilson, W. J., 177
Wing, R., 159, 162, 165, 178
Winne, P. H., 48
Wisenbaker, J., 398
Wiske, M. S., 297
Witmer, J., 204, 329, 348
Wood, D. R., 357
Wood, T., 101
Y
Yackel, E., 101, 140
Yamin-Ali, M., 122
Yerushalmy, M., 122, 232, 251
Young, S. L., 343
Z
Zaslavsky, O., 232
Zawojewski, J. S., 149, 203, 206, 208
Zeleke, A., 398
Subject Index
mean 57, 59, 61-63, 86, 110, 111, 126, 136, 170-197, 205, 228, 258-274, 298-304, 307, 309-312, 319, 320, 323, 355, 360, 362-370, 377, 381, 388, 390, 400, 402
median 59, 136, 148, 156, 164-166, 168, 170, 171-185, 190, 260-274, 309, 359, 377, 381, 388, 390
midrange 148
mode 76, 136, 171, 260, 263, 264, 265, 268-274, 377, 381
A
assessment 6, 10, 14, 15, 43, 46, 49, 69, 75-78, 92, 93, 98, 113, 115-117, 127, 142, 144, 166, 175, 190, 191, 197, 198, 223, 229, 249, 252, 254, 258, 275, 276, 295, 297, 299, 300, 303, 304, 313, 315, 317, 328, 351, 352, 355, 357, 370, 377, 378, 381, 397, 399, 400, 404, 406-408
implications 43, 74-75, 142, 223-224, 314-317, 405-406
average 138, 169-197, 204, 207, 208, 259, 260, 265, 267, 278, 279, 282, 289, 290-292, 294, 301, 310, 313, 314, 323, 363, 364, 367
center of a distribution 7, 10, 12,
20, 59, 97, 103, 111, 115,
122, 128, 138, 148, 149,
152, 159, 165, 204, 207,
354, 355, 356, 363, 377,
381, 400, 401
B
bivariate data, see covariation
C
causation 7, 38, 44, 67, 213, 214,
217, 221, 222, 252, 400
Central Limit Theorem 112, 170, 258, 259, 296, 298, 299, 303-305, 311-316, 357, 400
chi-square 228
cognitive development (see models
of cognitive development)
comparing groups 97, 111, 171, 172,
180, 181, 189, 278, 353-372,
400
confidence interval 88, 89, 311, 357,
361, 402
correlation 67, 90, 112, 205, 224,
228, 247, 254, 357, 400
covariation 12, 13, 111, 123, 128,
227-253, 382, 394, 401
curriculum 4, 8-15, 37, 43, 47, 56, 71, 73, 88, 97, 98, 112-116, 122-127, 132, 134, 139-144, 149, 162, 170, 190, 191, 201, 203, 204, 229, 231, 253-255, 277, 278, 283, 290, 291, 327-332, 348-350, 354, 355, 382, 393, 397-401
standards 149, 169, 170, 191,
198, 277, 328, 397
D
data
data production 38, 58, 62, 92
data representation 12, 25, 39,
121-127, 141-143, 402
data sets 10, 24, 87, 106, 108,
112, 118, 128, 165, 203,
205, 223, 225, 275, 350,
355, 376, 406
data types 400
real data 18, 86, 92, 128, 164,
169, 257, 261, 272, 274,
275, 302
data analysis
Exploratory Data Analysis
(EDA) 12, 14, 121-131,
138-145, 171, 195, 197,
375, 376, 383, 386
data handling 34, 103, 202, 203,
206, 278
density 148, 149, 156, 163, 165, 259-272, 275, 300, 310, 381, 382
design experiments 375-395
dispersion (see variability)
dispositions 36, 41, 49, 51, 68, 72,
75, 124, 126, 140, 141, 279,
405
distribution 10, 12, 13, 20, 23, 24, 26, 29, 36, 38, 59, 67, 87, 88, 92, 123, 136, 137, 147-167, 170-197, 207, 217, 220-222, 295, 298-323, 327-350, 353-372, 376, 381, 382, 392, 393, 400, 401
normal 149, 194, 257-275, 300-313, 323
shape 12, 52, 57, 59, 67, 75, 122, 123, 132, 133, 135, 140, 147, 149, 150, 159-165, 171, 172, 173, 176, 177, 187, 189, 190, 193, 263, 264, 270, 274, 275, 298-315, 320-323, 354, 355, 364, 368, 400, 401, 407
skewed 63, 148, 149, 156, 159,
171, 189, 190, 259, 261,
263, 268, 271-275, 313,
381
E
enculturation 124, 129, 138, 140, 143
G
graphs 12, 20, 22, 34, 35, 55-65, 73, 76, 78, 103, 107-110, 116, 119, 122, 128, 129, 132, 138, 141, 143, 147-168, 205, 223, 228-255, 259, 262, 267, 271-275, 299, 304-310, 319, 321, 328, 330, 335, 336, 341-344, 348-358, 372, 377, 379, 381, 402-404
bar graph 51, 55, 72, 233, 333,
377
boxplot 87, 156, 163, 181, 259,
260, 368, 379
histogram 156, 188, 189, 259,
264, 366, 377
pie chart 51, 56, 377
scatterplot 13, 227, 382
stem-and-leaf plot 87, 182, 259, 260
time series plot 128, 129
H
history of statistics 7-43
I
inference, statistical 27, 61, 62, 64,
80, 85, 88, 89, 94, 112, 118,
122, 170, 199, 252, 257, 277,
278, 293, 295, 296, 311, 316,
317, 328, 338, 356, 357, 373,
375, 376, 383, 395, 400
instructional design 9, 13, 155, 159,
375-394
introductory college course 4, 8, 62,
73, 92, 169, 257, 260, 295,
296, 302, 304, 305, 359, 397
intuitions 4, 110, 116, 189, 207, 224,
299
L
learning 3-15, 49, 71-78, 91, 92, 95, 99-103, 110-118, 121-129, 135, 138-145, 147-152, 155, 166, 167, 170, 171, 183, 196-199, 203-206, 210, 223-225, 232, 238, 253, 254, 274, 279, 291, 293, 294, 295, 296, 297, 300, 309, 312-317, 327-332, 336, 342-351, 356-358, 370-372, 398-408
M
mathematical reasoning (see
reasoning)
mathematics 5, 14-17, 25, 33, 35, 40,
44-47, 71, 75-77, 79, 82-93,
97, 98, 100, 102, 112, 113,
122, 124-126, 128, 142, 149,
169, 170, 193, 197-199, 203,
204, 209, 224, 225, 229, 277,
278, 283, 290-293, 328, 329,
335, 353-359, 362, 364, 366,
370-372, 398, 399, 404, 408
misconceptions 61, 92, 258-260, 276, 296, 299, 300, 302, 311, 312-314, 358, 371, 405
models
modeling 18, 23, 25, 31, 34, 36,
61, 97, 111, 116, 117,
188, 190, 202-204, 228,
274-276
statistical models 20, 23, 25, 35,
41
models of cognitive
development 11, 97-113,
124, 201, 231, 328, 405,
407
N
Normal distribution (see distribution)
O
outliers 20, 67, 122, 128, 147, 148,
153, 156, 163-165, 178, 188,
190, 192, 274, 310, 354, 367,
368, 400
P
percentiles 171, 266, 268
prediction 12, 22, 32, 85, 97, 102,
104, 111, 112, 128, 147, 160,
163, 167, 202-204, 210, 228,
278, 303, 309, 310-314, 330,
349, 398, 400, 403
preparation of teachers (see teacher
preparation)
probability 5, 7, 8, 11, 17, 21-24, 33, 36, 38, 44-45, 57-63, 66, 71, 73, 77-93, 97, 99, 101, 116-118, 149, 181, 186, 198, 205-210, 219, 220, 223, 257-260, 268, 277, 301, 354, 398, 400, 408, 409
R
randomness 6, 35-38, 44, 61, 202,
277, 398
reasoning
mathematical 11, 79-93, 99
statistical 6, 39, 43, 47, 50, 86, 89, 97-113, 121-125, 136, 142, 159, 170, 172, 188, 190, 197, 198, 227-232, 235, 247-253, 257, 260-263, 266, 271, 274, 279, 290, 302, 327-339, 344, 348, 349, 350, 353-357, 361, 369, 370, 375, 381, 382, 386-393, 397-405
reform movement 3, 5, 72, 92, 97,
190, 354, 357
regression 112, 205, 224, 357
S
sampling 52, 59, 61, 64, 65, 74, 78,
149, 159, 165, 176, 192, 193,
196, 197, 201-224, 234, 235,
250, 259, 260, 261, 277-293,
295-316, 400-402, 408
bias 42, 59, 206, 278, 280, 282,
284, 287, 289, 292
sampling distributions 64, 85,
97, 111-114, 118, 259,
260, 276, 295-323, 358,
366-369, 376, 400
SOLO model 99, 109, 115, 206, 279,
281, 293
statistical literacy 6, 47-75, 235, 252,
279-282, 291, 292, 397-407
statistical reasoning (see reasoning)
T
teacher preparation 72, 73
teachers’ knowledge 13, 110
elementary 327-351
secondary 353-372
teaching
implications 43, 71-74, 91-92, 111-114, 142, 166-167, 223-224, 250-253, 274-275, 291-292, 314-316, 329-351, 371-372, 405-406
technology 5, 9, 14, 48, 49, 72, 92,
121, 122, 125, 143, 144, 296,
297, 312-317, 397-403, 406,
408
ActivStats 296, 318
calculators 223
Computer Mediated
Communication (CMC)
402, 403
ConStatS 296, 316
Data Desk 122, 145
Excel 126, 402
ExplorStat 296, 317
Fathom 122, 144, 353, 358, 360-368, 372, 402
HyperStat 296, 317
Internet 50, 123, 136, 358, 401,
402
Minitools 149-152, 205, 387,
402
Sampling SIM 296-304, 316,
402
simulation software 5, 260, 276,
297, 311, 315
StatConcepts 296, 319
Statgraphics 260-262, 402
StatPlay 296, 318
Tabletop 122, 145
Tinkerplots 402
Visual Statistics 296, 316
test of significance 26, 88, 89, 369,
371
thinking
mathematical 83, 88, 99, 101,
116
statistical 17-43, 47, 48, 85-87, 92, 93, 100, 116, 117, 121, 124, 125, 202, 203, 292, 367, 397-399, 403-408
V
variability 9, 12, 18, 24, 30, 31, 32,
36-40, 61, 62, 87, 89, 90, 128,
139, 141, 171-173, 176, 184,
187-197, 201-225, 236, 260,
278, 295, 300-318, 330, 339,
344, 377, 400, 401
dispersion 10, 103, 203, 205,
223, 356, 362, 363, 368,
370
interquartile range 148, 171,
172, 176, 197
range 136, 148, 159, 166, 210, 311, 363-368, 377, 381, 388-390
spread 7, 20, 42, 43, 103, 111, 122, 126, 148, 149, 153-156, 159, 165, 166, 171-173, 176, 187-190, 203-223, 258, 259, 275, 300, 301, 309, 310, 315, 354, 363, 367, 368, 381, 400, 401
standard deviation 148, 171, 172, 176, 184, 190, 196, 197, 204, 264, 267, 268, 273, 301, 307, 310-313, 323, 362-369
variance 64, 189, 207, 259, 262,
305
variation 5, 7, 12, 13, 97, 102, 111, 115, 118, 149, 154, 159, 164, 168, 169, 170, 183, 188, 198, 199, 201-224, 227, 228, 231-233, 239-254, 278, 290, 292, 353-372, 382, 400
variable 56, 64, 67, 110, 150, 201,
227-251, 259-274, 300, 335,
342, 344, 400, 401
lurking variable 67, 235, 385