Introduction
The field of Artificial Intelligence in Education (AIED) has been in existence for about 40 years and has operated under various other names, the most common of which is Intelligent Tutoring Systems. The field was initially brought to wider attention by papers in a special issue of the International Journal of Man-Machine Studies (see e.g., Brown, Burton, & Bell, 1975), by papers in a book based on that special issue (see e.g., O'Shea, 1982) and in Artificial Intelligence books of the era (see e.g., Brown & Burton, 1975). The field used, and continues to use, techniques from artificial intelligence and cognitive science to attempt to understand the nature of learning and teaching, and so to build systems that assist learners to master new skills or to understand new concepts, in ways that mimic the insightful and adaptive tutoring of a skilled human tutor working one-to-one with the learner. That is to say, such systems attempt to adapt the way that they teach to the existing and developing knowledge and skill of the learners and to their preferred ways of going about learning, and to take into account the affective trajectory of the learners as they deal with the expected set-backs and impasses of mastering new material. There is clearly some overlap with other uses of computing technology in education, though the commitment to individual adaptation, through modelling different parts of the educational process, is a key defining characteristic.
In order for such systems to adapt to the learner and so provide a personalised learning experience, a typical conceptual architecture has evolved. This consists of (i) a model of the domain being learned, so that the system can reason about and judge whether a student's answer, or indeed a problem-solving step, is appropriate; (ii) a model of the current level of the learner's understanding or skill, so that tasks of appropriate complexity can be posed; (iii) a model of pedagogy, so that the system can make sensible tutorial moves such as providing effective feedback or adjusting the nature of the next task; and (iv) one or more interfaces through which the system and the learner can communicate to explore and learn about the domain in question.
1 This column is an adapted and enlarged version of a letter to the Editor of the International Journal of Artificial Intelligence in Education (du Boulay, 2016).
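As a concrete, deliberately simplified illustration of the four-part architecture just described, the sketch below wires together a domain model, a learner model, a pedagogy model and an interface in Python. It is only a sketch: the class names, method names and the mastery-updating rule are invented for this column and are not taken from any particular AIED system.

# Minimal sketch of the four-part AIED architecture; all names are illustrative.

class DomainModel:
    """Knows the subject matter well enough to judge a student's response."""
    def is_correct(self, task, response):
        return response in task["acceptable_answers"]

class LearnerModel:
    """Tracks the system's estimate of what the learner currently knows."""
    def __init__(self):
        self.skill_estimates = {}  # skill name -> estimated mastery (0..1)
    def update(self, skill, correct):
        old = self.skill_estimates.get(skill, 0.5)
        self.skill_estimates[skill] = min(1.0, max(0.0, old + (0.1 if correct else -0.1)))

class PedagogyModel:
    """Chooses the next tutorial move: feedback now, or a new task."""
    def next_move(self, learner, skill, correct):
        if not correct:
            return "give_feedback"
        return "harder_task" if learner.skill_estimates.get(skill, 0.5) > 0.8 else "similar_task"

class Interface:
    """Channel through which the tutor and the learner communicate."""
    def present(self, message):
        print(message)

class Tutor:
    """Wires the four components together for one interaction cycle."""
    def __init__(self):
        self.domain, self.learner = DomainModel(), LearnerModel()
        self.pedagogy, self.ui = PedagogyModel(), Interface()
    def handle_response(self, task, response):
        correct = self.domain.is_correct(task, response)
        self.learner.update(task["skill"], correct)
        move = self.pedagogy.next_move(self.learner, task["skill"], correct)
        self.ui.present(f"Correct: {correct}; next move: {move}")

# Example interaction with a single algebra task.
tutor = Tutor()
task = {"skill": "linear_equations", "acceptable_answers": {"x = 1"}}
tutor.handle_response(task, "x = 1")

Real systems differ greatly in how each component is realised, but the division of labour is broadly the one shown.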
Over the years many systems using a variety of pedagogical techniques and covering a variety of topics have been built and evaluated. To give a sense of the wide scope of the work, four systems are mentioned here. These have been chosen for their diversity: they range from classic teaching of a formal subject and a procedural skill, through learning by creating externalised forms of knowledge for a highly conceptual learning task, to rich, natural user interaction via speech for learning complex, culture-laden skills.
The first three examples of AIED systems are: (i) a system to help learners understand basic algebra by being set problems and provided with step-by-step feedback and guidance on their solution (Koedinger, Anderson, Hadley, & Mark, 1997); (ii) a system to help learners gain a conceptual understanding of river ecosystems by building a concept map of that domain, as if for another learner, and having that simulated other learner take tests on the adequacy of the concept map so far built (Leelawong & Biswas, 2008); and (iii) a system to help military personnel learn to speak Arabic and to understand the social and cultural norms needed to interact with people in the country in which they are operating (Johnson, 2010).
The fourth example system illustrates the increasing importance of the interface in AIED systems and their use in informal learning environments, such as museums, as well as in formal ones. The screenshot in Figure 1 shows Coach Mike, a pedagogical agent designed to help children visiting a museum to learn about robotics. This kind of application extends the role of classroom teaching: "it means that such systems need to go beyond simply focusing on knowledge outcomes. They must take seriously goals such as convincing a visitor to engage, promoting curiosity and interest, and ensuring that a visitor has a positive learning experience. In other words, pedagogical agents for informal learning need to not only act as coach (or teacher), but also as advocate (or salesperson)" (Lane et al., 2013, p. 310).
Figure 1. Coach Mike in three different poses, taken from Lane et al. (2013).
Coach Mike was designed to emulate some of the work of the human museum staff.
A number of papers have recently argued the case for the benefit of artificial intelligence systems in education (see e.g., Luckin, Holmes, Griffiths, & Forcier, 2016; Woolf, Lane, Chaudhri, & Kolodner, 2014), while others have been more sceptical (see e.g., Enyedy, 2014). This column looks at the evidence derived from meta-reviews and meta-analyses conducted over the last 5 years. Its main focus is on the comparative effectiveness of AIED systems vs human tutoring. We note in passing that a meta-review of the use of pedagogical agents (not necessarily in AIED systems) "produced a small but significant effect on learning" (Schroeder, Adesope, & Gilbert, 2013). This column is absolutely not intended as support for an argument about getting rid of human teachers; rather, it is intended as support for blended learning, where some of the human teacher's work can be off-loaded to AIED systems, as if to a classroom assistant.
Sidebox 1.
For example, imagine that the problem is to solve the equation

2(14 – x) = 23 + 3x

An answer-based system would expect the student to do all the working offline and then provide the answer x = 1. If asked for a hint prior to the answer being provided, the tutor can suggest broad ways of going about the problem, such as collecting all the terms in x on one side of the equation, but it has no way of knowing whether this advice is being followed. If the answer provided is wrong, e.g. x = 1.25, the tutor may be able to hypothesise that the student multiplied out the bracket incorrectly, but if the answer provided is x = 14, it probably will not be able to offer much in the way of specific help.
In a step-based system, the student might be invited to multiply out the bracket expression as a first step, and so might type in 28 – 2x as the answer to that step. If a hint is requested, or a wrong answer is given to this step, then help can be given about the working of that step. Once the step is completed correctly, the tutor would invite an answer to the next step, e.g. reordering the terms in the equation, and then move on through further steps to the final answer.
In a substep-based system there might be a remedial dialogue at a finer level than an individual step, for instance about what expressions such as 2x or 3x mean, if that seems warranted by a request for a hint or by a wrong step answer.
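To make the distinction more concrete, the sketch below shows, in Python, how an answer-based and a step-based tutor might respond to the same equation. It is purely illustrative: the function and variable names are invented for this column, and simple string matching stands in for the algebraic equivalence checking a real tutor would need.

# Illustrative sketch only; the task is the sidebox equation 2(14 - x) = 23 + 3x, so x = 1.

CORRECT_ANSWER = "x = 1"

# Wrong final answers an answer-based tutor might recognise as known slips.
KNOWN_SLIPS = {
    "x = 1.25": "Check the bracket: 2(14 - x) = 28 - 2x, not 28 - x.",
}

def answer_based_feedback(final_answer):
    """Only the final answer is visible, so diagnosis is limited."""
    if final_answer == CORRECT_ANSWER:
        return "Correct."
    return KNOWN_SLIPS.get(
        final_answer, "That is not right, but I cannot tell where you went wrong.")

# A step-based tutor asks for each intermediate step in turn:
# (what it asks for, the expected response, a hint for that step).
STEPS = [
    ("Multiply out the bracket on the left-hand side.", "28 - 2x",
     "Remember to multiply both the 14 and the -x by 2."),
    ("Collect the terms in x on one side.", "5 = 5x",
     "Add 2x to both sides, then subtract 23 from both sides."),
    ("Solve for x.", "x = 1", "Divide both sides by 5."),
]

def step_based_feedback(step_number, response):
    """Each step is visible, so a hint can target the student's current working."""
    prompt, expected, hint = STEPS[step_number]
    if response == expected:
        return "Correct, on to the next step."
    return "Not quite. " + hint

print(answer_based_feedback("x = 1.25"))  # diagnoses the likely bracket slip
print(answer_based_feedback("x = 14"))    # no specific help is possible
print(step_based_feedback(0, "28 - x"))   # targeted hint about this step

A substep-based tutor would add a further remedial layer inside each of these steps, for example a short dialogue about what 2x or 3x means.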
Given the above levels of granularity, VanLehn derived 10 pairwise comparisons of effect sizes, see Table 1. In this table the rightmost column shows the proportion of the results for that row where the individual study comparison was statistically reliable at the level p < 0.05.
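For readers unfamiliar with the metric: the effect sizes reported here are standardised mean differences. Broadly (the individual meta-reviews use their own, closely related, estimators such as Cohen's d or Hedges' g):

d = \frac{\bar{X}_{\text{tutoring condition}} - \bar{X}_{\text{comparison condition}}}{s_{\text{pooled}}}

so that, for example, an effect size of 0.44 means that the tutored groups scored, on average, 0.44 pooled standard deviations higher than the comparison groups on the outcome measure.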
Table 1. Effect sizes adapted from VanLehn (2011). Row 1 was taken by VanLehn from a separate study (C.-L. C. Kulik & Kulik, 1991).
Four Meta-reviews
Since VanLehn's meta-analysis, four meta-reviews have been published, as well as a large-scale study of a specific tutor, see Table 2. In this table the No. of Comparisons column shows the number of instances of the given comparison in that row, not the total number of studies in the overall meta-review.
Table 2. Six meta-reviews and a large-scale study. *The standard error in row 1 is based on all 10 studies, not just the 30% that produced reliable results; see Table 1. Row 9 reports four subgroup estimates; the 0.21 is the second-year high-school result discussed in the text.

Row | Meta-review | Comparison | No. of Comparisons | Mean Effect Size | Standard Error
1 | VanLehn (2011) | Step-based vs one-to-one human tutoring | 10 | -0.21 | 0.19*
2 | Ma et al. (2014) | Step-based vs one-to-one human tutoring | 5 | -0.11 | 0.10
3 | Ma et al. (2014) | Step-based vs "large group human instruction" | 66 | 0.44 | 0.05
4 | Nesbit et al. (2014) | Step-based vs "teacher-led group instruction" | 11 | 0.67 | 0.09
5 | Kulik et al. (2016) | (Step-based and substep-based) vs "conventional classes" | 63 | 0.65 | 0.07
6 | Steenbergen-Hu et al. (2014) | Step-based vs one-to-one human tutoring | 3 | -0.25 | 0.24
7 | Steenbergen-Hu et al. (2014) | Step-based vs "traditional classroom instruction" | 16 | 0.37 | 0.07
8 | Steenbergen-Hu et al. (2013) | (Step-based and answer-based) vs "traditional classroom instruction" | 26 | 0.09 | 0.01
9 | Pane et al. (2014) | Blended learning including a step-based system vs traditional classroom instruction | 147 schools | -0.10, 0.21, 0.01, 0.19 | 0.10, 0.10, 0.11, 0.14
10 | Weighted mean | AIED system vs one-to-one human tutoring | 18 | -0.19 |
11 | Weighted mean | AIED system vs conventional classes | 182 | 0.47 |
In a meta-review of 107 studies, Ma, Adesope, Nesbit, and Liu (2014) found similar results to VanLehn for step-based ITSs, both when compared to a no-tutoring condition (i.e. just a textbook; mean effect size = 0.36) and, more positively than VanLehn, when compared to large-group human teacher-led instruction (mean effect size = 0.44), but no differences when compared to small-group human tutoring or one-to-one tutoring.
The same authors analysed 22 systems for teaching programming and also found a "significant advantage of ITS over teacher-led classroom instruction and non-ITS computer-based instruction" (Nesbit, Adesope, Liu, & Ma, 2014). A larger version of a similar study involving 280 studies is currently in progress (Nesbit, Liu, Liu, & Adesope, 2015).
In a meta-review of 50 studies involving 63 comparisons, J. A. Kulik and Fletcher (2016) found similar-sized improvements (mean effect size = 0.65), but distinguished studies that used standardised tests from those where the tests were more specifically tuned to the system providing tuition, with smaller effect sizes when standardised tests were employed. Overall they concluded that "This meta-analysis shows that ITSs can be very effective instructional tools . . . Developers of ITSs long ago set out to improve on the success of CAI tutoring and to match the success of human tutoring. Our results suggest that ITS developers have already met both of these goals" (J. A. Kulik & Fletcher, 2016, p. 67). They also found better results for substep-based systems than VanLehn did, which they ascribed to differing comparison methodologies.
Much smaller effect sizes were found by Steenbergen-Hu and Cooper (2013) in their meta-analysis of pupils using ITSs in school settings. J. A. Kulik and Fletcher (2016) put this down to the weaker study inclusion criteria used by Steenbergen-Hu and Cooper (e.g. the inclusion of answer-based systems as if they were step-based systems). Steenbergen-Hu and Cooper also noted that lower achievers seemed to do worse with ITSs than did the broad spectrum of school pupils, though this result is again disputed by Kulik and Fletcher.
However, in a parallel study of university students, Steenbergen-Hu and Cooper (2014) found more positive effect sizes (in the range 0.32 to 0.37) for ITSs as compared to conventional instruction. They conclude that "ITS have demonstrated their ability to outperform many instructional methods or learning activities in facilitating college level students' learning of a wide range of subjects, although they are not as effective as human tutors. ITS appear to have a more pronounced effect on college-level learners than on K-12 students" (Steenbergen-Hu & Cooper, 2014, p. 344).
Rows 10 and 11 summarise the results of the meta-reviews, excluding the evaluation of the Cognitive Algebra Tutor, and show a weighted mean effect size of 0.47 for AIED systems vs conventional classroom teaching. We use the term AIED system to cover all the systems, step-based, substep-based and answer-based, looked at in the meta-reviews. The comparison with one-to-one human tutoring shows that AIED systems do slightly worse, with a mean effect size of -0.19. In both cases the means are weighted in terms of the number of comparisons in the meta-review, not in terms of the original N values in the studies themselves.
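The arithmetic behind rows 10 and 11 can be reproduced directly from Table 2. The short Python sketch below is for illustration only; the row groupings and values are those given in the table, with each mean effect size weighted by its number of comparisons.

# Reproducing the weighted means in rows 10 and 11 of Table 2.
# Each pair is (number of comparisons, mean effect size), taken from the table.

vs_one_to_one_tutoring = [(10, -0.21), (5, -0.11), (3, -0.25)]       # rows 1, 2, 6
vs_conventional_classes = [(66, 0.44), (11, 0.67), (63, 0.65),
                           (16, 0.37), (26, 0.09)]                   # rows 3, 4, 5, 7, 8

def weighted_mean(rows):
    """Mean effect size weighted by the number of comparisons in each row."""
    total = sum(n for n, _ in rows)
    return sum(n * es for n, es in rows) / total

print(round(weighted_mean(vs_one_to_one_tutoring), 2))    # -0.19 (row 10, 18 comparisons)
print(round(weighted_mean(vs_conventional_classes), 2))   #  0.47 (row 11, 182 comparisons)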
A large-scale study in the USA of the Cognitive Tutor for Algebra (Pane, Griffin, McCaffrey, & Karam, 2014) undertook a between-schools project involving 73 high schools and 74 middle schools across 7 states. The schools were matched in pairs; half received the Cognitive Algebra Tutor and adjusted their teaching to include it as they saw fit, while the others carried on as before with their normal method of teaching algebra. The study ran over two years and found no significant differences on post-test scores in the first year of the study, but a small but significant effect size of 0.21 in the high schools, in favour of the schools which used the Cognitive Tutor, in the second year of the study (see the 0.21 entry in row 9 of Table 2).
Note that how the Cognitive Tutor was actually used in the classrooms was not controlled, though post-hoc analyses showed that teachers did not generally use the Tutor exactly as recommended by its developers.
Conclusions
The overall conclusion of these meta-reviews and analyses is that AIED systems perform better than CAI systems and also better than human teachers working with large classes. They perform slightly worse than one-to-one human tutors. Note that most of the systems were teaching mathematics or other STEM subjects, as these are the kinds of subjects for which it is easier to build the domain and student models mentioned in the Introduction.
It should be noted that there was a degree of overlap between these meta-reviews and analyses in terms of the collections of individual evaluations from which they have drawn their conclusions.
The specific study of the Cognitive Tutor for Algebra evaluated its use as a blended addition to the normal algebra teaching in the schools where it was tried, rather than as a total replacement for the teachers, and found good results in high schools (as opposed to middle schools) and in the second year of the evaluation (as opposed to the first year).
For a whole variety of reasons, the way forward for AIED systems in the classroom must be the blended model (classroom assistants, if you like), so as to provide detailed one-to-one tutoring for some of the students while the human teacher attends to others, as well as having overall responsibility for all the students' progress.
Of course, good post-test results are not the only criterion for judging whether an educational technology will be, or indeed should be, adopted (Enyedy, 2014). However, the overall message of these evaluations is that blending AIED technology with other forms of teaching is beneficial, particularly for older pupils and college-level students studying STEM subjects.
References
Brown, J. S., & Burton, R. R. (1975). Multiple Representations of Knowledge for Tutorial Reasoning. In D. G. Bobrow & A. Collins (Eds.), Representation and Understanding (pp. 311-349). New York: Academic Press.
Brown, J. S., Burton, R. R., & Bell, A. G. (1975). SOPHIE: A step towards a reactive learning environment. International Journal of Man-Machine Studies, 7, 675-696.
du Boulay, B. (2016). Recent Meta-reviews and Meta-analyses of AIED Systems. International Journal of Artificial Intelligence in Education, 26(1), 536-537. doi: http://dx.doi.org/10.1007/s40593-015-0060-1
Enyedy, N. (2014). Personalized Instruction: New Interest, Old Rhetoric, Limited Results, and the Need for a New Direction for Computer-Mediated Learning (pp. 1-22). Boulder, Colorado: National Education Policy Center.
Johnson, W. L. (2010). Serious Use of a Serious Game for Language Learning. International Journal of Artificial Intelligence in Education, 20(2), 175-195.
Koedinger, K. R., & Aleven, V. (2016). An Interview Reflection on "Intelligent Tutoring Goes to School in the Big City". International Journal of Artificial Intelligence in Education, 26(1), 13-24. doi: http://dx.doi.org/10.1007/s40593-015-0082-8
Koedinger, K. R., Anderson, J. R., Hadley, W. H., & Mark, M. A. (1997). Intelligent Tutoring Goes to School in the Big City. International Journal of Artificial Intelligence in Education, 8(1), 30-43.
Kulik, C.-L. C., & Kulik, J. A. (1991). Effectiveness of Computer-Based Instruction: An Updated Analysis. Computers in Human Behavior, 7(1-2), 75-94. doi: http://dx.doi.org/10.1016/0747-5632(91)90030-5
Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of Intelligent Tutoring Systems: A Meta-Analytic Review. Review of Educational Research, 86(1), 42-78. doi: http://dx.doi.org/10.3102/0034654315581420
Lane, H. C., Cahill, C., Foutz, S., Auerbach, D., Noren, D., Lussenhop, C., & Swartout, W. (2013). The Effects of a Pedagogical Agent for Informal Science Education on Learner Behaviors and Self-efficacy. In Artificial Intelligence in Education: 16th International Conference, AIED 2013, Memphis, TN, USA, July 9-13, 2013, Proceedings (pp. 309-318). Berlin: Springer.
Leelawong, K., & Biswas, G. (2008). Designing Learning by Teaching Agents: The Betty's Brain System. International Journal of Artificial Intelligence in Education, 18(3), 181-208.
Luckin, R., Holmes, W., Griffiths, M., & Forcier, L. B. (2016). Intelligence Unleashed: An Argument for AI in Education. London: Pearson.
Ma, W., Adesope, O. O., Nesbit, J. C., & Liu, Q. (2014). Intelligent Tutoring Systems and Learning Outcomes: A Meta-Analysis. Journal of Educational Psychology, 106(4), 901-918. doi: http://dx.doi.org/10.1037/a0037123
Nesbit, J. C., Adesope, O. O., Liu, Q., & Ma, W. (2014). How Effective are Intelligent Tutoring Systems in Computer Science Education? Paper presented at the IEEE 14th International Conference on Advanced Learning Technologies (ICALT), Athens, Greece.
Nesbit, J. C., Liu, L., Liu, Q., & Adesope, O. O. (2015). Work in Progress: Intelligent Tutoring Systems in Computer Science and Software Engineering Education. Paper presented at the 122nd American Society for Engineering Education Annual Conference, Seattle.
O'Shea, T. (1982). A Self-Improving Quadratic Tutor. In D. Sleeman & J. S. Brown (Eds.), Intelligent Tutoring Systems. Academic Press.
Pane, J. F., Griffin, B. A., McCaffrey, D. F., & Karam, R. (2014). Effectiveness of Cognitive Tutor Algebra I at Scale. Educational Evaluation and Policy Analysis, 36(2), 127-144. doi: http://dx.doi.org/10.3102/0162373713507480
Schroeder, N. L., Adesope, O. O., & Gilbert, R. B. (2013). How Effective are Pedagogical Agents for Learning? A Meta-Analytic Review. Journal of Educational Computing Research, 49(1), 1-39. doi: http://dx.doi.org/10.2190/EC.49.1.a
Steenbergen-Hu, S., & Cooper, H. (2013). A Meta-Analysis of the Effectiveness of Intelligent Tutoring Systems on K-12 Students' Mathematical Learning. Journal of Educational Psychology, 105(4), 970-987. doi: http://dx.doi.org/10.1037/a0032447
Steenbergen-Hu, S., & Cooper, H. (2014). A Meta-Analysis of the Effectiveness of Intelligent Tutoring Systems on College Students' Academic Learning. Journal of Educational Psychology, 106(2), 331-347. doi: http://dx.doi.org/10.1037/a0034752
VanLehn, K. (2011). The Relative Effectiveness of Human Tutoring, Intelligent Tutoring Systems, and Other Tutoring Systems. Educational Psychologist, 46(4), 197-221. doi: http://dx.doi.org/10.1080/00461520.2011.611369
Woolf, B. P., Lane, H. C., Chaudhri, V. K., & Kolodner, J. L. (2014). AI Grand Challenges for Education. AI Magazine, 34(4), 66-84. doi: http://dx.doi.org/10.1609/aimag.v34i4.2490