13
Becoming a Personal Reviewer
As you have been working through the chapters in this book, you have been building upon and strengthening your grasp of educational research literacy. You now
are in possession of the most basic and most important foundational insights and
skills needed for understanding educational research articles. How well you refine
and expand these basic and foundational skills is up to you. In other words, you have
covered all the basic steps needed to be a consumer of educational research. From
now on, your path will be that of a critic.
Here is the difference between a consumer and a critic of educational
research: consumers understand an article, while critics engage that article. Understanding per se is still somewhat passive and accepting. With engagement comes the
added responsibility of digging deeper and holding research to higher standards.
Let us take a moment to think about the issue of standards. Researchers and
journal editors alike are committed to the notion of publishing the highest quality
research they can jointly produce. That is why most journal articles undergo a referee process. When an article is refereed, this means that the editors have sent the
article to two or more reviewers for critical reading. These reviewers are content
experts or methods experts, or both.
Good reviewers are excellent critical readers. Of course, they start with the
goal of extracting and then understanding the basic information from articles. But
they go beyond this consumer task by taking responsibility for evaluating articles.
Are these articles clear and well written? Are their arguments plausible and logical?
Are their research questions important and relevant? Are the methods used in the
articles sound? Were their analyses done correctly? Do the methods and analyses
really address the research questions raised? Are the discussions and conclusions
warranted and valid?
The referee process is an important part of educational research. First of all, it
provides a fair and equitable way to address the fact that only so many articles can be
published each year in a given journal. Important journals often receive ten times (or more)
as many submissions as they can possibly publish. Therefore, most of the articles
submitted for publication have to be rejected. Editors depend upon the review process to sort through submissions and then select the relative handful of best articles
that can then be published.
Referees do more than just sort articles into accept or reject piles. Even those
articles which are destined to be published usually need some revision. Reviewers
point out areas that need to be clarified, and even mistakes that need to be fixed.
Reviewers also serve as the collective voice of the field. Are the researchers being
too timid in their claims? Are they being too bold in their declarations? Are there
areas that the researchers have ignored or failed to take into account in their work?
Are issues raised by the researchers really important or are they trivial?
Reviewers are the front line of the voice of the field. It is their job to make sure
that articles address important issues, that questions are raised in a clear manner
and supported in a clear and logical fashion. It is their job to make sure that methods
match questions, and that analyses are done correctly. It is their job to comment
upon and evaluate discussions and conclusions. But too often, we are content to let
the review process end there. There is a need for all readers to go beyond just being
good consumers of research. It is our job to pick up where the official review process has left off and to ask the same sorts of questions of articles that referees have
raised. We cannot expect every referee process to be perfect, or for every referee
to catch every important issue. We need to realize that it is our job to be personal
reviewers as well. We take the review process as we find it, and we extend our evaluation of the material based on our own areas of expertise and skills.
In this chapter, we will examine the process of becoming a personal reviewer.
A good reviewer is always asking questions. There are two types of questions we
need to ask as we review articles. First of all, there are questions we need to raise
and the articles need to answer. Second, there are those questions we need to ask of
ourselves in order to be sure that we are capable of reading and understanding each
article on its own terms:
it is up to the researchers to make the case for relevance in the article itself. If we
can find no reason for a particular article or body of work to be cited, then it should
be eliminated.
The final dimension is timeliness. This is a problem with far too many research
articles: the research that they cite is just too old. Research is an ongoing process.
If a particular piece of research is, say, ten years old, then there is a good chance that
there has been more research in the field that has altered the way we might look at
the topic. While it is true that certain benchmark studies are timeless in a way, far
more research comes with an implicit expiration date. If all of the work cited is, say,
more than five years old, then this suggests that the researchers are not as current
as they might be on this topic.
There is, however, one additional caveat to these guidelines. Theoretical work,
by and large, has a longer "shelf life" than research findings, per se. It is not unusual
for researchers to cite theoretical pieces that are ten years old, or older. This same
grace period does not apply to research articles that are based on these theoretical
perspectives, however. The researchers need to show that they are conversant with
the latest relevant research to come out of those perspectives.
Do the Findings in This Article Address the Issues That It Raises?
Is there a match between questions raised and the methods used to answer them?
Sometimes, researchers shy away from asking certain questions because the methods
needed to answer them might be too complex. At the other end of the spectrum, sometimes we find researchers who are seemingly infatuated with certain complex methods, who end up making their research needlessly complicated. Both extremes are
obviously to be avoided. The best way to make sure that questions and methods match
up is to be as clear as possible in the planning and execution of research. As reviewers,
we need to be on the lookout for these sorts of problems in research articles.
Are the Measurements and Findings of This Article Reliable?
Any decent measurement book can point you to the details of reliability. Here, we
will look at the overall conceptual framework for reliability.
The key notion in reliability is accuracy. Is our measuring system or device
accurate? Here is a simple example: suppose we are trying to measure the height
of a building. We start by finding a yardstick, and using it to measure the height. But
we need to make sure our yardstick is okay first. Is it actually a yard long? Are the
feet and inches marked off correctly? Is the yardstick constant under use, or does it
bend or warp or stretch or shrink as we are using it? We cannot trust our measurement of the height of the building unless we also trust our yardstick. We need to
ascertain that it is a reliable measuring instrument.
In the same way, we worry about the reliability of our measures in educational
research. However, the things we often measure are a lot more "slippery" than something as obvious as, say, height. Think for a moment, for example, how much harder
it is to pin down something like intelligence or motivation or aptitude. We measure
these things indirectly. For instance, we have tests of intelligence and aptitude, and
inventories to assess motivational level.
If researchers are measuring things, then we need to see how they have
addressed possible reliability issues. Do they talk about reliability directly? How
have they attempted to assess reliability? One of the most common ways is to report
reliability coefficients. These involve making multiple measures of the same thing,
and then correlating those measures to see how close they are. Reliability coefficients ought to be very high, hopefully in excess of +0.90.
Another type of reliability measure is internal consistency. With internal consistency, we are trying to see how much any given item on a test acts like the test as
a whole. If all the items are measuring in the same way, then this is good evidence of
reliability. The most common measure of internal consistency is a type of correlation
known as Cronbach's alpha.
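To make these two ideas concrete, here is a minimal sketch in Python of how a test-retest correlation and a Cronbach's alpha might be computed. The book itself presents no code, so the data and the helper function below are invented purely for illustration.

```python
import numpy as np

# Hypothetical scores: 6 students answering a 4-item test (rows = students).
items = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
], dtype=float)

def cronbach_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")  # high values (> .90) are good

# Test-retest reliability: give the "same" test twice and correlate the scores.
time1 = items.sum(axis=1)
time2 = time1 + np.random.default_rng(0).normal(0, 1, size=time1.size)  # simulated retest
print(f"Test-retest r: {np.corrcoef(time1, time2)[0, 1]:.2f}")
```

Both numbers come out well above +0.90 for these invented scores; with real test data, values much below that would be a signal to read the article's reliability discussion very carefully.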
If you suspect that there are reliability issues with any particular article, then
you need to look deeper into the technical aspects of reliability in order to see if your
suspicions are correct.
Are the Measurements and Findings of This Article Valid?
Again, validity is a topic that is covered in depth in most introductory works on measurement. We will be taking a conceptual look here as well.
The idea of validity is also very simple: are the researchers actually measuring what they claim to measure? Note that we cannot have a valid measure unless we
are measuring accurately in the first place. If we are not measuring accurately, then
we might as well be just writing down numbers. But just because we are measuring
accurately does not guarantee we are measuring what we think we are measuring.
To make sure our measurements are on target, we have to pay attention to validity.
Like reliability, the targets of validity are also often measured indirectly. The
best way to establish validity is to compare our measurements against some kind
of standard. Some of these standards include expert judgment, other measuring
devices which have proved to be valid themselves, or external criteria. As with reliability, correlations are often used to establish validity. Validity correlations tend to be
lower than reliability correlations, but they should still be somewhere in the +0.70
range in order to be considered good.
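As a companion to the reliability sketch above, here is an equally hypothetical illustration of criterion validity: an invented new measure is correlated against an invented, already-validated criterion using an ordinary Pearson correlation.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: 8 students' scores on a new motivation inventory,
# paired with their scores on an established, already-validated measure.
new_measure = np.array([12, 18, 25, 9, 30, 22, 15, 27])
criterion   = np.array([14, 20, 24, 11, 29, 19, 13, 28])

r, p = pearsonr(new_measure, criterion)
print(f"Validity correlation: r = {r:.2f} (p = {p:.3f})")
# By the rough rule of thumb above, r in the +0.70 range or better
# counts as good evidence that the new measure is on target.
```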
If you suspect that there are validity issues with any particular article, then
you need to look deeper into the technical aspects of validity in order to see if your
suspicions are correct.
Are the Statistical Tests Overly Sensitive, so That They
Might Be "Finding" Results That Are Not Really There?
In statistical parlance, this is known as a Type I error. People often make this idea
harder than it really is, so let us take a moment to get clear on it.
We start by remembering that statistical tests are almost always designed to
give us "yes" or "no" answers to our questions. But since those answers are based on
probability, then they are really "probably yes" or "probably no."
The key question is this: how "probably" is "probably yes" or "probably no"?
Ideally, we want to have "almost certainly and positively yes" or "almost certainly
and positively no." The closer that we get to either "almost certainly and positively
yes" or "almost certainly and positively no" the more confident we are.
As we have discussed before, p < .05 is a commonly agreed upon standard for
acceptable probability in these cases. If we are testing at the p < .05 level, then we
are roughly 95% correct when we decide any differences we find are real differences,
and not just sampling variation. But, 95% correct can also be 5% wrong. There is
always a slight chance that we are claiming to find a difference where none really
exists. This is called a Type I error, or a confidence error. Sometimes, a Type I error
is also described as a "false positive."
Since we are dealing with probabilities, we can never eliminate the chance of a
Type I error once and for all. What, then, is an acceptable chance to take when measuring? Is it always p < .05? Or can it be lower? Or higher? Most researchers agree
that p < .05 is as high a chance as researchers ordinarily ought to take. Type I errors
are serious problems; when we claim that there is an effect where there really is
none, this can mess up lots of things. It is not uncommon to see p < .01 or even lower
when the stakes are high for mistakes. For example, if we are testing a cancer drug,
we want to be really sure that it works, since lives will depend on it.
As personal reviewers, we need to decide whether the probability levels in a
study are properly set to address the potential seriousness of a Type I error. We also
need to check and see if there are flaws in the research design that might unintentionally increase the odds of a Type I error. The most common culprit here is the
repeated use of simple probability tests. Here is an extreme case. Suppose you are
looking to see if some treatment is having an effect. You measure it 20 times. One of
those times, it is significant at the p < .05 level. Should we put any faith in that finding? Of course not. The laws of chance tell us that we should not be surprised if one
of these 20 tests turned out significant. Most of the time, design errors will not be
this blatant. Nonetheless, we still need to be vigilant, to make sure the designs are
giving us the tests that they promise.
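To see why the extreme case above is so damning, consider the arithmetic: with 20 independent tests at p < .05 and no true effect anywhere, the chance of at least one false positive is 1 - 0.95^20, or roughly 64%. A minimal simulation sketch in Python (purely hypothetical null data, not from any study) bears this out:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_experiments, n_tests = 2000, 20
at_least_one = 0

for _ in range(n_experiments):
    # Run 20 tests comparing groups drawn from the SAME distribution,
    # so any "significant" result is by definition a Type I error.
    pvals = [ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
             for _ in range(n_tests)]
    if min(pvals) < 0.05:
        at_least_one += 1

print(f"Simulated chance of a false positive: {at_least_one / n_experiments:.2f}")
print(f"Analytic value, 1 - 0.95**20:         {1 - 0.95**n_tests:.2f}")
```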
may have been strong enough to turn up on the statistical "radar screen," but the nonsignificant effects could have been too subtle or too weak to show up on the tests that
were used. Sometimes, we hear researchers say that a finding "approached significance." This might also be interpreted as a possible Type II error.
How do researchers reduce Type II errors? There are two standard
approaches they can use. First of all, the researchers can relax the probability level
of the test from, say, p < .05 to p < .10. This is not usually a good idea, since it immediately raises the odds of committing a Type I error. Since Type I errors are usually more serious than Type II errors (because if we do not see something once,
it should likely turn up sooner or later in future research if it is really there),
this strategy often moves us in the wrong direction. A better alternative is simply
to increase the sample size, if possible. The larger the sample size, the more likely
we are to find even small effects. As readers, if we suspect Type II errors, then we
should immediately look at the sample size and see if it should really be bigger.
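As a rough illustration of that advice, the hypothetical sketch below simulates studies of a small true effect at several sample sizes and estimates the power of each, that is, the proportion of studies that avoid a Type II error. The effect size and sample sizes are invented for demonstration only.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
effect = 0.3  # a smallish true difference, in standard-deviation units

# Estimate power at each sample size: the share of 1,000 simulated
# studies that correctly detect the effect at p < .05.
for n in (20, 50, 100, 200):
    hits = sum(
        ttest_ind(rng.normal(effect, 1, n), rng.normal(0, 1, n)).pvalue < 0.05
        for _ in range(1000)
    )
    print(f"n = {n:>3} per group: estimated power = {hits / 1000:.2f}")
# Power climbs steadily with n; every effect the tests miss is a Type II error.
```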
researchers who hold that, since the goals and methods of qualitative research are
so different, they need to be evaluated under their own terms.
Even when the notion of alternative evaluation criteria is raised, there are
differences as well. Some researchers offer concepts that are similar to reliability,
validity, and generalizability: concepts like triangulation and trustworthiness (see
Denzin & Lincoln, 2000, for a number of discussions on these and related topics).
Others feel we should start from scratch and develop evaluative criteria from the
assumptions of the method itself (see Shank, 2006). For now, personal reviewers are
probably best served by examining the evidence from all sides, and making up their
own minds.
changed a variety of aspects of an industrial setting, looking for the relative impact of
these sorts of change. What they found was that change itself, from whatever source,
had an impact. That is, just the mere fact of being in a research study changed the way
that the subjects went about their tasks and looked at the way they did the things they
were asked to do. As a result, personal reviewers need to be on the alert for the ways
that researchers strive to make conditions as natural as possible for their participants.
In addition to the chronic problem of the Hawthorne effect, the presence of
any sort of acute novelty can change the results of a study. Were there unusual events
or disruptions that happened during the course of the study, and did these affect
results? For instance, something as blatant as a fire drill will invalidate a research
setting. What about things that are more subtle, however? Did a honking horn disrupt students for a moment on a timed test? Was the surprise of seeing herself on
camera enough to cause a participant to pause or lose her train of thought? These
sorts of effects are very difficult for personal reviewers to discover, but they are part
of the landscape of potential problems, nonetheless.
Finally, we need to realize that research is not conducted by machines, but by
people. Are the people doing the research doing the same thing in each condition? Are
the subjects exposed to the same people, regardless of conditions? If not, the researchers run the risk of introducing unwanted experimenter effects. For example, if Jolly Joe
is monitoring the experimental group, and Sourpuss Sally is doing the same for the
control group, are the people affected by these radically different experimenters?
Personal reviewers should always be alert to the possible effects of circumstantial conditions. Our efforts are greatly helped by an extensive description and
discussion of all research conditions and methods, so that we can be on the alert for
possible sources of change that the researchers themselves might not have seen.
Also, we should be alert to the presence of standardization of treatments and conditions. Any variation from those standards needs to be explained and justified, and
its possible effects should have been considered by the researchers.
Finally, time can have a powerful effect on a study in a variety of ways. Time is
certainly a factor in pretest and posttest designs. Each time period has its own danger
zones. Is the pretest sensitizing participants to be on the lookout for a certain type of
skill or body of knowledge? That is, have we put them more "on the alert" than they
would be under ordinary circumstances?
For example, suppose we are looking for the effects of good manners by a
teacher in math instruction. If we ask pretest questions about the manners of the
teacher, then we run the risk of bringing the issue of manners into the awareness of
students when ordinarily these same students would never give a conscious thought
to manners when thinking about their math teacher. In a sense, researchers have to
be careful about "priming the pump" and personal reviewers need to be sensitive to
the possibilities of such "priming."
At the other end, sometimes the posttest has an effect of its own as well. Suppose, in the manners study from above, students are only asked about manners in
a posttreatment survey. It is not unusual to suppose that students might have never
thought about manners on their own, and so the posttest might really be "finishing
the lesson" by bringing the topic of manners into direct awareness. If students are
not asked about manners, would this awareness arise on its own? Quite possibly not.
Changing awareness is probably a good thing, but in this case it could contaminate
results. Researchers and personal reviewers both need to be aware of this sort of
possibility, and to consider its potential impact.
Research is always conducted in time and space, and so there is always a history dimension to any research study. Ideally, research is conducted in ordinary
times. But sometimes times are not ordinary, in ways that might affect research.
Suppose, for example, researchers are interested in the effects of a peace education module on high school students. It could matter very much if these modules
were conducted in peacetime vs. times of unrest and war. Researchers and personal
reviewers both need to keep an eye on headlines to see how they might affect the
research being conducted.
Variables are measured in time, and so we need to be aware of how the time
of measurement might affect results in unanticipated ways. In particular, it is very
important when the dependent variable is measured. If it is measured too soon after
the treatments have been introduced, then there is the chance that it has not "soaked
in" long enough. If too much times passes before the dependent variable is measured,
then researchers are running the risk that other conditions or factors might intervene and interfere with the measurement of the treatment's effect on the dependent
variable. Researchers need to determine the best time to measure dependent
variables, and personal reviewers need to be sensitive to this
decision-making process as well, in order to evaluate the researchers' judgment.
Addressing the issues of generalizability and ecological validity has been a
critical part of evaluating educational research for decades. The classic source of
information is Campbell and Stanley (1966), which continues to be highly regarded
to this day and well worth reading for any serious personal reviewer.
Does the Article Fulfill Its Potential?
This is a critical perspective that is particularly pertinent to qualitative articles. As we
noted earlier, the problem with most bad qualitative research is the simple fact that it
is not very good. In other words, these articles fail to live up to their potential.
In an earlier work, Shank (2006) listed seven "deadly sins" that are
often pitfalls that prevent qualitative research efforts from reaching their full potential. These deadly sins are, in brief:
then this is not the end of the world. It simply means that matters are more
complex than we currently understand them. And is not this, in fact, one of the
points of doing research? Appropriation can be alleviated somewhat by fostering trust in the fact that research itself will often show us the way to make
things better.
Rigidity. Rigidity occurs when researchers are so determined to answer a particular research question in a particular way, or to apply a particular method
with total fidelity, that the overall goals for doing research end up taking second place. Qualitative research is tactical. We often do not know what to expect
when we get down to details or set out in the field. We need to embrace and use
ambiguity and change and surprise, not try to work them out of the equation ahead
of time. Rigidity can be alleviated somewhat by always remembering that flexibility is a cardinal principle of qualitative research, and a cardinal virtue of
qualitative researchers.
Superficiality. Superficiality occurs when researchers renege on the promise
of qualitative research to dig deeper and find things that are fresh and new
and that alter the way we understand phenomena. Instead, researchers take
a safe or even lazy path. And as a result, they do not tell us anything that we
do not already know. We do not need, for example, qualitative research to
tell us that poverty makes people desperate and unhappy, and that teachers
who care about students often are the best teachers. While these beliefs are
important, they are well accepted and well grounded in our culture. Good
qualitative researchers often take these commonplace sorts of understandings and apply the "well, but" test to them: well, but what happens when
we look at them closer, or under different or unusual circumstances? Such
research often enriches or even challenges our beliefs, and the educational
community is often ultimately better for it. The cure for Superficiality, then,
is curiosity. Curiosity will naturally pull us away from Superficiality and into
depth.
Sentimentality. Sentimentality is just another name for inauthenticity. When
something is inauthentic, then it is not true to its nature. We are simply using
something by manipulating it for effect, rather than looking at it on its own
terms. Sometimes Sentimentality is corrupt: imagine the movie producer who
tells the scriptwriter to kill off the family pet in the last scene so that people can
get a good cry. More often than not, Sentimentality is unintentional. Researchers want to show that poverty is bad, and that racism is harmful, and that men
and women should be treated equally, and so on. Fair enough, but these issues
per se are not the topics of research. Creating cartoon pictures and straw men
and sound bites will not further our understanding, even if it rallies people to
our cause. Good qualitative research can tackle important social causes, but
it needs to do so with an eye for seeing things that have never been brought to
light before. The best path to the perspectives we need to see things in fresh
ways is empathy. Empathy is where we understand things truly from another
perspective. If we let those other perspectives lead the way, instead of trying
to manipulate our readers and how they will react to what we say, then we are
on far firmer ground.
Narcissism. Narcissism occurs when a piece of qualitative research ends up
being about the researcher. Everything is turned inward, and everything
is funneled through a personal framework. We have to be careful not to let our research become a story about ourselves rather than about our participants.
Personal reviewers need to be alert to the negative effects of these "deadly sins" in
qualitative work. We are also well served to think of ways that existing qualitative
research could be improved by moving away from these "sins" and then doing our
part to encourage researchers to move in more fruitful directions.
We may be like those people who stand behind us when we are doing a crossword
puzzle: we might immediately see something that has "slipped through the cracks"
of reviewers who have pored carefully over the work. For whatever reasons, it is
important for us to be able to give a work our own personal stamp of approval. And
who knows? As we get better and better at this, someday we might be asked to review
articles for potential publication. All our work in personal reviewing can only help us
be that much more ready.
Can you think of several real-world examples of a Type I error? A Type II error?
Discuss your examples with your classmates.
You have already selected at least one quantitative and one qualitative article
in a field of your interest. We will look at these articles from the perspective of
a personal reviewer. For these articles, consider the following issues: Are the
research questions easy to find? Are the references timely and pertinent? What
are the greatest areas of strength for each article? How might each article be
improved? How might you have done these studies differently?
Look at your quantitative articles and consider the following points: How well
did the researchers address issues of reliability and validity? Are there any
potential problems with generalizability? Ecological validity? Discuss your
thoughts with your classmates.
Look at your qualitative articles and consider the following points: How well
did the researchers address issues of reliability and validity? Are there any
potential problems with generalizability? Ecological validity? Discuss your
thoughts with your classmates.
Take a look at the results and discussion for each article, and see how well
integrated the articles really are. Are the results clearly linked to the research
questions? What are the results telling you? Did the discussion capture everything the results seemed to say, or are there missing points? Does the article,
taken as a whole, make sense to you?
Create a personal inventory for future growth as a personal reviewer. What
content areas do you need to actively pursue? What resources will you use?
How can you facilitate your technical growth? What strategies and resources
are best suited for you in this area?