Bernard - Analyzing Qualitative Data - CAP5
Analyzing Qualitative Data
Systematic Approaches
Second Edition
H. Russell Bernard
Arizona State University
University of Florida
Amber Wutich
Arizona State University
Gery W. Ryan
RAND Corporation
SAGE
Los Angeles | London | New Delhi
Singapore | Washington DC | Melbourne
Digitized by Original from
UNIVERSITY OF MICHIGAN UNIVERSITY OF MICHIGAN
Copyright © 2017 by SAGE Publications, Inc.
All rights reserved. No part of this book may be reproduced
or utilized in any form or by any means, electronic or
mechanical, including photocopying, recording, or by
any information storage and retrieval system, without
permission in writing from the publisher.

FOR INFORMATION:
SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: order@sagepub.com

SAGE Publications India Pvt. Ltd.
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044
India

SAGE Publications Asia-Pacific Pte. Ltd.
3 Church Street
#10-04 Samsung Hub
Singapore 049483

Wutich, Amber, author. | Ryan, Gery Wayne, author.
Title: Analyzing qualitative data : systematic approaches /
H. Russell Bernard, Arizona State University,
University of Florida, Amber Wutich, Arizona State
University, Gery W. Ryan, RAND Corporation.
Description: Thousand Oaks, California : SAGE, [2017] |
Earlier edition: 2010. | Includes bibliographical references
and index.
Identifiers: LCCN 2016000036 | ISBN 9781483344386
(pbk. : alk. paper)
Subjects: LCSH: Social sciences--Research--
Methodology. | Qualitative research. | Quantitative research.
Classification: LCC H62 .B438 2017 | DDC 001.4/2-dc23
LC record available at http://lccn.loc.gov/2016000036
Copy Editor: Carole Bernard
Typesetter: C&M Digitals (P) Ltd.
IN THIS CHAPTER
Authors' note: We rely heavily in this chapter on our article Ryan and Bernard, Field Methods
15(1): 85-109. Copyright © 2003 Sage Publications.
♦ INTRODUCTION
Analyzing text involves five complex tasks: (1) discovering themes and subthemes;
(2) describing the core and peripheral elements of themes; (3) building hierarchies of
themes or codebooks; (4) applying themes-that is, attaching them to chunks of actual
text; and (5) linking themes into theoretical models.
In this chapter, we focus on the first task: discovering themes and subthemes.
Then, in Chapter 6, we discuss methods for describing themes, building codebooks,
and applying themes to text. In Chapter 7, we move on to building models.
In Chapter 19, we'll show you how to use some computer methods (cluster analysis
and multidimensional scaling) to find themes in text. In this chapter, we'll focus
on techniques, like line-by-line analyses, that (so far) only people can do, and on
simple word counts that can be done by a computer and that support human coders
in their search for themes in texts. Each technique has advantages and disadvantages.
As you'll see, some methods are better for analyzing long, complex narratives, while
others are better for short responses to open-ended questions. Some require more
labor and skill, others less (see Box 5.1).
But first ...
Computer analysis of text has been going on since the 1960s (see Ogilvie et al. 1966),
but things really got rolling in the 1990s (Salton et al. 1996) and this is now a
fast-moving area of artificial intelligence in engineering and informatics. These new
systems can read free text created by physicians and offer support for clinical decisions
for specific illnesses, like pneumonia and cervical cancer (Aronsky et al. 2001;
Wagholikar et al. 2012).
They can do this because the guidelines for diagnosis of these illnesses are so clearly
established. There are, after all, only so many words in the vocabulary of diagnosis that a
physician can choose from to diagnose any particular illness. There is a long way to go
before machines replace human coders in parsing texts about human experiences (see Noll
et al. 2013), but work on this problem is advancing quickly, with obvious applications in the
social sciences. Many programs are available today that plow through mountains of text-all
of Shakespeare's work, for example, or tens of thousands of blog pages-and isolate poten-
tial themes. (See Box 11.5 for more on automated content analysis.) (Further Reading:
automated text analysis)
WHAT'S A THEME? ♦
This question has a long history. Thompson (1932-36) created an index of folktale
motifs, or themes, that filled six volumes. In 1945, Morris Opler, an anthropologist,
made the identification of themes a key step in analyzing cultures. He said:
Opler established three principles for analyzing themes. First, he observed that
themes are only visible (and thus discoverable) through the manifestation of expressions
in data. And conversely, expressions are meaningless without some reference
to themes. Second, Opler noted that some expressions of a theme are obvious and
culturally agreed on, while others are subtler, symbolic, and even idiosyncratic.
And third, Opler observed that cultural systems comprise sets of interrelated themes.
The importance of any theme, he said, is related to (1) how often it appears; (2) how
pervasive it is across different types of cultural ideas and practices; (3) how people react
when the theme is violated; and (4) the degree to which the number, force, and variety
of a theme's expression is controlled by specific contexts (see Box 5.2).
Today, social scientists still talk about the linkage between themes and their expressions, but
use different terms to do so. Grounded theorists talk about "categories" (Glaser and Strauss
1967), "codes" (Miles and Huberman 1994), or "labels" (Dey 1993:96). Opler's "expressions"
are called "incidents" (Glaser and Strauss 1967), "segments" (Tesch 1990), "thematic units"
(Krippendorff 1980), "data-bits" (Dey 1993), and "chunks" (Miles and Huberman 1994). Lincoln
and Guba refer to expressions as "units" (1985:345). Corbin and Strauss (2008:51) call them
"concepts" that are grouped together in a higher order of classification to form categories.
Here, we follow Agar's lead (1979, 1980) and remain faithful to Opler's terminology. To
us, the terms "theme" and "expression" more naturally connote the fundamental concepts
we are trying to describe. In everyday language, we talk about themes that appear in texts,
paintings, and movies and refer to particular instances as expressions of goodness or anger
or evil. In selecting one set of terms over others we surely ignore subtle differences, but the
basic ideas are just as useful under many glosses.
Induced themes come from data, while a priori themes (also called deduced
themes) come from prior understanding of whatever phenomenon we are studying.
A priori themes come from characteristics of the phenomena being studied-what
Aristotle identified as essences and what dozens of generations of scholars since have
relied on as a first cut at understanding any phenomenon. If you are studying the night
sky, for example, it won't take long to decide that there is a unique, large body (the
moon), a few small bodies that don't twinkle (planets), and millions of small bodies
that do twinkle (stars).
A priori themes can come from the literature about a topic; from local, commonsense
constructs; and from researchers' values, theoretical orientations, and personal experiences
(Bulmer 1979; Maxwell 1996; Strauss 1987).
The decisions about what topics to cover and how best to query people about
those topics are a rich source of a priori themes (Dey 1993:98). In fact, the first pass
at generating themes often comes from the questions in an interview protocol (Coffey
and Atkinson 1996:34). Even with a fixed set of open-ended questions, there's no way
to anticipate all the themes that will come up before you analyze a set of texts (Dey
1993:97-8).
Andriotis (2010), for example, explored the way newspapers reported the activities
of British spring-breakers at Greek resorts. A review of the literature on spring-break-type
behavior across the world turned up four major themes: alcohol consumption (and especially
binge drinking), drug use, sexual behavior, and other risk taking (like unprotected
sex, ledge walking, stunt driving, etc.). After preliminary coding, however, Andriotis
dropped drug use (it was reported rarely in the 186 newspaper articles he was studying)
and added a new theme: host community reaction to the behavior of the tourists.
The act of discovering themes is what grounded theorists call open coding, and
what classic content analysts call qualitative analysis (Berelson 1952) or latent coding
(Shapiro and Markoff 1997). There are many recipes for arriving at a preliminary set
of themes (Tesch 1990:91). We'll describe eight observational techniques (things to
look for in texts) and four manipulative techniques (ways of processing texts). These
12 techniques are neither exhaustive nor exclusive. They are often combined in practice.
(Further Reading: finding themes)
Looking for themes in written material typically involves pawing through texts and
marking them up, either with different colored pens or by swiping words and phrases
in different colors on the computer screen. Sandelowski (1995b:373) says that text
analysis begins with proofreading the material and simply underlining key phrases
"because they make some as yet inchoate sense." For recorded interviews, the process
of identifying themes begins with the act of transcription. Whether the data come in
the format of video, audio, or written documents, handling them physically is always
helpful for finding themes.
Here's what to look for:
1. Repetitions
"Anyone who has listened to long stretches of talk," says D'Andrade, "knows how
frequently people circle through the same network of ideas" (1991:287). Repetition
is easy to recognize in text. Claudia Strauss (1992) did several in-depth interviews
with Tony, a retired blue-collar worker in Connecticut. Tony referred again and again
to ideas associated with greed, money, businessmen, siblings, and "being different."
Strauss concluded that these ideas were important themes in Tony's life. To get an
idea of how these ideas were related, Strauss wrote them on a piece of paper and
connected them with lines to snippets of Tony's verbatim expressions-much as
researchers today do with text analysis software.
Owen (1984) used repetition and forcefulness as indicators of a theme in his study
of narratives about family relationships. If a concept occurred at different places in a
narrative and was emphasized by the informant (in vocal inflection, or dramatic pauses
or volume), then he took that as evidence of a theme. Owen distinguished between
what he called recurrences and repetitions, where recurrences are different uses of a
concept or theme in a narrative, using different words, and repetitions involve the use
of the same words for a concept, but the idea is the same: The more the same concept
occurs in a text, the more likely it is a theme. How many repetitions make an
important theme, however, is a question only you can decide.
other about their experiences, they kept mentioning the idea of "making a flop,"
which turned out to be the local term for finding a place to sleep for the night.
Spradley searched through his recorded material and his field notes for statements
about making a flop and found that he could categorize them into subthemes such
as kinds of flops, ways to make flops, ways to make your own flop, kinds of people
who bother you when you flop, ways to make a bed, and kinds of beds. Spradley
returned to his informants and asked for more information about each of the
subthemes.
For other classic examples of coding for indigenous categories, see Becker's
(1993) description of medical students' use of the word "crock" and Agar's (1973)
description of drug addicts' understandings of what it means to "shoot up."
4. Transitions
Naturally occurring shifts in content may be markers of themes. In written texts, new
paragraphs may indicate shifts in topics. In speech, pauses, changes in tone of voice,
or the presence of particular phrases may indicate transitions and themes.
Chapter 5 Finding Themes
What Glaser and Strauss (1967:101-16) labeled the "constant comparison method"
involves searching for similarities and differences by making systematic comparisons
across units of data. Typically, grounded theorists begin with a line-by-line analysis,
asking: "What is this sentence about?" and "How is it similar or different from the pre-
ceding or following statements?" This keeps the researcher focused on the data rather
than on theoretical flights of fancy (Charmaz 1990, 2000; Glaser 1978:56-72; Strauss and
Corbin 1990:84-95).
Here's an exchange from our study of what people say about helping the environ-
ment (Bernard et al. 2009):
The reference to asbestos is different from the reference to the toxic waste
roundup. On the other hand, asbestos is a toxic substance. At this point, we might
tentatively record "getting rid of toxic substances" as a theme.
Another comparative method involves taking pairs of expressions (from the same
informant or from different informants) and asking: "How is one expression different
or similar to the other?" Here's another informant in our study of what Americans think
they can do to help the environment:
Interviewer: Any pressing issues that you can think of right now?
Informant: well I don't know what you can do to solve it but the places for
hazardous waste are few and far between from what I understand-that
some people are dumping where they shouldn't
(pause) and I don't know what you can do because nobody
wants any of the hazardous wastes near them.
In comparing the two responses, we asked: "Is there a common theme here, in
hazardous waste and toxic waste?" If some theme is present in two expressions, then
the next question to ask is: "Is there any difference in degree or kind in which the
theme is articulated in both of the expressions?"
Degrees of strength in themes may lead to the naming of subthemes. Suppose you
compare two video clips and find that both express the theme of anxiety. Looking
carefully, you notice that anxiety is expressed more verbally in one clip and more
through subtle hand gestures in the other. Depending on the goals of your research,
you might code the clips as expressing the theme of anxiety or as expressing anxiety
in two different ways.
You can find some themes by comparing pairs of whole texts. As you read a text,
ask: "How is this one different from the last one I read?" and "What kinds of things are
mentioned in both?" Ask hypothetical questions like: "What if the informant who produced
this text had been a woman instead of a man?" and "How similar is this text to
my own experiences?" These hypothetical questions will force you to make comparisons,
which often produce moments of insight about themes.
Bogdan and Biklen (1982:153) recommend reading through passages of text and
asking: "What does this remind me of?" Below, we'll introduce more formal techniques
for identifying similarities and differences among segments of text, but we always start
with the informal methods: underlining, highlighting, and comparing.
6. Linguistic Connectors
Look carefully for words and phrases that indicate attributes and various kinds of
causal or conditional relations (Casagrande and Hale 1967).
Causal relations: "because" and its variants 'cause, 'cuz, as a result, since, and so
on. For example: "Y'know, we always take 197 there 'cuz it avoids all that traffic at the
mall." But notice the use of the word "since" in the following: "Since he got married,
it's like he forgot his friends." Text analysis that involves the search for linguistic connectors
like these requires very strong skills in the language of the text because you
have to be able to pick out very subtle differences in usage.
Conditional relations: In conditional relations, the occurrence of one thing, A, is
conditional on another thing, B. This shows up as "if" or "then" (and if-then pairs),
"rather than," and "instead of." For example: "If you pass the bar exam on the first try,
you'll get lots of job offers." "You can drink a lot more [alcohol] if you coat your stomach
with milk first."
Taxonomic categories: The phrase "is a" (as in "a moose is a kind of mammal")
is often associated with taxonomic categories: "Vitamin C is a great way to avoid colds."
Again, watch for variants. Notice how the "is a" relation is embedded in the following:
"When you come right down to it, lions are just big pussy cats."
Time-oriented relations: Look for words like "before," "after," "then," and "next."
"There's a trick to that door. Turn the key all the way to the left, twice, and then push
hard." The concept of time-ordered events and relations can be very subtle: "By the time
I bike home, I'm sweating like a pig." "It's so damn hot, your glasses fog when you go out."
X-is-Y relations: Casagrande and Hale (1967) suggested looking for attributes of
the form X is Y: "Lemons are sour." "The Greek islands are still a bargain," "This is just
bullshit," "He's lucky he's alive."
Contingent relations: Look for phrases of the form if X, then Y follows, or X causes
Y or Y is caused by X: "If mortgage rates go above 7%, people will rent instead of
buying houses." "For a strong harvest, plant with the full moon." This relation can be
expressed in the negative, too: "They won't wear a condom, no matter what you do."
Spatial relations: Look for phrases of the form X is close to Y: "I found my way
around pretty good in the new place [supermarket] because stuff is together. Milk
and cheese and eggs and stuff are always together and all that stuff is near the meat"
(see Box 5.3).
Operational definitions: X is a tool for doing Y: "You can use Excel to do basic stuff, but if
you really wanna work on text you gotta get a real program for that."
Example definitions: X is an instance of Y: "So now [referring to undergraduates] they're
using the Internet to find papers they can use; new technology, same old plagiarism."
Comparison definitions: X resembles Y: "Iraq is like Vietnam in some ways, but we need to
remember the differences."
Class inclusions: X is a member of class Y: "Geeks and nerds are both dorky, but a geek is
a nerd who can get hired."
Synonyms: X is equivalent to Y: "Telling me you can't afford to go is just a wimpy way of
saying kiss off."
Antonyms: X is the negation of Y: "Not picking up after your dog is the definition of a bad
neighbor" [the implication is that the act is the negation of "good neighbor"].
Provenience: X is the source of Y: "A foolish consistency is the hobgoblin of little minds"
(Emerson's famous dictum; see Emerson 1907:891).
Circularity: X is defined as X: "Yellow means like when something is lemon colored."
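A first pass at spotting connectors like these can be roughed out with a keyword search. The sketch below is only an illustration, not a procedure from Casagrande and Hale: the starter lists in `CONNECTORS` and the function name `find_connectors` are assumptions made for the example. A hit does no more than flag a sentence for a human reader, because, as the "since" example shows, the same word can mark time in one sentence and cause in another.

```python
import re

# Hypothetical starter lists (an assumption for this sketch); extend them
# for the language and dialect of your own texts.
CONNECTORS = {
    "causal": ["because", "'cause", "'cuz", "as a result", "since"],
    "conditional": ["if", "then", "rather than", "instead of"],
    "time": ["before", "after", "next"],
}

def find_connectors(sentence):
    """Return (relation, connector) pairs whose connector appears in the sentence."""
    text = sentence.lower()
    hits = []
    for relation, phrases in CONNECTORS.items():
        for phrase in phrases:
            # The lookarounds stop "if" from matching inside "life" while
            # still allowing connectors that begin with an apostrophe.
            pattern = r"(?<![a-z'])" + re.escape(phrase) + r"(?![a-z])"
            if re.search(pattern, text):
                hits.append((relation, phrase))
    return hits

example = "Since he got married, it's like he forgot his friends."
flagged = find_connectors(example)  # [("causal", "since")]
```

Here the search files "since" under causal relations even though the speaker is using it to mark time. That is exactly the kind of subtle difference in usage that still has to be settled by someone with strong skills in the language of the text.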
7. Missing Data
This method works in reverse from typical theme-identification techniques.
Instead of asking "What is here?" we can ask "What is missing?" Women who have
strong religious convictions may fail to mention abortion during discussions of
birth control. In power-laden interviews, silence may be tied to implicit or explicit
domination (Gal 1991). In a study of birth planning in China, Greenhalgh reports
that she could not ask direct questions about resistance to government policy.
People made "strategic use of silence," she says, "to protest aspects of the policy
they did not like" (1994:9). Obviously, themes discovered like this need to be
looked at critically to make sure that we are not finding only what we are looking
for (see Box 5.4).
In Lyn Richards's pioneering study of a new suburb outside Melbourne, Australia, one of
the driving research questions was "How do residents in a new outer suburb cope with
isolation and loneliness?" This was to be a five-year study, but by the end of the first year
"none of those talking to the researchers were reporting that they were lonely" (Singh and
Richards 2003:10-11). Perhaps a year was just not enough for loneliness to set in.
Perhaps informants were hiding something. Or was the theory wrong? Maybe nobody was
lonely in the new suburb? This challenge to the theory guided Richards in the subsequent
years of the study.
Gaps in texts may not indicate avoidance at all, but simply what Spradley
(1979:57-58) called abbreviating: leaving out information that everyone knows. As
you read through a text, look for things that remain unsaid and try to fill in the gaps
(Price 1987). This can be tough to do. Distinguishing between people's unwillingness
to discuss a topic and their simply assuming that you already know about it
requires a lot of familiarity with the subject matter. If someone says, "John was broke
because it was the end of the month," they're assuming that you already know that
many people get paid once a month and that people sometimes spend all their money
before getting their next paycheck.
When you first read a text, some themes will simply pop out at you. Highlight
them, with highlighters if you prefer to work with paper, or in your text management
program. Then read the text again. And again. Look for themes in the data that remain
unmarked. This tactic-marking obvious themes early and quickly-forces the search
for new and less obvious themes in the second pass (Ryan 1999).
By definition, rich narratives contain information on themes that characterize the experience
of informants, but we also want to understand how qualitative data illuminate
questions of theoretical importance. Spradley (1979:199-201) suggested searching
interviews for evidence of social conflict, cultural contradictions, informal methods of
social control, things that people do in managing impersonal social relationships,
methods by which people acquire and maintain achieved and ascribed status, and
information about how people solve problems.
Bogdan and Biklen (1982:156-162) suggested examining the setting and context, the
perspectives of the informants, and informants' ways of thinking about people, objects,
processes, activities, events, and relationships. Strauss and Corbin (1990:158-75) urge us
to be more sensitive to conditions, actions/interactions, and consequences of a phenomenon
and to order these conditions and consequences into theories. "Moving across
substantive areas," says Charmaz, "fosters developing conceptual power, depth, and
comprehensiveness" (1990:1163).
There is a trade-off, of course, between bringing a lot of prior theorizing to the
theme-identification effort and going at it fresh. Prior theorizing, as Charmaz says
(1990), can inhibit the forming of fresh ideas and the making of surprising connections.
And by examining the data from a more theoretical perspective, researchers must be
careful not to find only what they are looking for. Assiduous theory avoidance, on the
other hand, brings the risk of not making the connection between data and important
research questions.
The eight techniques just described require only pencil and paper. Next, we
describe four techniques that require more physical or computer-based manipulation
of the text itself.
Some techniques are informal (spreading texts out on the floor, tacking bunches of
them to a bulletin board, and sorting them into different file folders) while others
require software to count words or display word-by-word co-occurrences. And, as
we'll see, some techniques require a fair amount of skill in computer analysis. But
more of that later ...
After the initial pawing and marking of text, cutting and sorting involves identifying
quotes or expressions that seem somehow important (these are called exemplars)
and then arranging the quotes/expressions into piles of things that go together
(Lincoln and Guba 1985:347-51). By the way, this kind of work can be done with cards
or with computer software. There is no right way to do it. What matters is that the
process feels comfortable and productive.
There are many variations on this technique. We cut out each quote (making sure to
maintain some of the context in which it occurred) and paste the material on a small
index card. On the back of each card, we write down the quote's reference: who said it
and where it appeared in the text. Then we lay out the quotes randomly on a big table
and sort them into piles of similar quotes. Then we name each pile. These are the themes.
When it comes to pile sorting, there are two kinds of people: splitters and
lumpers. Splitters maximize the differences between items and generate more fine-grained
themes, while lumpers minimize the differences and identify more over-arching
themes. The objective is to identify the widest possible range of themes at the end of
the process. To accomplish this, sorne researchers find it best to split first and lump
later, while others find it best to lump first and split later.
In a project with two or three researchers, each member of the research team
should sort the exemplar quotes into named piles independently. This usually gener-
ates a longer list of themes than you get in a group discussion. After sorting the piles
independently, the researchers can decide together which piles can be merged, which
should be split, and which are good candidates for further analysis.
Barkin et al. (1999) interviewed clinicians, community leaders, and parents about
what physicians could say to adolescents, during routine well-child exams, to prevent
violence among youth. There were three questions at the center of the project: (1)
What could pediatricians potentially do to deal with youth violence? (2) What barriers
did they face? (3) What resources were available to help them?
Two coders read through the transcripts and pulled out all segments of text
associated with these questions. The two coders identified 84 statements related to
potential, 74 related to barriers, and 41 related to resources. All the statements were
pulled out and put onto cards.
Next, four other coders independently sorted all the quotes from each major
theme into piles of things that they thought were somehow similar. Talking about what
the quotes in each pile had in common and naming those piles helped Barkin et al.
identify subthemes. In really large projects, have pairs of researchers sort the quotes
and decide on the names for the piles. Record and study the conversations that
researchers have while they're sorting quotes and naming themes in order to understand
the underlying criteria they are using (see Box 5.5). (Further Reading: pile
sorting [card sorting] for themes)
Pile sorts produce similarity data (that is, a matrix of what goes with what), and similarity
data can be analyzed with some formidable visualization methods, like multidimensional
scaling and cluster analysis. These methods let you see patterns in your data.
Barkin et al. (1999) converted the pile-sort data for the 199 statements (84 potential + 74
barriers + 41 resources) into a quote-by-quote similarity matrix, where the numbers in the
cells indicated the number of coders (0, 1, 2, 3, or 4) who had placed the quotes in the same
pile. They used multidimensional scaling and cluster analysis to identify groups of quotes
that the coders thought were similar.
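The conversion from pile sorts to a similarity matrix can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure Barkin et al. actually ran: the function name `similarity_matrix`, the input layout (one list of piles per coder, each pile a list of quote indices), the convention for the diagonal, and the toy data are all invented for the example.

```python
from itertools import combinations

def similarity_matrix(sorts, n_items):
    """Count, for every pair of items, how many coders put both in the same pile.

    sorts: one entry per coder; each entry is a list of piles, and each pile
    is a list of item indices (0 .. n_items - 1).
    """
    matrix = [[0] * n_items for _ in range(n_items)]
    for piles in sorts:
        for pile in piles:
            for i, j in combinations(sorted(set(pile)), 2):
                matrix[i][j] += 1
                matrix[j][i] += 1
    # Assumed convention: every coder trivially places an item with itself.
    for i in range(n_items):
        matrix[i][i] = len(sorts)
    return matrix

# Toy data, invented for illustration: 4 coders sorting 5 quotes.
sorts = [
    [[0, 1], [2, 3, 4]],
    [[0, 1, 2], [3, 4]],
    [[0], [1, 2], [3, 4]],
    [[0, 1], [2], [3, 4]],
]
matrix = similarity_matrix(sorts, 5)
```

With these toy sorts, quotes 3 and 4 score 4 (every coder put them together) and quotes 0 and 3 score 0. A matrix like this is the kind of input that multidimensional scaling and cluster analysis programs take.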
More about matrix analysis, including multidimensional scaling and cluster analysis, in
Chapters 7 and 18.
Word lists and the key-word-in-context (KWIC) technique draw on a simple observation:
If you want to understand what people are talking about, look closely at the
words they use. To generate word lists, you identify all the unique words in a text and
then count the number of times each occurs.
As part of a 12-year longitudinal study, Thomas Weisner and Helen Garnier (1992)
told parents of adolescents: "Describe your children. In your own words, just tell us
about them." From the transcripts, Ryan and Weisner (1996) produced a list of all the
unique words. Then they counted the number of times each unique word was used
by mothers and by fathers. The idea was to get some clues about themes that could
be used for coding the full texts.
Overall, the words that mothers and fathers used to describe their children suggested
that they were concerned with their children's independence and with their
children's moral, artistic, social, athletic, and academic characteristics, but mothers were
more likely to use "friends," "creative," "time," and "honest" to describe their children
while fathers were more likely to use "school," "good," "lack," "student," "enjoys," and
"independent." Ryan and Weisner used this information as clues for themes that they
would use later in actually coding the texts. (Details about this study are in Chapter 9.)
Word-counting techniques produce what Tesch (1990:139) called data condensation
or data distillation. By telling us which words occur most frequently, these
methods can help researchers identify core ideas in a welter of data. But condensed
data like word lists and counts take words out of their original context, so if you do
word counts, you'll also want to use a KWIC program.
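A word list and a bare-bones KWIC display need very little code. The following is a minimal sketch, assuming a simple tokenizer that keeps only letters and apostrophes; the function names (`tokenize`, `word_list`, `kwic`), the stop-word set, and the sample sentence are all invented for illustration and are not taken from the Ryan and Weisner study.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_list(text, stop_words=()):
    """Return a Counter mapping each unique word to its frequency."""
    return Counter(w for w in tokenize(text) if w not in stop_words)

def kwic(text, keyword, width=3):
    """Return each occurrence of keyword with `width` words of context per side."""
    words = tokenize(text)
    lines = []
    for i, w in enumerate(words):
        if w == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{w}] {right}".strip())
    return lines

# Invented sample text, for illustration only.
sample = "She has good friends. Her friends are creative and honest."
counts = word_list(sample, stop_words={"she", "has", "her", "are", "and"})
contexts = kwic(sample, "friends", width=2)
```

In this sample, `counts` shows "friends" as the most frequent word (twice), and `contexts` puts each occurrence back among its neighbors, which is the step that restores some of the context a raw count strips away.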
Word co-occurrence, also known as collocation, comes from linguistics and semantic
network analysis. It's based on an observation, by J. R. Firth (1935, 1957), that many
words commonly occur with other words to form an idea that would not be obvious
from the individual words-collocations like "green with envy," "shrouded in mystery,"
"maiden voyage," and "vaguely remember."
In 1959, Charles Osgood created word co-occurrence matrices-i.e., matrices that
show how often every pair of words co-occurs in a text-and analyzed those matrices to
describe the relation of major themes to one another. It was rather heroic work back then,
but computers have made the construction and analysis of co-occurrence and collocation
matrices easy today and have stimulated the development of semantic network analysis
(Barnett and Danowski 1992; Danowski 1982, 1993). More about this, too, in Chapter 19.
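The basic bookkeeping behind a co-occurrence matrix can be sketched as follows. This is a simplified illustration, not Osgood's procedure: it assumes that two words "co-occur" when they appear in the same unit of text (a sentence or a response, say), and the function name `cooccurrence_counts` and the sample units are invented for the example.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_counts(units):
    """Count, for each unordered word pair, the number of units containing both."""
    pair_counts = Counter()
    for unit in units:
        # Use the set of words so a pair is counted at most once per unit;
        # sorting makes every pair come out in alphabetical order.
        words = sorted(set(re.findall(r"[a-z']+", unit.lower())))
        pair_counts.update(combinations(words, 2))
    return pair_counts

# Invented sample units, for illustration only.
units = [
    "green with envy",
    "she was green with envy",
    "a maiden voyage",
]
pairs = cooccurrence_counts(units)
```

Here `pairs[("envy", "green")]` is 2 and `pairs[("maiden", "voyage")]` is 1. Pairs that never share a unit, like "envy" and "voyage," simply count as 0. Each count is one cell of the word-by-word matrix.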
12. Metacoding
and correspondence analysis-show graphically how units and themes are distributed
along dimensions and into groups or clusters. (More on multidimensional scaling in
Chapters 7 and 18.)
Jehn and Doucet (1996, 1997) asked 76 U.S. managers who worked in Sino-
American joint ventures to describe recent interpersonal conflicts with their business
partners. Each person described two conflicts: one with a same-culture manager and
another with a different-culture manager.
Two coders read the 76 intracultural and 76 intercultural conflict scenarios and
evaluated them on a 5-point scale for 27 themes that Jehn and Doucet had identified
from the literature on conflict. This produced two 76x27 scenario-by-theme matrices-
one for the intracultural conflicts and one for the intercultural conflicts. Jehn and
Doucet analyzed these matrices with factor analysis. This method reduced the 27
themes to just a handful. Jehn and Doucet then pulled out quotes from their original
data to illustrate the most important themes.
Quotes that characterized the first factor for intercultural relations were: "There is
a lot of hate involved in this situation," and "The dislike is overwhelming," and "I was
very angry." Quotes that characterized the second factor were: "I was very frustrated
with my co-worker" and "Their inconsistencies really aggravated me." And quotes that
characterized the third factor were: "She's a bitch" and "We are constantly shouting and
screaming." Jehn and Doucet labeled these factors personal animosity, aggravation,
and volatility in intercultural business relations (1997:2).
Numerical methods like these work best when applied to short, descriptive texts
of one or two paragraphs. They tend to produce a limited number of large, meta-
themes, but these are just the kind of themes that may not be apparent, even after
a careful and exhaustive reading of a text. Metacoding is a nice addition to our
theme-finding tool kit.
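The matrix step in this kind of metacoding can be sketched in Python. This is not Jehn and Doucet's analysis: the 76 × 27 ratings below are synthetic numbers generated from three hypothetical latent factors, and principal components of the theme-by-theme correlation matrix stand in for factor analysis as a simpler, related technique. The eigenvalue-greater-than-1 cutoff is a common rule of thumb, not something taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scenarios, n_themes, n_latent = 76, 27, 3

# Synthetic data (an assumption, not the study's ratings): each theme is
# driven mainly by one of three hypothetical latent factors, plus noise.
latent = rng.normal(size=(n_scenarios, n_latent))
assignment = np.arange(n_themes) % n_latent  # theme -> latent factor
true_loadings = np.zeros((n_latent, n_themes))
true_loadings[assignment, np.arange(n_themes)] = 1.0
noise = rng.normal(scale=0.5, size=(n_scenarios, n_themes))
raw = latent @ true_loadings + noise

# Round and clip to a 1-5 scale, mimicking coders' 5-point ratings.
ratings = np.clip(np.rint(3 + raw), 1, 5)

# Correlate themes across scenarios (27 x 27) and extract components.
corr = np.corrcoef(ratings, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Rule of thumb: keep components whose eigenvalue exceeds 1.
n_keep = int(np.sum(eigvals > 1.0))
component_loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])
explained = eigvals[:n_keep].sum() / n_themes

print("components kept:", n_keep)
print("share of variance explained:", round(float(explained), 2))

# Themes with the largest absolute loadings on the first component.
top = np.argsort(-np.abs(component_loadings[:, 0]))[:5]
print("themes loading most on component 1:", top.tolist())
```

In a real project, the themes that load most heavily on each retained component would be read together, with quotes from the texts, to decide what metatheme (if any) they share.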
Figure 5.1 and Table 5.1 lay out the characteristics of the techniques to help you decide
which method is best in any particular project, given your own time and skill constraints.
Looking for repetitions and similarities and differences and cutting and sorting
can be applied to any kind of qualitative data and don't require special computer skills.
It is not surprising that these techniques are the ones used most frequently in qualitative
research.
There are five things to consider in selecting one or more of these 12 techniques:
(1) the kind of data you have; (2) how much skill is required; (3) how much labor is
required; (4) the number and types of themes to be generated; and (5) whether you
are going to test the reliability and validity of the themes you produce.
[Figure 5.1. Decision tree for selecting among theme-discovery techniques. The first question is "Textual data?" For nontextual data (e.g., sounds, images, objects), the tree lists 5. Similarities and Differences and 9. Cutting and Sorting. For textual data, further yes/no branches sort the techniques into "Easy" (1. Repetitions; 5. Similarities and Differences; 9. Cutting and Sorting) and "Difficult" (drawn from 2. Indigenous Typologies; 3. Metaphors; 6. Linguistic Connectors; 7. Missing Data; 8. Theory-Related Material; 10. Word Lists and KWIC; 11. Word Co-occurrence; 12. Metacoding).]
Table 5.1. Characteristics of the techniques

| # | Technique | Labor Intensity | Expertise: Language | Expertise: Substantive | Expertise: Methodological | Stage in Analysis | Number of Themes Produced | Type of Theme Produced |
|---|---|---|---|---|---|---|---|---|
| 1 | Repetitions | Low | Low | Low | Low | Early | High | Theme |
| 2 | Indigenous Typologies | Low | High | Low | Low | Early | Medium | Theme, Subtheme |
| 3 | Metaphors | Low | High | Low | Low | Early | Medium | Theme |
| 4 | Transitions | Low | Low | Low | Low | Early | High | Theme |
| 5 | Similarities and Differences | Low-High | Low | Low | Low | Early | High | Theme |
| 6 | Linguistic Connectors | Low | High | Low | Low | Late | High | Theme |
| 7 | Missing Data | High | High | High | High | Late | Low | Theme |
| 8 | Theory-Related Material | Low | Low | High | High | Late | Low | Theme |
| 9 | Cutting and Sorting | Low-High | Low | Low | Low | Early or late | Medium | Theme, Subtheme, Metatheme |
| 10 | Word Lists and KWIC | Low | Medium | Low | Low | Early | Medium | Theme, Subtheme |
| 11 | Word Co-Occurrence | Medium | Medium | Low | High | Late | Low | Theme, Metatheme |
| 12 | Metacoding | Medium | Medium | High | High | Late | Low | Theme, Metatheme |
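One way to use Table 5.1 is as a lookup: record the expertise each technique requires and filter by what a project team can offer. The sketch below transcribes the three expertise columns of the table. The function name `shortlist` and the idea of treating Low, Medium, and High as an ordered scale are assumptions made for illustration, not part of the table itself.

```python
# Ordered scale for comparing levels (an assumption for this sketch).
LEVEL = {"Low": 0, "Medium": 1, "High": 2}

# (technique, language, substantive, methodological) expertise,
# transcribed from Table 5.1.
EXPERTISE = [
    ("Repetitions", "Low", "Low", "Low"),
    ("Indigenous Typologies", "High", "Low", "Low"),
    ("Metaphors", "High", "Low", "Low"),
    ("Transitions", "Low", "Low", "Low"),
    ("Similarities and Differences", "Low", "Low", "Low"),
    ("Linguistic Connectors", "High", "Low", "Low"),
    ("Missing Data", "High", "High", "High"),
    ("Theory-Related Material", "Low", "High", "High"),
    ("Cutting and Sorting", "Low", "Low", "Low"),
    ("Word Lists and KWIC", "Medium", "Low", "Low"),
    ("Word Co-Occurrence", "Medium", "Low", "High"),
    ("Metacoding", "Medium", "High", "High"),
]


def shortlist(language="High", substantive="High", methodological="High"):
    """Return techniques whose required expertise does not exceed
    the levels available to the team."""
    limits = (LEVEL[language], LEVEL[substantive], LEVEL[methodological])
    return [
        name
        for name, *required in EXPERTISE
        if all(LEVEL[r] <= limit for r, limit in zip(required, limits))
    ]


# A team with limited expertise on all three dimensions:
print(shortlist(language="Low", substantive="Low", methodological="Low"))
```

The filter ignores labor, stage of analysis, and the number of themes produced, all of which matter in practice. It is a starting point for discussion, not a decision rule.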
1. Kind of Data
With the exception of metacoding, all 12 of the techniques we've described here can
be applied to lengthy narratives. However, as texts become shorter and less complex,
looking for transitions, metaphors, and linguistic connectors is harder to do.
Discovering themes by looking for what is missing is inappropriate for very short
responses to open-ended questions because it is hard to say whether missing data
represents a new theme or is just the result of the way the data were elicited. And
short texts are inefficient for finding theory-related material.
For audio and video data, we find that the best methods involve looking and listening
for repetitions, similarities and differences, missing data, and theory-related
material, and doing metacoding.
One more reminder about field notes as texts: In writing field notes, we choose
what data are important to record and what data are not. Any patterns (themes) that
we discover in field notes may come from our informants, but they may also come from
biases that we brought to the recording process.
2. Skill
Not all techniques are available to everyone. You need to be truly fluent in the language
of the text to look for metaphors, linguistic connectors, and indigenous typologies
or to spot missing data. If you are working in a language other than your own,
it's best to stick to the search for repetitions, transitions, similarities and differences,
and etic categories (theory-related material) and to have native speakers do any sorting
of exemplars. Word lists and co-occurrences, as well as metacoding, also require less
language competence and so are easier to apply.
Using word co-occurrence or metacoding requires know-how about producing
and managing matrices, as well as skill in using methods for exploring and visualizing
data. If you don't have training in the use of multidimensional scaling, cluster analysis,
factor analysis, and correspondence analysis, then use techniques like cutting and
sorting, word lists, and KWIC.
Word lists and KWIC are easily done on a computer with many of the popular
CAQDAS packages. CAQDAS (pronounced "cactus") stands for "computer assisted
qualitative data analysis software." Consult the CAQDAS Networking Project website
(http://www.surrey.ac.uk/sociology/research/researchcentres/caqdas/) for information
on these resources. (Further Reading: computer-assisted qualitative data analysis
software)
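For readers without a CAQDAS package at hand, a word list and a KWIC (key-word-in-context) concordance can be sketched with Python's standard library. The sample texts are quotes cited earlier in this chapter. The function names, the tokenizing rule, and the three-word context window are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Sample texts: quotes reported earlier in the chapter.
texts = [
    "I was very frustrated with my co-worker.",
    "Their inconsistencies really aggravated me.",
    "I was very angry.",
    "The dislike is overwhelming.",
    "We are constantly shouting and screaming.",
]


def tokenize(text):
    """Lowercase a text and split it into word tokens,
    keeping internal hyphens and apostrophes."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def word_list(texts):
    """Count how often each word occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def kwic(texts, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((left, token, right))
    return hits


counts = word_list(texts)
for word, n in counts.most_common(5):
    print(word, n)

for left, key, right in kwic(texts, "very"):
    print(f"{left:>20} [{key}] {right}")
```

A real word list would usually drop very common function words ("I," "was," "the") before counting. That step is left out here to keep the sketch short.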
3. Labor
A generation ago, observation-based techniques required less effort than did process
techniques. Today, computers and software have made counting words and co-occurrences of
words, as well as analysis of matrices, very easy, though the cost, in time and effort, to
learn these computer methods can be daunting.
Some of the observation-based techniques (searching for repetitions, indigenous
typologies, metaphors, transitions, and linguistic connectors) are best done by eyeballing,
but this can be really time consuming. In team-based applications research, the
premium on getting answers quickly often means a preference for methods that rely
more on computers and less on human labor.
In our own work, we find that a careful look at a word frequency list and some
quick pile sorts are good ways to start. Studying word co-occurrences and metacoding
require more work and produce fewer themes, but they are excellent for discovering
big themes that can hide in mountains of texts.
4. Number and Types of Themes
In theme discovery, more is better. It's not that all themes are equally important. You
still have to decide which themes are most salient and how themes are related to each
other. But unless themes are discovered in the first place, none of this additional analysis
can take place.
We know of no research comparing the number of themes that each technique
generates, but in our experience, looking for repetitions, similarities and differences,
transitions, and linguistic connectors that occur frequently in text produces more
themes, while looking for indigenous metaphors and indigenous categories (which
occur less frequently) produces fewer themes. Of all the observation techniques,
searching for theory-related material or for missing data produces the smallest number
of new themes.
Of the process techniques, the cutting-and-sorting method, along with word lists
and KWIC analysis, yields many themes and subthemes, while word co-occurrence and
metacoding produce a few, larger, more inclusive metathemes. But at the start of any
project, the primary goal is to discover as many themes as possible. And this means
applying several techniques until you reach saturation, that is, until you stop finding
new themes.
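The logic of working toward saturation can be sketched as a simple bookkeeping loop: apply one technique after another, record which themes are new, and stop when a pass adds nothing. The theme names and the order of techniques below are hypothetical. Stopping after a single pass with no new themes is a simplifying assumption, and a cautious analyst might require several such passes.

```python
# Hypothetical themes found by each technique, in the order applied.
passes = [
    ("repetitions", {"anger", "trust", "delay"}),
    ("cutting and sorting", {"anger", "trust", "status"}),
    ("word lists and KWIC", {"delay", "status", "money"}),
    ("similarities and differences", {"anger", "money"}),
    ("metaphors", {"trust"}),
]


def run_until_saturation(passes):
    """Accumulate themes across techniques and stop after the first
    pass that contributes no new theme."""
    found = set()
    log = []  # (technique, number of new themes) for each pass run
    for technique, themes in passes:
        new = themes - found
        found |= new
        log.append((technique, len(new)))
        if not new:
            break
    return found, log


found, log = run_until_saturation(passes)
for technique, n_new in log:
    print(f"{technique}: {n_new} new theme(s)")
print("themes found:", sorted(found))
```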
Cutting and sorting expressions into piles is the most versatile technique. You can
identify major themes, subthemes, and even metathemes with this method, and
although the analysis of this kind of data is enhanced by computational methods,
much of it can be done without a computer. In contrast, techniques that apply to
aggregated data, such as word co-occurrences and metacoding, are particularly good at
identifying more abstract themes but really can't be done without the help of good
software.
5. Reliability and Validity
"There is," says Ian Dey (1993:110-11) "no single set of categories [themes] waiting to
be discovered. There are as many ways of 'seeing' the data as one can invent." In their
study of Chinese and American managers (above), Jehn and Doucet (1996, 1997) used
three different discovery techniques on the same set of data and each produced a
different set of themes. All three of their theme sets have some intuitive appeal, and
all three yield analytic results that are useful. But Jehn and Doucet might have used
any of the other techniques we've described here to discover even more themes.
How can we tell if the themes we've identified are valid? That is, are the concepts
we've identified really in the text? The answer is that there is no ultimate demonstration
of validity. The validity of a concept depends on the utility of the device that measures
it and on the collective judgment of the scientific community that a concept and
its measure are valid (Bernard 2012:51; Denzin 1970:106).
Reliability, on the other hand, is about agreement among coders and across
methods and across studies. Do coders agree on what theme to assign a segment of
text? Strong interrater reliability (about which more in Chapter 11) suggests that a
theme is not just a figment of your imagination and adds to the likelihood that the
theme is also valid (Sandelowski 1995b).
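Agreement between two coders can be put into numbers. The sketch below computes Cohen's kappa, one widely used measure of agreement corrected for chance. This section does not name a particular statistic, so the choice of kappa is an assumption, as are the function name and the eight hypothetical coding decisions.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one theme
    to the same sequence of text segments."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(codes_a)

    # Proportion of segments on which the coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n

    # Agreement expected by chance, from each coder's theme frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:  # both coders used one and the same theme throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical themes assigned by two coders to eight segments.
coder_a = ["anger", "anger", "delay", "trust", "delay", "anger", "trust", "delay"]
coder_b = ["anger", "delay", "delay", "trust", "delay", "anger", "anger", "delay"]

kappa = cohen_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

Here the coders agree on 6 of 8 segments (0.75), chance agreement is 23/64, and kappa works out to 25/41, or about 0.61. What counts as "strong" agreement is a matter of convention and depends on the field.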
Lincoln and Guba's (1985) team approach to sorting and naming piles of expressions
is so appealing because agreement need not be limited to members of the core
research team. Jehn and Doucet (1996, 1997) asked local experts to sort word lists
into themes, and Barkin et al. (1999) had both experts and novices sort quotes into
piles. The more agreement among team members, the more confidence we have that
emerging themes are internally valid.
Some researchers recommend that respondents be given the opportunity to examine
and comment on themes (Lincoln and Guba 1985:351; Patton 1990:468-69). This
is certainly appropriate when one of the goals of research is to identify and apply
themes that are recognized or used by the people whom one studies, but this is not
always possible. The discovery of new ideas derived from a more theoretical approach
may involve the application of etic rather than emic themes, that is, understandings
held by outsiders rather than those held by insiders. In these cases, researchers should
not expect their findings necessarily to correspond with the ideas and beliefs of study
participants. (Further Reading: reliability and validity in qualitative research)
We have much to learn about the process of finding themes in qualitative data. Since
the early 1960s, researchers have been working on fully automated, computer-based
methods for identifying themes in text. These computer-based content dictionaries, as
they're known, may not sit well with some. After all, if the "qualitative" in qualitative
methods means analysis by humans, then how can we give over such an important
piece of qualitative analysis, theme identification, to machines?
The answer, of course, is that ultimately, we are responsible for all analysis. We
are comfortable using text management software to help us recognize connections in
a set of themes, and we are comfortable letting machines count words and create
matrices for us from texts. Text analysts of every epistemological persuasion can hardly
wait for voice recognition software to become sufficiently effective that it will relieve
us of all transcription chores. Computer-based content dictionaries that can parse a text
and identify its underlying thematic components will, we believe, be just another tool
that will (1) make the analysis of qualitative data easier and (2) lead to much wider
use and appreciation of qualitative data in all the social sciences.
Summary
• Analyzing text involves five tasks: (1) discovering themes and subthemes; (2) describing
the core and peripheral elements of themes; (3) building hierarchies of themes or
codebooks; (4) applying themes, i.e., attaching them to chunks of actual text; and
(5) linking themes into theoretical models. This chapter focuses on the first task:
discovering themes and subthemes.
• There are two kinds of themes: Induced themes come from data, while a priori
themes (also called deduced themes) come from prior research, from commonsense
constructs and from researchers' experiences. Research projects may involve both
kinds of themes.
• In looking for themes, there are at least eight things to look for in texts. These
include (1) repetitions; (2) indigenous categories (local words for topics of importance);
(3) the use of metaphors and analogies; (4) naturally occurring shifts in
content; (5) similarities and differences in pairs of statements about a topic; (6) use
of linguistic connectors, like "because," "rather than," "instead of," and "is a kind of,"
as well as time statements (like "before" and "after that"), if-then statements, spatial
statements (like "close to"), comparisons, synonyms, and antonyms; (7) missing data; and
(8) theory-related material (like evidence of social conflict, cultural contradictions,
informal methods of social control, things that people do in managing impersonal
social relationships, methods by which people acquire and maintain achieved and
ascribed status, and information about how people solve problems).
• Other techniques in looking for themes require more physical or computer-based
manipulation of the text itself. These include (1) creating cards with quotes from a
text, sorting them into piles of similar quotes, and naming each pile; (2) creating
word lists and concordances with a key-words-in-context program; (3) looking for
co-occurrences and collocations; and (4) metacoding (examining the relationship
among themes to discover potentially new themes and overarching metathemes).
• Consider five things in selecting techniques for finding themes: (1) the kind of data
you have; (2) how much skill is required; (3) how much labor is required; (4) the
number and types of themes to be generated; and (5) whether you are going to test
the reliability and validity of the themes you produce. For example, looking for
transitions, metaphors, and linguistic connectors is hard to do in short texts, and
looking for what is missing is inappropriate for very short responses to open-ended
questions because it is hard to say whether missing data represent a new theme or
are just the result of the way the data were elicited. And short texts
are inefficient for finding theory-related material.
• Cutting and sorting expressions into piles is the most versatile technique for finding
major themes, subthemes, and even metathemes. Cutting-and-sorting, along with
word lists and KWIC analysis, yields many themes and subthemes, while word
co-occurrence and metacoding produce a few, larger, more inclusive metathemes.
At the start of any project, use several techniques until you reach saturation, that
is, until you stop finding new themes.
Exercises
The idea here is simply to reduce a set of texts to a set of themes and to do so
in a way that is agreed upon by two people. Don't be surprised if you wind up with
a few themes that you can't agree on.
Further Reading
Finding themes. Bradley et al. (2007), Yeh and Inman (2007). Finding themes in group
research: Carey and Gelaude (2008), MacQueen et al. (2008), Necheles et al. (2007).
Automated text processing. Colley and Neal (2012), Grimmer and Stewart (2013),
Van Holt et al. (2013).
Pile sorting. Eastman et al. (2005), Hsiao et al. (2006), Nolle et al. (2012), Patterson
et al. (1993), Sayles et al. (2007).
Computer-assisted qualitative data analysis software. Text analysis info
(http://www.textanalysis.info/), CAQDAS Networking Project
(http://www.surrey.ac.uk/sociology/research/researchcentres/caqdas/).
Reliability and validity in qualitative research. Kirk and Miller (1986), Long and
Johnson (2000), Moret et al. (2007), Ryan (1999).