
Text Mining Digital Humanities Projects: Assessing Content Analysis Capabilities of Voyant Tools


Journal of Web Librarianship

ISSN: 1932-2909 (Print) 1932-2917 (Online) Journal homepage: http://www.tandfonline.com/loi/wjwl20


A. Miller

To cite this article: A. Miller (2018) Text Mining Digital Humanities Projects: Assessing Content
Analysis Capabilities of Voyant Tools, Journal of Web Librarianship, 12:3, 169-197, DOI:
10.1080/19322909.2018.1479673

To link to this article: https://doi.org/10.1080/19322909.2018.1479673

Published online: 26 Dec 2018.


Full Terms & Conditions of access and use can be found at
http://www.tandfonline.com/action/journalInformation?journalCode=wjwl20
JOURNAL OF WEB LIBRARIANSHIP
2018, VOL. 12, NO. 3, 169–197
https://doi.org/10.1080/19322909.2018.1479673

Text Mining Digital Humanities Projects: Assessing
Content Analysis Capabilities of Voyant Tools
A. Miller
Middle Tennessee State University, Murfreesboro, Tennessee, USA

ABSTRACT
Text mining is a method that aids in the analytic process and
interpretation of research. Voyant Tools (voyant-tools.org) is
an open source text-mining option that is user-friendly and
well documented. This tool was chosen as a test study for
one of the latest projects, entitled Trials and Triumphs, at
Middle Tennessee State University. The Trials and Triumphs
project has been reengineered with new content, themes, and
connections relating to Tennessee’s history between 1865
and 1965. Transformations are not just the subject during
this historic time period but are equally met with transformative
technical upgrades to the project’s previous interpretative
layout. Digital Scholarship Initiatives at Middle Tennessee
State University’s Walker Library tested the application
and use of Voyant Tools to determine whether its text
analysis capabilities are well suited for the Trials and Triumphs
revitalization project (now called Trials, Triumphs, and
Transformations: Tennesseans' Search for Citizenship,
Community, and Opportunity) and whether its interoperability
with Drupal was worth pursuing. The author describes the
results of this test study and consequently intends this article
to be a practical guide for librarians or similar scholars who
develop digital humanities projects, and who are interested in
beginning a text mining project.

KEYWORDS
Content analysis; data visualization; digital humanities; digital
scholarship; Drupal; text-mining; Voyant Tools

Introduction
Digital scholarship is interdisciplinary. Its own definition is varied among
scholars.1 After studying and discussing various definitions in use, the
author developed the definition now in use by Middle Tennessee State
University’s Walker Library’s Digital Scholarship Initiatives:
Digital scholarship is scholarship that is enhanced by the design of digital projects,
incorporation of digital tools, collaboration among digital partners, and
dissemination through digital platforms. Digital scholarship is changing the nature of
how research is conducted, produced, and shared.2

CONTACT A. Miller a.miller@mtsu.edu Middle Tennessee State University, Murfreesboro, Tennessee, USA.
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/wjwl.
© 2018 The Author(s)

The applications used to produce such scholarly output are vast and are
continuously being developed. Text mining is just one method, among
several, that aids in the analytic process and interpretation of how research is
conducted, produced, and shared. Voyant Tools (voyant-tools.org) is one
open source option among the available web-based text analysis tools, and it is
user-friendly and well documented. This tool was chosen for a test study,
which focused on the use and incorporation of various text analysis and
data visualization tools as part of the revitalization phase of one of the
latest digital scholarship projects at Middle Tennessee State University. Trials,
Triumphs, and Transformations: Tennesseans’ Search for Citizenship,
Community, and Opportunity (http://dsi.mtsu.edu/trials) is a project
developed by the Center for Historic Preservation and the James E. Walker
Library at Middle Tennessee State University and supported with funds
from the Tennessee Board of Regents Office of Academic Affairs and the
Tennessee Civil War National Heritage Area, a unit of the National
Park Service.
Tennessee’s history between the end of the American Civil War (1865)
and the passage of the 1964 Civil Rights Act and 1965 Voting Rights Act
often gets ignored. This historic period offers insights into the
transformations that took place, including challenges and achievements, as
Tennesseans searched for citizenship, community, and opportunity.
Citizenship—what that has meant and how that has changed—is at the
heart of this digital exploration into Tennessee’s history and culture. This
digital collection’s objects, songs, photographs, paintings, and documents
often reveal the challenges faced by Tennesseans as they pursued the rights
and benefits of citizenship.3
This digital thematic research collection was designed to make hidden
collections accessible by placing images of rare historical documents, works
of art, and material culture objects, including lesson plans and scholarly
essays, within easy reach of researchers, teachers, and students of
Tennessee history.4 With the primary audience of teachers and researchers
in mind, this project explores the search for citizenship in Tennessee
between 1865 and 1965.
“For many Tennesseans, the right to vote and the right to fair and equal treatment
under the law were contested even after these rights had been written into the
United States Constitution after the Civil War. As Tennessee’s new citizens began to
gain their place in society, the state’s governmental institutions, industry, agriculture,
and infrastructure were transformed” (Knowles, 2017, para 1).

During 2016–2017, the project was reengineered with new content,
themes, and connections relating to Tennessee’s history between 1865 and
1965. Additionally, the project migrated to the Drupal content management
system (CMS), which provides an upgrade to the technology and back-end
capabilities of the previous website and its content, which were originally
hosted in a customized front-end CONTENTdm instance. Photos, audio,
lesson plans, and scholarly essays remain features of the new project
website, but with extended features that now include an additional theme,
essay, and more data visualizations to complement the other features.
Voyant Tools was selected to serve as the tool for a test study on
applying text analysis to this research collection because it is a free web-based
environment that is user-friendly and well documented by its developers.
Other digital scholarship tools tested were the Drupal CMS, StoryMap,
Tableau, and TimelineJS. All of these were essential to the revitalization
phase of this research collection; however, this article focuses on a new
area of interest for the project developers: text analysis capabilities.
Additionally, the testing of Voyant Tools helped inform what to emphasize
in data visualizations (which are developed with the other tools mentioned
above and visible at http://dsi.mtsu.edu/trials/visualizations). During the
project revitalization phase, the author used Voyant Tools to determine
whether its text analysis capabilities are well suited for the Trials,
Triumphs, and Transformations project and whether its interoperability
with Drupal is worth pursuing.

Literature review
Before working with Voyant Tools itself, the author conducted a literature
review for background and preparation. According to the Voyant Tools
website, version 1.0 has been used for several years. In 2016, version 2.0
was released. Both versions are web-based. There is also a stand-alone
desktop version in beta that is available for download. Although the
stand-alone desktop instance is ideal as it allows the data and resulting
visualizations to be saved locally rather than in HTML (with a certain
lifespan),5 it is still in beta and therefore was not a candidate for this
study. Instead, the
literature review focused on the web-based versions 1.0 and 2.0 with the
hope of finding practical guides for implementation.
A search on text mining with Voyant Tools yields a number of blogs,
tutorials, and YouTube videos on the use of Voyant Tools 1.0. A review of
the literature, including peer-reviewed articles, yields a larger number of
references on the use of Voyant Tools, again with version 1.0. Despite
more recent (2015–2016) publications that may have used version 2.0 and
failed to note the version used, the majority of these publications were
dissertations and merely mention the use of the tool rather than explain the
approach or implication of its use. Although the general web and scholarly
literature searches revealed resources applicable to the use of Voyant Tools
in general, they were not specific to the number of upgrades and enhanced
features that are available with the newer version as well as possible
approaches for analysis. Although Voyant Tools 2.0 was released in 2016,
there is a lack of literature on the use and application of version 2.0.
In the literature that is available on Voyant Tools, one review describes
Voyant Tools as a text analysis tool for the average humanities scholar
(Welsh, 2014). This brief review of the free web-based tool confirms its
usefulness to both the beginner and advanced user. However, it is not just
researchers in the humanities that are taking advantage of this tool. The
authors of a medical study used a variety of web-based text processing
tools, including Voyant Tools, to investigate the feasibility of using such
tools to extract useful information from large amounts of patient data
(Maramba et al., 2015). Although the authors were using Voyant for a
medical study, they used the same tools that humanities scholars would
use for the same purpose: an alternative approach to traditional
textual analytic methods. Finally, in another 2015 study, a humanities
scholar used Voyant Tools for literary context, which adds a “new
dimension to understanding” of how a novelist crafts a character (T. L. Lynch,
2015, p. 72).
These three articles, although published recently, used the earlier version
(1.0) of Voyant Tools. Additionally, although the articles are relevant, they
merely introduce the tool or show a few results of using the tools—they do
not explain the process involved with manipulating the tool itself. Rather,
they detail the findings of their study, not the tools used. The goal of this
paper is the opposite. The author hopes to provide examples on the
application and use of version 2.0 with the intended purpose of serving as a
guide for scholars and librarians interested in beginning a text mining
approach to digital scholarship projects. The Trials, Triumphs, and
Transformations project’s test study of using Voyant Tools for text analysis
will be the basis for producing guide documentation.

Methodology
For a more in-depth look at this transformative time period (1865–1965) in
Tennessee, the author used text mining and visualizations to provide
content analysis of the text within the eight scholarly essays6 associated with
the project. The essays were written by current scholars regarding the
project’s time period (1865–1965) and were written to provide evidence from
the perspective of historians, musicians, and other scholars of the digital
collection’s theme on Tennesseans’ search for citizenship, community, and
opportunity.
Specific goals of text mining the scholarly essays of Trials, Triumphs, and
Transformations were to determine:

1. What terms occur throughout all eight scholarly essays;
2. How education was perceived in this time period based on the themes;
3. How transformations relate to the themes of this period;
4. Whether there is any unusual discovery by visualizing the essays;
5. Whether Voyant Tools is compatible with the Drupal content management
system (CMS).

Each scholarly essay was extracted from the Trials, Triumphs, and
Transformations website (http://dsi.mtsu.edu/trials) and inserted into a
Microsoft (MS) Word file (other word-processing editing software would
also suffice). Typically, when performing text analysis, cleaning the data or
removing any markup tags or special characters is a critical part of the
preparation prior to using the actual text mining software selected (Maceli,
2015). However, Voyant Tools, a web-based text reading and analysis
environment, is capable of handling a variety of input formats including
URLs, plain text, HTML, XML, PDF, RTF, and MS Word (Sinclair &
Rockwell, 2016, “Getting Started”). Although MS Word was used to pull
the individual essays into a single document for uploading to Voyant
Tools, the photos and their citations were removed before saving.
Alternatively, the URLs of each scholarly essay page could have been used
(instead of saving the text as an MS Word file), which can be a time saving
advantage because of the simplicity of merely copying and pasting a URL
for quick analysis. However, for this test study that method would provide
a difficult analytic process as each essay page includes images, tags, audio,
lesson plans, and other content (such as text from website headers and
footers) that is not necessarily associated with the goals of analyzing the
collection’s essay content. Only the text of the essays is important for this
analysis. For this reason, the URL was not used as the input method.
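The cleanup described above was done by hand in MS Word. As a minimal sketch of an automated alternative, the following Python example uses the standard library's html.parser to keep only visible prose and skip page furniture. The set of skipped tags and the sample markup are hypothetical; they are not taken from the Trials, Triumphs, and Transformations site and would need to match the real page structure.

```python
from html.parser import HTMLParser


class EssayTextExtractor(HTMLParser):
    """Collect visible text, skipping elements assumed not to hold essay prose."""

    # Hypothetical choice of tags to ignore; adjust to the actual page layout.
    SKIPPED = {"script", "style", "header", "footer", "nav", "figure"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the essay text of an HTML page as one plain-text string."""
    parser = EssayTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The resulting plain text could then be saved as one file per essay and uploaded to Voyant Tools in place of the MS Word files.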
Next, the eight scholarly essays were batch loaded to the Voyant site.
In version 2.0, this online text-analysis program allows multiple documents
to be uploaded together; the resulting collection is called a “corpus.” Once
the corpus is uploaded, the research analysis can begin by selecting the
different tools, also called “skins” (Sinclair, 2015). Figure 1 shows the
scholarly essays corpus; the default skin shows five different tools (tool
names are circled) that can be used for content analysis.

Figure 1. Default Voyant Skin for content analysis of Trials, Triumphs, and Transformations scholarly essays showing a selection of the different
tools available.

It is important to note that although there are five clear tools in Figure 1
(Cirrus, Summary, Trends, Reader, and Contexts), analysis is not limited to
just those tools, and the tools can be changed within the skin. This skin
was produced using Voyant Tools 2.0 (Voyant Tools 1.0 is still available)
and includes access to a variety of tools including (Sinclair & Rockwell,
2016c, “List of Tools”):

• Bubblelines (for frequency and distribution of terms),
• Cirrus (word cloud visualizing frequency of words),
• Corpus Collocates (graph representing keywords in close proximity to
other terms),
• Contexts (occurrence of keywords with surrounding context),
• Corpus Terms (table view of term frequencies),
• Knots (terms in a single document),
• Documents (table of documents in corpus with modifications),
• Phrases (repeating sequences),
• Reader (text fetched on demand),
• Scatterplot (graph of how words cluster for similarity or
correspondence analysis),
• StreamGraph (change in frequency of words),
• Summary (textual overview),
• Terms Radio (change of frequency in words),
• Trends (line graph of distribution of word occurrence).

See the accompanying reference for more details and an example layout
of each tool.7
An extremely valuable feature of Voyant Tools is the interactive and
relational nature of each tool. This means the tools can interact with each
other simultaneously. For example, selecting a word from the Summary
tool (for example, the word “education” as circled in Figure 2)
automatically changes what is depicted in the Reader tool (top center of image) and
the word cloud configuration in the Cirrus tool (top left of image).
Compare this skin depiction in Figure 2 with the default/original skin in
Figure 1. Searching for other words in this context is also allowed with
version 2.0’s increased searching capabilities within each tool. (This feature
is displayed in Figure 2 with the arrows.)
Due to the variations of tools and the relational data that can emerge
from using parts of the corpus, content analysis is versatile and user
dependent. For the purposes of the Trials, Triumphs, and Transformations
analysis, there are two key terms that will be addressed. One is the term
education. The other is the term transformation (but using the root
“transform”), which, as the title Trials, Triumphs, and Transformations
suggests, describes the period of transition exhibited within the overarching
project. One objective of the project was to reengineer new content,
themes, and an interactive layout. The addition of “transformations” to the
title of the project helped draw out potential themes and analysis of terms
used in the collection, which informed areas of emphasis for data
visualizations. To do this, the author used text analysis of the scholarly essays
during the revitalization phase to look for connections that complemented
or enhanced the project.

Figure 2. Voyant Skin change based on simultaneous tool use. Selecting the word “education” from the Summary tool displays related content in the other
tools allowing for analysis to be interactive and relational.

Results
Evaluation of tools by analyzing the term “education”
The Terms tool (upper left section of Figure 2) shows a word cloud of the
top fifteen terms used within the scholarly essays, and Figure 3 shows the
same terms in a list view with the “count,” or number of times each term is
used. This feature allows a researcher to see what terms occur throughout
all eight scholarly essays (Goal 1) by displaying the term count, regardless
of the location in the project.
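The term counts that the Terms tool reports can be approximated outside Voyant Tools. The following is a minimal Python sketch, assuming a simple lowercase tokenizer; it is not Voyant's own tokenization, it applies no stop-word filter, and the function names are illustrative only.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(documents, n=15):
    """Return the n most frequent terms across all documents with their counts."""
    counts = Counter()
    for text in documents:
        counts.update(tokenize(text))
    return counts.most_common(n)
```

Calling top_terms on a list of eight essay strings would return up to fifteen (term, count) pairs, comparable in spirit to the list view in Figure 3.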
From this list it is clear that a few of the top ten terms are related:
school, university, and education. When those three terms are selected in the
Terms tool (displayed in the top left of Figure 4), the Trends tool
auto-populates, reflecting the usage of those terms among the eight scholarly
essays (addressing Goal 1). The Trends tool line graph shows the relative
frequency of these selected terms’ usage across the eight documents
(displayed in the upper right of Figure 4), rather than relative frequency of the
overall top five terms, which was initially displayed in the Trends tool line
graph in Figure 2.

Figure 3. Terms tool of default Voyant Skin showing top fifteen terms used in the corpus.

Figure 4. Selection of related terms in Terms tool and their display in the Trends tool.

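Relative frequency, as plotted per document in the Trends tool, can be illustrated with a short Python sketch. It assumes relative frequency means a term's count divided by the total number of word tokens in that document; the exact scaling Voyant Tools applies is not stated in this article, so the numbers are illustrative rather than a reproduction of the tool's output.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequencies(documents, term):
    """Return, for each document, count(term) / total tokens in that document."""
    freqs = []
    for text in documents:
        tokens = tokenize(text)
        # Guard against empty documents to avoid dividing by zero.
        freqs.append(tokens.count(term) / len(tokens) if tokens else 0.0)
    return freqs
```

Dividing by document length lets a short essay and a long essay be compared on the same scale, which is why a relative view and a raw-count view of the same term can look different.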
To more closely examine the data and figures, the Trends tool can be
expanded into a single view by hovering over the right-hand corner of that
tool and clicking on the pop-out box (indicated by the arrow in Figure 5).
This allows the data or visualization to be exported to a URL.
If the new URL with the Trends tool depiction defaults back to the
overall top five terms in the dataset, thereby not displaying the desired terms
(in this case, the terms university and school were lost), the desired terms
can be added back in by typing them in the search box as identified by the
arrow in Figure 6. Adding terms via this method redisplays the raw
frequency of the selected terms across the documents as shown in Figure 7.
Figure 7 shows that the three related terms (university, school, and
education) are used at various frequencies in the eight
scholarly essays (addresses Goals 1 and 2). Using the truncation asterisk
allowed a broader view of each term by enabling results to include the root
term and a number of suffixes. For example, searching school* could
retrieve school, schooling, schools, and so forth. From Figure 7, there is an
obvious spike in the frequency of the term school, but this is not surprising
as it was used the most in the scholarly essay entitled “Public Education in
Tennessee” (document 5 along the x-axis in Figure 7), which discussed
issues related to school frequently. Conversely, the term school is not
used at all in the scholarly essay entitled “Political Separation and
Exclusivity: Musical Dialogue and Transcendence” (document 6 in Figure
7) or in the biographical essay entitled “Sampson Wesley Keeble”
(document 2 in Figure 7). This does not necessarily mean that school was not
important during this time period or in the themes associated with these
two essays, which include Performing Identity and Embracing Citizenship
respectively.8 Rather, this analytical process opens up more questions. For
example, a different term may have been used for school during that time
period. Or other factors may have been more important to identity or
citizenship than school. Perhaps politics of that time did not focus on school
as a key issue. It is even possible that Keeble’s background or goal
influenced his perspective, which determined how he included or excluded ideas
related to school.
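The effect of the truncation asterisk can be sketched in Python as prefix matching over word tokens. This is a simplified assumption about how a query such as school* behaves, not a description of Voyant's search syntax in full; the function names are illustrative.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def raw_frequencies(documents, query):
    """Count matches per document; a trailing '*' matches any suffix."""
    if query.endswith("*"):
        prefix = query[:-1]

        def matches(token):
            return token.startswith(prefix)
    else:

        def matches(token):
            return token == query

    return [sum(1 for tok in tokenize(text) if matches(tok)) for text in documents]
```

A document that never uses the exact word school may still match school* through schools or schooling, which is one reason truncated and exact searches can give different per-document counts.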
In addition to the frequency (both low and high) of the usage of the
term school in these essays, the use of the term university was also
interesting. The line for university in Figure 7 is the most horizontal,
reflecting steadier usage across nearly all
eight essays. From this, one could presume that university is a more
consistently used term (compared to the usage of school and education)
during this period as it is reflected in several essays, each focused on a
different theme. Although school has a high raw frequency, the majority of
that usage comes from just one essay, “Public Education in Tennessee,” and
the term does not have steady usage in all eight essays. School is used
heavily in the Embracing Citizenship theme, but that theme also uses the
terms education and university at a lower frequency. To browse the six
themes (Embracing
Citizenship, Transforming the Economy, Claiming Space, Finding
Community, Achieving Recognition, and Performing Identity) of the project,
visit http://dsi.mtsu.edu/trials/browsethemes.

Figure 5. Exporting tools and data to a URL.

Figure 6. Adding in lost terms using the search box.

Figure 7. Raw frequencies of the three related terms from the Terms tool reflected in the single view of the Trends tool.

Looking at the term frequencies from a different view (to further support
or contradict initial results) can be done by hovering over the upper right-
hand corner of the single view of the Trends tool and navigating to
“Bubblelines” from the Visualizations Tools submenu as identified with the
arrow in Figure 8.
The resulting Bubblelines tool shows the frequency of the top five terms
that were listed in the Trends tool of the default Skin as shown in Figure 3.
Unfortunately, when the tool is changed, the latest data modifications are
lost, and the data reverts to the default skin data. This glitch could be
something the Voyant Tools developers are still working on, as version 2.0
was only recently released from
beta. However, with the new search capabilities of version 2.0, modifying
the terms included is easy (as explained in Figures 5 and 6). To remove
unwanted data, simply click on the term and select “Remove Term” as
shown in the top left with the arrow in Figure 9.
To further modify the results, the features listed along the bottom of the
tool (see the three arrows at the bottom of Figure 9) can be used to search
for those terms within a specific essay(s). The granularity can be adjusted,
and the lines can be separated for each term so the bubbles are not stacked
on top of each other as they are in Figure 9. As an example, both the
granularity and line separation were modified in the bubblelines shown in
Figure 10. The two biographical essays on James Carroll Napier and
Sampson Wesley Keeble were removed from the analysis because, as shown
in Figure 9, these essays do not have as high a frequency of the selected
terms compared to the rest of the scholarly essays. These essays were
unchecked using the “Documents” option at the bottom of the Bubblelines
tool. In addition, the term college was included for analysis as it is a
synonym of the selected terms (see Figure 10).
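The idea behind the granularity setting can be illustrated with a small Python sketch that divides a document into equal segments and counts a term in each one. This is an assumption about the general technique only; how the Bubblelines tool actually bins and draws its bubbles is not described in this article.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def segment_counts(text, term, granularity=10):
    """Count occurrences of term in each of `granularity` equal-width segments."""
    tokens = tokenize(text)
    counts = [0] * granularity
    for position, token in enumerate(tokens):
        if token == term:
            # Map the token position onto a segment index in [0, granularity).
            counts[position * granularity // len(tokens)] += 1
    return counts
```

A higher granularity yields more, smaller segments, so clusters of a term show up as several small counts rather than one large one.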
From the Bubblelines visualization, it is clear that the term school is
prevalent in the essay on public education. But interestingly, there is also a
high frequency of the term education in that same essay. This connection
was not as obvious when using the Trends tool (see Figure 8).
Additionally, the Bubblelines tool shows that the essay “Black Higher
Education in Tennessee” uses all four terms (education, university, school,
and college) rather frequently. This observation could lead a researcher to
explore questions like: Why does this essay contain all four words? Why
are all four terms frequently used when treating the topic of higher
education but not when discussing politics (the sixth essay in Figure 9)? Again,
the use of the various visualization tools can generate more questions that
can be further explored by the user of this, or any, thematic collection.
For example, the essay “Political Separation and Exclusivity”9 discusses the
agency of music as a conversation of identities and communities, often
with musicians singing and playing what could not be spoken about earlier
in history without controversy. Music is a natural ability but also a learned
skill, and during this time period, this essay signified that music education
came from schools, private homes, juke joints, colleges, or barbershops. Of
the four terms (education, university, school, and college), only the latter
two are used in this essay, further emphasizing that this generation
typically learned music making from direct personal experiences rather than
formal education. Voyant Tools helps bring attention to these nuances in the
text, enabling an engaged researcher to focus on new lines of inquiry.

Figure 8. Changing from the Trends tool to the Bubblelines tool.

Figure 9. Bubblelines tool comparing the frequency of the terms by removing default terms and modifying other elements.

Figure 10. Bubblelines tool with modifications of granularity, document selection, and separation of lines by term.

Changing visualizations directs new inquiries


The second term of concern for this test study is transformation. The term
transformation was not included in the top fifteen list of words used in the
corpus (see Figure 3), yet it is important to the goal of the project. Voyant
Tools was used for content analysis to explore all iterations
of the word transform, including transformative, transformational,
transforming, etc., by entering the term in the search boxes of various
tools to see how often, when, and in proximity to what the term was used.
Typing transform* into the search box in the Terms tool yields ten instances
where some iteration of the term transform is used in the essays, as seen in
Figure 11. The same search was applied to the Trends tool, where the term
transform is reflected in the line graph of Figure 12. The line graph
indicates transform was used in four of the scholarly essays: “Black Higher
Education,” “New
Economies, New Communities,” “Separation and Exclusion,” and “They
Took Their Stand … with Race in Mind.”
These visualizations inspire the questions: How was the term transform
being used? Was it in a negative or positive context? To explore these kinds
of questions, a different visualization can help give more context. By chang-
ing to the Collocates tool, the term, its collocate (the keyword occurring in
close proximity), and the count (frequency of use) are revealed as seen in
Figure 13.
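Collocation counting of the kind described above can be sketched in Python by collecting the words that fall within a fixed window around each match. The window size, the prefix matching, and the function names are assumptions for illustration; the Collocates tool's own defaults may differ.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def collocates(text, keyword_prefix, window=5):
    """Count the words appearing within `window` tokens of each keyword match."""
    tokens = tokenize(text)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token.startswith(keyword_prefix):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts
```

In practice the result would usually be filtered for stop words, since the nearest neighbours of any keyword tend to be articles and prepositions.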

Figure 11. Searching for the frequency of the term transform in the corpus with the
Terms tool.

These collocated terms, however, do not provide enough detail to
determine definitively the context of their use. For this, it is better to use the
Contexts tool (also known as Keywords in Context, or KWIC) as seen in
Figure 14. However, with the default settings there are not enough words
to the left or the right of the term to get an idea of the full context.
Modifications were made by sliding the “context” feature to the right,
which expands the number of words to the left and right of the term as
shown with the arrow in Figure 15.
The left-hand side of the Contexts tool visualization shows within which
document the term was used. These results show that the term transform
was used in four of the eight essays, a total of ten times overall. To look
within each of those specific essays, clicking on the boxed plus sign (see
the arrow in Figure 16) will expand the view. A section or sections of that
specific document where the term is used is displayed, giving the full scope
of the text surrounding that term, and thus allowing the researcher to
evaluate the context. Similarly, the Contexts tool allows the researcher to
spend less time on irrelevant mentions of the term (for example, instances
where transform is used in the title of the project or the footer of the
website), and focus more on the substance of the essays with the
term transform.
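A keyword-in-context listing can be sketched in a few lines of Python. The context parameter plays the role of the slider described above; the prefix matching and punctuation handling are simplifying assumptions, not the Contexts tool's actual behavior.

```python
def kwic(text, prefix, context=5):
    """Return (left, keyword, right) rows for each word starting with prefix."""
    words = text.split()
    rows = []
    for i, word in enumerate(words):
        # Ignore case and surrounding punctuation when testing for a match.
        if word.lower().strip(".,;:!?\"'()").startswith(prefix):
            left = " ".join(words[max(0, i - context):i])
            right = " ".join(words[i + 1:i + 1 + context])
            rows.append((left, word, right))
    return rows
```

Raising the context value widens each row, which corresponds to the adjustment shown in Figure 15.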
For this particular case, the essays using the term transform and their
associated themes can provide a researcher with direction on the
transformative disposition during this time period (addresses Goal 3). Questions
a researcher could generate include: Are there connections between these
themes regarding iterations of the word transform? Does one influence the
other? The analysis can be split in several ways and narrowly dissected
according to the researcher’s focus. For example, the Contexts tool (see
Figure 16) shows that the term transform is used in four essays,10 three of
which specifically deal with the education of African Americans and race
(example addressing Goal 4). Due to the frequency and importance of the
term education in this transformative period as illustrated by the use of
text analysis, data on education by race was researched further and
displayed through a data visualization (developed with Tableau) titled
Tennessee School Attendance.11 Tools used in digital scholarship, such as
Voyant for text analysis and Tableau for data visualization, are extremely
useful, providing visualizations that can help direct or guide research,
especially in ways that may not have been possible via traditional
humanist methods.

Figure 12. Visualizing the frequency of the term transform with the Trends tool.

Figure 13. Visualizing the frequency of the term transform and other keywords in close
proximity with the Collocates tool.

Practical uses and Drupal capability


Useful tips not discussed above include the need to filter out stop words.
These consist of the typical articles and prepositions (a, an, the, and, it,
etc.). Voyant Tools 2.0 automatically has this feature enabled to auto detect
the language of use. But the list is editable so researchers can add their
own list of words to avoid as well. This can be done by hovering over the
right-hand corner of the Cirrus tool and clicking on the “Options” feature
as shown in Figure 17. There you can select the language of use, edit the
list, and check the box for “apply globally” so that the filter is applied
throughout all tools and not just Cirrus.
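
The effect of a stop-word filter on term counts can be sketched as follows. This is a minimal illustration under stated assumptions, not Voyant Tools' own code: the short `STOP_WORDS` set stands in for a full language-specific list, `extra_stop_words` plays the role of the researcher's additions, and the sample sentence is invented.

```python
import re
from collections import Counter

# A deliberately tiny stand-in for a full stop-word list (assumption).
STOP_WORDS = {"a", "an", "the", "and", "it", "of", "in", "to"}


def term_frequencies(text, extra_stop_words=()):
    """Count terms in `text`, skipping default and user-supplied stop words."""
    stop = STOP_WORDS | {w.lower() for w in extra_stop_words}
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop)


# Invented sample sentence, used only to exercise the function.
sample = ("The school and the church shaped the town, "
          "and the town shaped the school.")

# "town" is added to the list, as a researcher might do in the Options dialog.
freqs = term_frequencies(sample, extra_stop_words=["town"])
print(freqs.most_common(3))
```

Applying one shared list to every count mirrors the purpose of the "apply globally" checkbox: all views of the corpus then exclude the same words.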
One of the best features of Voyant Tools is that it allows users to publish
the data to a URL. Additionally, an individual tool can be exported as a
PNG. The HTML link can be embedded in blogs, websites, and some content
management systems (CMS). For example, WordPress is not
Figure 14. Visualizing the context of the term transform with other words that surround the term with the Contexts tool.

Figure 15. Expanding the number of words displayed in context with the Contexts tool.
Figure 16. Expanding the view to evaluate the context surrounding the term highlighted using the Contexts tool.

Figure 17. Cirrus tool options allow for filtering out stop words in a specific language, adding
to the pre-made list of stop words, and selecting the maximum number of terms to show.

immediately compatible with the embedded URL syntax and requires the
iframe plugin (Sinclair & Rockwell, 2016a, “Embedding”). Rather than
confirming that the iframe plugin works with WordPress, the author tested the
embedded URL syntax with the LibGuides and Drupal (both examples of a
CMS) test production site. This type of embedding can be done for the
entire corpus or individual tools, which increases the richness of the visual-
ization more than a static screenshot. The URL can be shared in this cap-
acity thereby allowing the user or others to work with the same texts at
different times. This workflow eliminates the need to reload documents
every time. For Trials, Triumphs, and Transformations, the process of
embedding URL syntax was specifically tested, and not only were the visu-
alizations successfully embedded into both the LibGuides CMS and the
Drupal CMS, they also linked to the Voyant Tools URL when clicked
(addresses Goal 5). This feature allows the users (in Trials, Triumphs, and
Transformations’ case, the readers/users of the project) to manipulate live
data themselves. This last step rounded off the goals of the project’s test
study of Voyant Tools.
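
For readers who want to generate embed snippets for several tools at once, the sketch below builds an iframe tag for a single tool view. It rests on assumptions that should be checked against the embedding documentation (Sinclair & Rockwell, 2016a): the `/tool/<ToolName>/?corpus=<id>` URL pattern, the `voyant_iframe` helper name, the default dimensions, and the placeholder corpus ID are all illustrative, not taken from the project.

```python
from html import escape
from urllib.parse import urlencode


def voyant_iframe(tool, corpus_id, width="100%", height=400,
                  base_url="https://voyant-tools.org"):
    """Build an iframe snippet for one tool view of a stored corpus.

    The URL pattern used here is an assumption; verify it against the
    Voyant Tools embedding guide before relying on it.
    """
    query = urlencode({"corpus": corpus_id})
    src = f"{base_url}/tool/{tool}/?{query}"
    return (f'<iframe src="{escape(src, quote=True)}" '
            f'style="width: {width}; height: {height}px;"></iframe>')


# "YOUR_CORPUS_ID" is a placeholder, not a real corpus identifier.
print(voyant_iframe("Cirrus", "YOUR_CORPUS_ID"))
```

The resulting snippet would be pasted into a CMS field that accepts raw HTML. Whether a given CMS permits iframes depends on its text-format or plugin settings.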

Limitations
There are some limitations to the current version of Voyant Tools. One
limitation is that the corpus URL will only remain accessible if it is
accessed at least once a month (Sinclair & Rockwell, 2016b, “Getting
Started”). It will be interesting to see if that constraint is modified with
future versions. Also, at times, the display reverts to the default search
when exporting a single tool view to its own URL. Because this problem
occurred only on some occasions, it is presumed to have been a glitch,
though not detrimental to the project as the specific searches can easily be
re-entered.
The new 2.0 version of Voyant Tools implies that the data can be
exported as HTML, tab separated values, or JSON, but the ability to export

in these ways was not an intended goal of this particular project and may
be something to test in future projects. Additionally, testing and studying
the standalone desktop version would be worthwhile once it is out of beta.
The features associated with a standalone version would be beneficial to
researchers wanting more data privacy and long-term storage options.

Conclusion
Using Voyant Tools to review and interact with the data visualizations
in relation to the specific goals set forth for this example project
shows that text mining the scholarly essays of Trials, Triumphs, and
Transformations is a valuable contribution to the interpretative project.
Analyzing the text of the essays with Voyant Tools helped frame important
terms through count, frequency, and relativity that ultimately gave way to
new areas of desired inquiry. For example, analyzing the text of the New
Economies, New Communities: 1865–1915 essay12 led to a greater under-
standing of the prominence of certain industries such as banking, farming,
mining, railroads, iron, and lumber during this period. Further research led
the author to focus on analyzing the data of industry from this time period
as depicted in the Urban Tennessee Industry by Number of Products
(1870) data visualization.13 This interactive chart adds value to the essay
and the entire collection by showing industries that dominated the three
largest counties in the state, and the relative number of products produced
by industry compared to the number of employees in that industry.
Together, the use of text analysis and data exploration yielded a visualiza-
tion that helps users interpret meaning from raw data. Voyant Tools was
extremely useful in this regard, with its visualizations helping to direct and
guide research in ways that may not have been possible with traditional
research methods. But the visualizations and text mining alone will not
yield the interpretation needed to produce scholarship. The evaluation of
the text mining results and the decisions that lead up to them are still
inherently dependent upon the scholar. Thus, Voyant Tools is just one way
to add the digital to the digital humanities scholar or project.
This test study also confirmed that Voyant Tools can work with the
Drupal CMS. Surprisingly, the embed link also allows other users to inter-
act with live data, which is an added benefit for scholars who are reviewing
the Trials, Triumphs, and Transformations project. In fact, it is an added
benefit for any digital humanities project that uses data visualizations, as
making the data versatile and interactive for users can affect its appeal and
use. Not only are the project developers able to interact with the data, but
the user can reproduce the same analysis or perform the analysis in

different ways, potentially making the project an even more collaborative
resource and tool for Tennessee history.
It is the hope that the above examples can be used as a guide on how
text mining, and in particular Voyant Tools, can assist with analyzing texts.
Its use can help confirm hypotheses, raise new questions, and provide dif-
ferent viewpoints for reading, visualizing, and interpreting texts.

Notes
1. To review some definitions of “digital scholarship,” consider C. Lynch (2014) who
states digital scholarship is awkward and nonsensical since scholarship is just
scholarship; or Rumsey (2011) who states, “digital scholarship is the use of digital
evidence and method, digital authoring, digital publishing, digital curation and
preservation, and digital use and reuse of scholarship” (p. 2).
2. This definition is available on the DSI website at http://dsi.mtsu.edu/about.
3. An infographic summary of the Trials, Triumphs, and Transformations digital
collection is available by clicking view/open at http://jewlscholar.mtsu.edu/xmlui/
handle/mtsu/5612.
4. For a more detailed introduction to the meaning of and search for citizenship in this
project, read about the collection at http://dsi.mtsu.edu/trials/about.
5. When using the free online version of Voyant Tools, the HTML created can be saved
to a URL and remains accessible as long as it is accessed once a month
(http://voyant-tools.org/docs/#!/guide/start).
6. The scholarly essays are available at http://dsi.mtsu.edu/trials/essays. Each essay is
written by a scholar in the field and lists a recommended citation.
7. List of tools available at http://www.voyant-tools.org/docs/#!/guide/tools.
8. All themes can be browsed here http://dsi.mtsu.edu/trials/browsethemes.
9. The essay Political Separation and Exclusivity: Musical Dialogue and Transcendence is
available at http://dsi.mtsu.edu/trials/cockrell.
10. The four essays that use the word transform are Black Higher Education in
Tennessee, Separation and Exclusion: Fisk University and the Arts, They Took Their
Stand … with Race in Mind, and New Economies, New Communities: 1865–1915. All
essays are available at http://dsi.mtsu.edu/trials/essays.
11. In addition to the interactive charts showing school attendance by race (1850–1890),
the observations and data sources derived from these charts are shared at
http://dsi.mtsu.edu/trials/attendance.
12. Available at http://dsi.mtsu.edu/trials/west.
13. This chart was developed in Tableau and it is available at http://dsi.mtsu.edu/trials/
industry1870.

Acknowledgments
All images and tools used in this article were created with Voyant Tools 2.0/2.4, an open
source reading and analysis environment developed by Stefan Sinclair, Geoffrey Rockwell,
and partners, available at http://voyant-tools.org. Voyant Tools is an open-source project
and the code is available through GitHub. Documentation is available at
http://voyant-tools.org/docs.

References
Knowles, S. W. (2017). Trials, triumphs, and transformations focuses on citizenship.
Retrieved from https://chpblog.org/2017/06/21/trials-triumphs-and-transformations-
focuses-on-citizenship
Lynch, C. (2014). The “Digital” scholarship disconnect. EDUCAUSE Review, 49, 10–15.
http://er.educause.edu/articles/2014/5/the-digital-scholarship-disconnect
Lynch, T. L. (2015). Soft(a)ware in the English classroom. English Journal, 104, 74.
Maceli, M. (2015). What technology skills do developers need? A text analysis of job list-
ings in Library and Information Science (LIS) from jobs.code4lib.org. Information
Technology and Libraries, 34, 8–21. doi:10.6017/ital.v34i3.5893
Maramba, I. D., Davey, A., Elliott, M. N., Roberts, M., Roland, M., Brown, F., Burt, J.,
Boiko, O., & Campbell, J. (2015). Web-based textual analysis of free-text patient experi-
ence comments from a survey in primary care. JMIR Medical Informatics, 3, e20.
doi:10.2196/medinform.3783
Rumsey, A. (2011). New-model scholarly communication: Road map for change. Scholarly
Communication Institute 9. Retrieved from http://uvasci.org/institutes-2003-2011/SCI-9-
Road-Map-for-Change.pdf
Sinclair, S. (2015). DH2015 workshop: The new, the neat & the gnarly. Retrieved from
http://docs.voyant-tools.org/author/stefansinclair
Sinclair, S., & Rockwell, G. (2016a). Embedding Voyant Tools. Voyant Tools. Retrieved
from http://www.voyant-tools.org/docs/#!/guide/embedding
Sinclair, S., & Rockwell, G. (2016b). Getting started. Voyant Tools. Retrieved from
http://voyant-tools.org/docs/#!/guide/start
Sinclair, S., & Rockwell, G. (2016c). List of tools. Voyant Tools. Retrieved from
http://www.voyant-tools.org/docs/#!/guide/tools
Welsh, M. E. (2014). Review of Voyant Tools. Collaborative Librarianship, 6, 96–97.
