Beyond Comparative Analysis: Making Arguments with
Similarity Metrics and Structured Manuscript Data, with a
Case Study in Marginal Iconography
Alexander Patrick Brey, Maeve K. Doyle
Manuscript Studies: A Journal of the Schoenberg Institute for
Manuscript Studies, Volume 8, Number 2, Fall 2023, pp. 232-281
(Article)
Published by University of Pennsylvania Press
DOI: https://doi.org/10.1353/mns.2023.a916130
For additional information about this article
https://muse.jhu.edu/article/916130
[192.42.89.170] Project MUSE (2024-04-30 16:33 GMT) Wellesley College Library
Alexander Patrick Brey
Wellesley College
Maeve K. Doyle
Eastern Connecticut State University
Researchers specializing in medieval European manuscripts
have a long tradition of systematically describing their objects of
investigation. As objects produced through copying, manuscripts
tend to share certain physical characteristics: pages within a gathering, lines
on a page, measurements of page and text block, conventions of script.
Furthermore, as manuscripts materialize conceptual systems (reflecting
religious practices, intellectual frameworks, or literary taste), they lend
themselves to a certain conceptual standardization. Since the late nineteenth
century, the material for data-driven manuscript studies has been rich but
siloed. Today, aggregating manuscript data creates opportunities for large-scale corpus analyses that were previously rare and laborious.
Alexander Patrick Brey and Maeve K. Doyle contributed equally to this work. This project
developed in part through our participation in the Getty Advanced Workshop in Network
Analysis and Digital Art History (NA+DAH), and we are grateful to the participants and
organizers for their feedback and suggestions, as well as John Ladd and Carolyn Anderson for
their comments on drafts. Our research has been supported by Eastern Connecticut State
University, the Getty Research Institute, and Wellesley College. We presented part of this
project at the International Medieval Congress at the University of Leeds in 2021.
Brey and Doyle, Beyond Comparative Analysis | 233
Manuscript scholars wishing to explore these new opportunities can learn
from other disciplines’ data-driven approaches and adapt them to their own
research questions. In this paper, we consider the application of one quantitative approach that manuscript historians could use to understand the large
groups of manuscripts becoming available for study either as metadata or
images: similarity measurement. This approach permeates the technological
infrastructure behind academic, commercial, and even governmental systems.1 Our ongoing work with similarity measurement, and our research
into its established use in ecological, archaeological, and linguistic research,
has led us to our method of contextualizing results through comparison with
random simulations. Measures of similarity, sometimes referred to as proximity or distance metrics, offer promising possibilities for manuscript research.
In this article we introduce and model the use of similarity metrics for
manuscript scholars who are considering adopting such approaches or who
may never even have heard of them.2 Although we frame this discussion in
terms of an interdisciplinary community of manuscript scholars, we especially
seek to address scholars in our own field of art history, which has lagged
behind the other humanities in exploring the analytical possibilities of
computational methods.3 We argue that quantitative measurements of similarity can help researchers understand the large groups of manuscripts now
available for study either as metadata or images. In adapting approaches from
other fields, our experience has led us to conclude that researchers benefit
the most from such measurements when they can compare them to simulations to contextualize observed data (what statisticians would call a “sample”
1 For a general introduction, see Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining:
Concepts and Techniques (Waltham, MA: Morgan Kaufmann, 2012), 65–79.
2 Strictly speaking, distance metrics must adhere to a set of axioms or requirements pertaining to spatial reasoning, while approaches to quantifying similarity that diverge from these
axioms are usually referred to as “indices.” In this article, we will use the term “metric”
throughout, even when discussing approaches that violate the axioms of metrics.
3 The contributions of Lev Manovich on methods from data science are particularly relevant
to our argument. Manovich, “Data Science and Digital Art History,” International Journal for
Digital Art History 1 (2015): 11–35; for a survey of recent developments in the field, see Kathryn
Brown, ed., The Routledge Companion to Digital Humanities and Art History (New York:
Routledge, 2020).
234 | Manuscript Studies
from the historical “population”). This simulation-based approach roughly
corresponds to the intuitive understanding that experienced researchers
develop: whether similarities are simply expected within a shared manuscript
culture or whether they might indicate a more specific connection.
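The comparison of observed similarity with random simulations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the procedure used in this article: each manuscript is reduced to a set of motif labels (the labels and vocabulary below are invented), similarity is measured with the Jaccard index (shared motifs divided by all motifs present in either manuscript), and the null model simply draws random motif sets of the same sizes from the pooled vocabulary.

```python
import random

def jaccard(a: set, b: set) -> float:
    """Shared features divided by all features present in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def simulated_baseline(a, b, vocabulary, trials=10_000, seed=0):
    """Jaccard values for random sets with the same sizes as a and b.

    Null model (an assumption for illustration): each manuscript draws its
    motifs uniformly at random, without replacement, from the shared vocabulary.
    """
    rng = random.Random(seed)
    pool = sorted(vocabulary)  # a list, since random.sample needs a sequence
    return [
        jaccard(set(rng.sample(pool, len(a))), set(rng.sample(pool, len(b))))
        for _ in range(trials)
    ]

# Hypothetical motif inventories for two manuscripts.
ms_a = {"hybrid", "snail", "rabbit", "ape", "archer", "dragon"}
ms_b = {"hybrid", "snail", "rabbit", "ape", "musician", "fox"}
vocabulary = ms_a | ms_b | {"stag", "owl", "knight", "nun", "bishop",
                            "mermaid", "hare", "dog", "lion", "pig"}

observed = jaccard(ms_a, ms_b)
baseline = simulated_baseline(ms_a, ms_b, vocabulary)
# Share of simulated pairs at least as similar as the observed pair.
p_value = sum(s >= observed for s in baseline) / len(baseline)
print(f"observed={observed:.2f}, simulated mean={sum(baseline) / len(baseline):.2f}, "
      f"share >= observed: {p_value:.3f}")
```

In this toy example, only a small share of simulated pairs match or exceed the observed overlap, which would suggest a connection beyond what random draws from the shared repertoire produce. A different null model, for instance one that weights motifs by how common they are across a corpus, could lead to a different conclusion, so the choice of simulation is itself part of the argument.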
This article introduces formal methods for quantifying the similarity of
manuscripts based on a set of features, focusing on iconographic overlap as
a case study. First, we argue that the value of such methods is increasing
thanks to the deluge of digitized materials and manuscript metadata that
is becoming available. Then, we introduce different approaches to quantifying similarity and discuss how manuscript researchers might learn from
disciplines like archaeology, in which such approaches are widely employed.4
Next, we turn to a concrete case study: images in the margins of English,
French, and Flemish Gothic manuscripts. We use this example to illustrate
the process of selecting a relevant similarity metric. We then demonstrate
some of the ways that researchers may use similarity metrics to answer
research questions. We conclude that similarity measurement permits
manuscript researchers to answer otherwise intractable questions while
provoking exciting opportunities to revisit assumptions about key disciplinary concepts.
Studying Manuscripts in the Age of Computers
Although the rapid adoption of computers and online databases has revolutionized how researchers find and access manuscripts, these technological
advancements have played a more limited role in transforming the core
methods of manuscript research. The rapid increase in digitization has
allowed traditional manuscript research to move outside of the reading room,
4 We restrict our discussion in this article to methodological issues related to such metrics
but direct readers who wish to pursue technical implementation to the excellent tutorial in the
Programming Historian: John R. Ladd, “Understanding and Using Common Similarity
Measures for Text Analysis,” Programming Historian 9 (2020), https://doi.org/10.46430/phen0089.
inspiring previously unimaginable depth and engagement in remote analyses.5
New approaches to digitization may offer users the ability to manipulate
manuscripts in three dimensions, examine their internal physical structure,
or experience how surfaces respond to changes in the direction and intensity
of illumination.6 As literary historian Katarzyna Anna Kapitan has recently
shown, shifting research agendas have prompted the manuscript community
to reconsider which metadata to produce and share.7 In addition to these new
forms of digital surrogates, researchers have begun to develop new methods
in response to the creation of institutional and multi-institutional databases,
innovations in computer vision, and their own increasing digital literacy.8
As researchers grapple with these new resources and approaches, techniques that help them understand large datasets will become increasingly
useful—both to detect macroscale patterns in manuscript groups and to
contextualize individual objects. As the number of features to compare across
objects grows, computational methods become essential. Visualizing two
features using simple graphical devices like a scatterplot is relatively intuitive,
but intuition quickly fails as we increase the number of variables. For example,
we can easily graph a comparison of page height and the number of lines per
page in two dimensions, and even add a third dimension for the number of
folios in a manuscript. But if we add a fourth variable, such as the number
of scribes, visualizations grounded in spatial intuition start to break down.
Similarity metrics become most helpful at this point because they have the
5 See, for example, the results of digitization-based studies in Benjamin Albritton, Georgia
Henley, and Elaine M. Treharne, eds., Medieval Manuscripts in the Digital Age (London:
Routledge, 2021).
6 Bill Endres, Digitizing Medieval Manuscripts: The St. Chad Gospels, Materiality, Recoveries,
and Representation in 2D and 3D (Leeds: Arc Humanities Press, 2019); Jana Dambrogio,
Amanda Ghassaei, Daniel Starza Smith, Holly Jackson, Martin L. Demaine, Graham Davis,
David Mills, Rebekah Ahrendt, Nadine Akkerman, David van der Linden, and Erik D. Demaine,
“Unlocking History through Automated Virtual Unfolding of Sealed Documents Imaged by
X-Ray Microtomography,” Nature Communications 12, no. 1 (2 March 2021): 1184.
7 Katarzyna Anna Kapitan, “Perspectives on Digital Catalogs and Textual Networks of Old
Norse Literature,” Manuscript Studies: A Journal of the Schoenberg Institute for Manuscript Studies
6, no. 1 (2021): 93–95.
8 L. W. Cornelis van Lit, Among Digitized Manuscripts: Philology, Codicology, Paleography
in a Digital World (Leiden: Brill, 2019), 227.
capacity to compress the information from multiple variables or features into
a single metric (a concept known as dimensionality reduction). We believe
that similarity metrics will form a valuable part of the conceptual tool kit
that researchers bring to any comparative study of manuscripts.
Similarity by the Numbers: Basics of Similarity Metrics
All similarity metrics compare pairs of objects or features. Calculating the
similarity metrics for an entire set of objects therefore entails comparing
every possible pair of objects in the set. One of the most straightforward
ways to quantify similarity is in terms of distance. Distance metrics build on
spatial reasoning: they quantify the distance between two points in multidimensional space. For a simple measurement between two points on a grid,
the Pythagorean theorem gives the length of the hypotenuse, the shortest path between
them. We usually imagine the two axes of this grid as representing two
dimensions of a plane in Euclidean space, but there is no reason they cannot
represent something else. Each axis of the grid could instead stand for a
quantitative feature of a manuscript: for example, the number of folios or
distinct watermarks it contains. Likewise, there is no need to limit these
dimensions to the usual two or three dimensions familiar from spatial reasoning, although the equations to calculate distance must change accordingly.
We might position points along seven or twenty different axes or dimensions,
each of which represents a different feature we deem relevant to our research
question. Similarity metrics may simply invert distance metrics (observations
with a low distance would have a high similarity index, and vice versa), or
they may adopt other approaches to quantifying overlap that violate some of
the axioms or requirements for distance metrics. Over the past two decades,
a vibrant debate has emerged around the similarities and differences between
physical and cultural distances.9
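For a concrete sense of this distance reasoning, the following sketch computes the Euclidean distance d(x, y) = √(Σₖ (xₖ − yₖ)²), the Pythagorean theorem extended to any number of features, for every pair in a small set of manuscripts, and converts each distance into a similarity with 1 / (1 + d). The manuscripts and their measurements are hypothetical, and 1 / (1 + d) is only one of several possible ways to invert a distance.

```python
import math
from itertools import combinations

# Hypothetical manuscripts described by four quantitative features:
# (page height in mm, lines per page, number of folios, number of scribes)
manuscripts = {
    "MS A": (250, 30, 180, 2),
    "MS B": (245, 31, 175, 2),
    "MS C": (320, 24, 300, 4),
}

def euclidean(x, y):
    """Pythagorean distance generalized to any number of dimensions."""
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))

def similarity(d):
    """One simple inversion: 1 for identical objects, approaching 0 as d grows."""
    return 1 / (1 + d)

# Compare every possible pair exactly once.
for (name_x, x), (name_y, y) in combinations(manuscripts.items(), 2):
    d = euclidean(x, y)
    print(f"{name_x} vs {name_y}: distance={d:.1f}, similarity={similarity(d):.3f}")
```

Because raw Euclidean distance is dominated by features with large numeric ranges (here, folio counts and page heights), researchers often rescale each feature first, for example to a 0 to 1 range or to z-scores, so that no single measurement swamps the others.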
9 Ted Underwood and Richard Jean So, “Can We Map Culture?,” Journal of Cultural Analytics 6, no. 3 (June 17, 2021): 34.
There are as many different equations for calculating similarity and
distance as there are ways of conceptualizing these terms.10 Some distance
metrics consider sequence, measuring how many additions, subtractions, or
substitutions are necessary to move from one string of features to another.
This approach, known as edit distance, is employed by specialists in genetics
and stemmatic analysis.11 Some applications of edit distance in manuscript
research could include comparing the order of psalms in books of hours (one
of the key methods for determining liturgical use) or the collation of different
manuscripts. Edit distance would be useful for comparing the lists of feast
days in the liturgical calendars that frequently open Christian manuscripts,
which could, as Aaron Macks has proposed, automatically group and localize
manuscripts.12
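Edit distance can be illustrated with the classic Levenshtein variant, which counts insertions, deletions, and substitutions. The sketch below compares two invented excerpts of January calendar entries; treating each feast as a single unit (rather than each letter of its name) is an assumption about how such data might be encoded.

```python
def edit_distance(seq_a, seq_b):
    """Levenshtein distance: the minimum number of insertions, deletions,
    or substitutions needed to turn seq_a into seq_b."""
    # prev[j] holds the distance between the first i-1 items of seq_a
    # and the first j items of seq_b (dynamic programming, two rows).
    prev = list(range(len(seq_b) + 1))
    for i, a in enumerate(seq_a, start=1):
        curr = [i]
        for j, b in enumerate(seq_b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute (free if items match)
            ))
        prev = curr
    return prev[-1]

# Hypothetical calendar excerpts, one feast per list item.
cal_1 = ["Circumcision", "Epiphany", "Hilary", "Felix",
         "Maurus", "Marcellus", "Anthony", "Prisca"]
cal_2 = ["Circumcision", "Epiphany", "Lucian", "Hilary",
         "Felix", "Marcellus", "Anthony", "Prisca"]

print(edit_distance(cal_1, cal_2))  # → 2
```

The result corresponds to one insertion (Lucian) and one deletion (Maurus). Dividing by the length of the longer list is one common way to make such distances comparable across calendars of different lengths.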
Other distance metrics ignore sequence entirely, comparing two sets of
features regardless of the order in which they occur, as in the iconographic
comparison case study we describe below. Data that simply record a presence
or absence represent another special case for which researchers have developed
tailored distance metrics. Similarity measures known as Q measures compare
objects based on their associated features, while R measures assess the
dependence between these associated features or descriptors.13 Take as an
example a paleographic dataset that catalogs folios as the main objects and
letterforms as associated features. A Q measure might quantify the similarity
of two folios (objects) based on the presence or absence of specific letterforms
10 For recent surveys of common metrics, see Sung-Hyuk Cha, “Comprehensive Survey on
Distance/Similarity Measures between Probability Density Functions,” International Journal
of Mathematical Models and Methods in Applied Sciences 1, no. 4 (2007): 300–307; Louis Legendre
and Pierre Legendre, Numerical Ecology (Boston: Elsevier, 2012), 265–335; Habiba, Jan C.
Athenstädt, Barbara J. Mills, and Ulrik Brandes, “Social Networks and Similarity of Site
Assemblages,” Journal of Archaeological Science 92 (April 2018): 63–72.
11 Joris van Zundert, Armin Hoenen, Sara Manafzadeh, Yannick M. Staedler, Teemu Roos,
and Jean-Baptiste Guillaumin, “Computational Methods and Tools,” in Handbook of Stemmatology: History, Methodology, Digital Approaches, ed. Philipp Roelli (Berlin: De Gruyter, 2020),
308–9, 330, 343–44.
12 Aaron Macks, “Data Sanctorum: The Corpus Kalendarium Database of Devotional
Calendars,” Manuscript Studies 6, no. 2 (Fall 2021): 348.
13 Legendre and Legendre, Numerical Ecology, 266.
(features). An R measure of the same dataset would quantify how “close”
or “distant” two letterforms (features) are based on the frequency of their
co-occurrence on various folios. All of the concrete examples of similarity
measurement we discuss in this article fall into the category of Q
measures.
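The distinction between Q and R measures can be made concrete with a small presence/absence table. In this sketch the folios, letterforms, and values are hypothetical, and the Jaccard index (joint presences divided by positions where either item is present) stands in for the many available binary coefficients: comparing rows yields a Q measure between folios, and comparing columns yields an R measure between letterforms.

```python
from itertools import combinations

letterforms = ["two-compartment a", "round r", "tironian et", "bitten de"]
# 1 = letterform present on the folio, 0 = absent (hypothetical data).
folios = {
    "fol. 1r": [1, 1, 0, 1],
    "fol. 2v": [1, 1, 0, 0],
    "fol. 9r": [0, 0, 1, 1],
}

def jaccard(u, v):
    """Binary Jaccard: joint presences / positions where either is present."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0

# Q measure: compare objects (rows) by the features they share.
for (f1, row1), (f2, row2) in combinations(folios.items(), 2):
    print("Q", f1, "/", f2, round(jaccard(row1, row2), 2))

# R measure: compare features (columns) by the objects on which they co-occur.
columns = list(zip(*folios.values()))
for (i, l1), (j, l2) in combinations(enumerate(letterforms), 2):
    print("R", l1, "/", l2, round(jaccard(columns[i], columns[j]), 2))
```

The same table thus supports two different questions, depending on whether one reads it by rows or by columns.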
Although these approaches may feel alien to some manuscript researchers, there is in fact a broad swathe of manuscript scholarship that deals
with similarity in one form or another. In the following section, we offer
a brief survey of several areas in which informal and formal analyses of
similarity have shaped manuscript research, along with some guidelines
for preparing observations about manuscripts with the goal of calculating
similarity metrics.
Similarity Measurement and Manuscript Studies
Manuscript scholars use the term “similarity” (and its inverse, “difference,”
or “dissimilarity”) in sophisticated but sometimes ambiguous ways. Manuscripts can be similar in terms of texts, iconography, script, style, layout,
materials, or codicological makeup.14 Similarity might also refer to the
contexts in which books were made or used: to define genres, model/copy
relationships, shared structures of production or patronage, overlapping
chains of provenance, or common functions. These assessments of similarity,
often qualitative, work well for the narrow scope of most historical studies,
such as analyses of specific texts or genres or the work of a given artist or
14 Comparative paleographic studies abound, such as François Déroche, The Abbasid Tradition:
Qurʾans of the 8th to 10th Centuries (New York: Nour Foundation in association with Azimuth
Editions and Oxford University Press, 1992). For examples of comparative studies focused on
materials, see Abigail Quandt, “The Purple Codices: A Report on Current and Future Research
and Conservation Projects,” Care and Conservation of Manuscripts 16 (2018): 121–52; Maurizio
Aceto et al., “Mythic Dyes or Mythic Colour? New Insight into the Use of Purple Dyes on
Codices,” Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 215 (May 15,
2019): 133–41. For an argument in favor of comparative codicological analyses, see Malachi
Beit-Arié, “Why Comparative Codicology?,” Gazette du livre médiéval 23, no. 1 (1993): 1–5.
workshop.15 The relatively small number of manuscripts typically considered
in these studies encourages scholars to read closely and consider multiple
types of similarity simultaneously.
In studies with larger scopes, manuscript scholars have long used structured data to facilitate comparison, focusing their analyses on a limited set
of features for each object. These may take the form of standardized catalogs,
tables, or vocabularies. Günther Haseloff’s 1938 Die Psalterillustration im 13.
Jahrhundert includes twenty tables of psalter iconography; seventy-five years
later, Alison Stones devoted nearly a whole volume of her Gothic Manuscripts
catalog to iconographic tables for a variety of texts.16 Feature-based comparative analyses also form the basis for connoisseurial groupings or attributions.17
Detailed paleographic studies like François Déroche’s and Albert Derolez’s
categorizations of early Abbasid and Gothic scripts, respectively, produced
15 For example, Persis Berlekamp, Wonder, Image, and Cosmos in Medieval Islam (New
Haven, CT: Yale University Press, 2011); Kathryn M. Rudy, Piety in Pieces (Cambridge: Open
Book Publishers, 2016), 223, on the refurbishment of Netherlandish manuscripts; Benjamin
Anderson, Cosmos and Community in Early Medieval Art (New Haven, CT: Yale University
Press, 2017), 77–79, on Carolingian cosmological manuscripts; William Noel, “The Utrecht
Psalter in England: Continuity and Experiment,” in The Utrecht Psalter in Medieval Art: Picturing
the Psalms of David, ed. Koert van der Horst, William Noel, and Wilhelmina C. M. Wüstefeld
(Tuurdijk, Netherlands: HES, 1996), 120–65; Richard H. Rouse and Mary A. Rouse, Manuscripts and Their Makers: Commercial Book Producers in Medieval Paris, 1200–1500, 2 vols.
(Turnhout: Harvey Miller, 2000); Richard H. Rouse and Mary A. Rouse, Renaissance Illuminators in Paris: Artists & Artisans 1500–1715 (London: Harvey Miller, 2019); Kathryn A. Smith,
Art, Identity and Devotion in Fourteenth-Century England: Three Women and Their Books of
Hours (London: British Library, 2003); Michael A. Michael, “Oxford, Cambridge and London:
Towards a Theory for ‘Grouping’ Gothic Manuscripts,” Burlington Magazine 130, no. 1019
(1988): 107–15, on groups among English manuscripts.
16 Günther Haseloff, Die Psalterillustration im 13. Jahrhundert: Studien zur Geschichte der
Buchmalerei in England, Frankreich, und den Niederlanden (Kiel: [n.p.], 1938), 100–123; Alison
Stones, Gothic Manuscripts 1260–1320, 4 vols. (London: Harvey Miller, 2013–2014).
17 For example, Georg Vitzthum, Die Pariser Miniaturmalerei von der Zeit des hl. Ludwig bis
zu Philipp von Valois und ihr Verhältnis zur Malerei in Nordwesteuropa (Leipzig: Quelle & Meyer,
1907), 88–111 (“Die Gruppe des ‘Romans de la Poire’”), to name just one example; Robert
Branner, Manuscript Painting in Paris during the Reign of Saint Louis: A Study of Styles (Berkeley:
University of California Press, 1977).
controlled vocabularies for describing handwriting in manuscript books.18
Indices, databases, and vocabularies developed at research centers such as the
Index of Medieval Art (since 1919) and the Institut de recherche et d’histoire
des textes (founded 1937) have also added to the wealth of structured data
available for the study of manuscripts.19 Computational approaches to similarity build on these comparative-analytical traditions. The many available tools
for assessing similarity among large sets of objects or texts with complex
traits (such as manuscripts) empower humanistic researchers to use structured
data on even larger scales.
The timely mathematical calculation of similarity at such scales requires
computers and therefore also computer-readable data. Some of the types of
manuscript similarity discussed above are more amenable to quantification
than others, but for the purpose of quantifying similarity, computer-readable
data need not be restricted to numbers. Qualities that can be classified or
cataloged—such as iconography, paleography, textual contents, artistic
attributions, watermarks, or pigments—lend themselves well to calculating
similarity, especially if they are cataloged with controlled vocabularies.
Provenance involves discrete chains of ownership, which are remarkably well
suited to similarity measurement if they can be reconstructed. Qualities that
are more difficult to quantify include artistic style, although computer scientists have begun developing approaches that analyze brushstrokes.20 For
this article, however, we will limit our discussion to features that manuscript
18 Déroche, The Abbasid Tradition; Albert Derolez, The Paleography of Gothic Manuscript
Books: From the Twelfth to the Early Sixteenth Century (Cambridge: Cambridge University Press,
2003); Derolez, “Possibilités et limites d’une paléographie quantitative,” in Hommages à Carl
Deroux, ed. Pol Defosse, vol. 5 (Brussels: Latomus, 2004), 98–102.
19 Colum P. Hourihane, “Classifying Subject Matter in Medieval Art: The Index of Christian
Art at Princeton University,” Visual Resources 30, no. 3 (July 3, 2014): 255–62; Louis Holtz,
“Les premières années de l’Institut de recherche et d’histoire des textes,” La revue pour l’histoire
du CNRS, no. 2 (May 5, 2000), https://doi.org/10.4000/histoire-cnrs.2742.
20 Fang Ji, Michael S. McMaster, Samuel Schwab, Gundeep Singh, Lauryn N. Smith, Shishir
Adhikari, Márcio O’Dwyer, Farah Sayed, Anthony Ingrisano, Dean Yoder, et al., “Discerning
the Painter’s Hand: Machine Learning on Surface Topography,” Heritage Science 9, no. 1
(November 12, 2021): 152.
specialists have traditionally categorized or quantified, to highlight existing
manuscript data as a foundation for new digital analysis.
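Why controlled vocabularies matter for similarity measurement can be seen in a small sketch. The catalog entries and the mapping table below are hypothetical; the point is only that exact-match comparisons treat "Hybrid creature" and "hybrid" as entirely different features until both are normalized to a shared term.

```python
# Hypothetical catalog entries recorded with inconsistent free-text terms.
ms_a = {"Hybrid creature", "ape playing bagpipes", "snail", "rabbit archer"}
ms_b = {"hybrid", "Ape with bagpipe", "Snail", "knight"}

# Hypothetical mapping from lowercased free-text entries to controlled terms.
controlled = {
    "hybrid creature": "hybrid",
    "hybrid": "hybrid",
    "ape playing bagpipes": "ape musician",
    "ape with bagpipe": "ape musician",
    "snail": "snail",
}

def normalize(terms):
    """Lowercase each entry and map it to its controlled term when one exists."""
    return {controlled.get(t.lower(), t.lower()) for t in terms}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

print(jaccard(ms_a, ms_b))                        # free text → 0.0
print(jaccard(normalize(ms_a), normalize(ms_b)))  # controlled terms → 0.6
```

The building of such a mapping is, of course, an interpretive act in its own right, and its decisions shape every similarity value calculated downstream.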
Manuscript researchers may easily tackle small-scale analyses of this type
without computers, but as the number of items to compare increases linearly,
the number of possible pairwise comparisons increases quadratically, since n objects yield n(n − 1)/2 pairs. Analyzing these
large numbers of combinations manually would take a prohibitively long
time, but computers can do so in a matter of seconds or, in extreme cases,
hours. Quantitative approaches to similarity allow manuscript researchers
to study large-scale trends difficult or impossible to discern in smaller groups.
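The arithmetic behind this growth is simple: n objects yield n(n − 1)/2 distinct pairs, so multiplying the number of manuscripts by ten multiplies the number of comparisons by roughly one hundred. The short sketch below tabulates a few values with the Python standard library's math.comb.

```python
from math import comb

# Number of distinct pairs among n objects: n * (n - 1) / 2.
for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6,} manuscripts -> {comb(n, 2):>12,} pairwise comparisons")
# 10 -> 45; 100 -> 4,950; 1,000 -> 499,500; 10,000 -> 49,995,000
```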
Scholars wishing to collect data for similarity analysis should organize
their observations on a spreadsheet, placing the basic objects under investigation in separate rows and the features they wish to compare in separate
columns. For example, a project measuring similarity between a set of
manuscripts should have a row for each manuscript and columns for a range
of manuscript properties: number of pages, quires, or lines per page; dates
of production or change in ownership; geographic coordinates; patrons’ or
expected owners’ names; textual or visual contents; presence or absence of
illumination or annotations; materials; and more. Some similarity metrics,
such as Gower’s distance, can analyze both quantitative and qualitative
features, so columnar data can be either numerical or categorical.21 As long
as the columns represent comparable features, and as long as some repetition
(similarity) exists within them, it will be possible to measure similarity. For
the sake of computational analysis, traditional catalogs and published tables
may be considered “legacy data,” that is, information stored in a format that
is difficult to process with computers. Beyond the initial step of digitizing
published data (or other legacy data), as in the project we describe below,
researchers may have to standardize and reconfigure their data to facilitate
similarity analysis.
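A simplified version of Gower's approach can be sketched for a small spreadsheet-like table with one row per manuscript and one column per feature. This sketch assumes equal weights and no missing values (Gower's original coefficient also handles weighting and missing data): numeric differences are divided by the column's range, categorical features score 0 when they match and 1 otherwise, and the distance is the mean of these per-feature scores, that is, one minus Gower's similarity. The table is invented for illustration.

```python
from itertools import combinations

# Hypothetical spreadsheet: one row per manuscript, one column per feature.
rows = [
    {"shelfmark": "MS A", "lines_per_page": 30, "height_mm": 250,
     "script": "textualis", "illuminated": "yes"},
    {"shelfmark": "MS B", "lines_per_page": 28, "height_mm": 240,
     "script": "textualis", "illuminated": "no"},
    {"shelfmark": "MS C", "lines_per_page": 18, "height_mm": 330,
     "script": "cursiva", "illuminated": "yes"},
]
numeric = ["lines_per_page", "height_mm"]
categorical = ["script", "illuminated"]

# Range of each numeric column across the whole table (Gower's scaling).
ranges = {f: max(r[f] for r in rows) - min(r[f] for r in rows) for f in numeric}

def gower_distance(x, y):
    """Mean of per-feature dissimilarities, each scaled to the 0..1 interval."""
    parts = [abs(x[f] - y[f]) / ranges[f] if ranges[f] else 0.0 for f in numeric]
    parts += [0.0 if x[f] == y[f] else 1.0 for f in categorical]
    return sum(parts) / len(parts)

for x, y in combinations(rows, 2):
    print(x["shelfmark"], "vs", y["shelfmark"], round(gower_distance(x, y), 3))
```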
Some datasets will be too varied or too small for similarity metrics to yield meaningful results. Since similarity metrics measure the repetition of qualities or
attributes within a group of objects or observations, a set of objects that lacks
21 J. C. Gower, “A General Coefficient of Similarity and Some of Its Properties,” Biometrics
27, no. 4 (1971): 857–71.
repeated features will appear uniformly dissimilar. As a consequence, although
manuscript similarity studies do not require very large datasets (much less
“big data”), the number of objects included in such studies may still feel large
relative to the conventions of the field of manuscript research. Calculating
these metrics may be helpful for producing accurate, systematic comparisons
based on the quantitative and categorical features of as few as three objects.
Additional caution is merited, however, when considering whether arguments
based on such small samples accurately represent the larger corpus or population from which they are drawn. Tools for evaluating the robustness of a
quantitative method given the size of a dataset can help to determine whether
computational analysis is likely to yield meaningful results.22
Although a comprehensive comparison that includes numerous, varied
features like the one described above may be tempting, researchers should
focus selectively on limited comparisons of features (or variables) that answer
research questions. That is to say, the features from which researchers calculate similarity metrics should be the closest possible proxies to the phenomenon they want to understand. A researcher answering questions about
textual communities will choose different features to analyze than a researcher
interested in patronage or workshop practices. If possible, the features under
consideration should have direct, causal relationships with the subject of the
research question. For instance, a research question about workshop practices
might consider page-ruling techniques that vary from workshop to workshop
22 In addition to tests of statistical power, which can yield a minimum viable sample size,
researchers may be interested in two particular approaches. One, known as simulation testing,
is to simulate a “perfect” dataset and test a method on smaller and smaller samples to determine
the point at which a given method consistently yields incorrect results. The other approach
tests the robustness of a method by applying it to smaller and smaller samples from an existing
dataset. For the theory of simulation testing, see Tim P. Morris, Ian R. White, and Michael
J. Crowther, “Using Simulation Studies to Evaluate Statistical Methods,” Statistics in Medicine
38, no. 11 (2019): 2074–102. For applications in archaeology and digital humanities, see Luce
Prignano, Ignacio Morer, and Albert Diaz-Guilera, “Wiring the Past: A Network Science
Perspective on the Challenge of Archeological Similarity Networks,” Frontiers in Digital
Humanities 4 (2017), https://doi.org/10.3389/fdigh.2017.00013; Yann C. Ryan and Sebastian
E. Ahnert, “The Measure of the Archive: The Robustness of Network Analysis in Early Modern
Correspondence,” Journal of Cultural Analytics 6, no. 3 (July 21, 2021), https://doi.org/10.22148/001c.25943.
but exclude data about page size (which is typically determined by supply
chains and production techniques shared by multiple workshops).
Including features in an analysis that are unrelated to the research
question may seem harmless, but more data is not always better. If a feature
has no logical reason to be connected to a research question, including it
in similarity computations will introduce noise into the results, even to
the extent of obscuring real patterns. Instead of lumping relevant and unrelated features
into a single similarity metric, researchers may wish to keep them separate
and perform two comparative analyses, one for each group of features, to identify where their phenomenon
of interest diverges from other trends. Once researchers have identified a
research question and collected the necessary observations in a computationally tractable format, they can use a programming language (or collaborate with someone who can) to calculate similarity metrics quickly and
systematically.23
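To illustrate how unrelated features can obscure a real pattern, the following sketch uses invented data in which manuscripts A and B share a hypothetical workshop ruling practice while C does not. It compares a distance computed on ruling alone with one that also includes page height and folio count, using a Gower-style mean of range-scaled per-feature differences; both the data and the choice of distance are assumptions for illustration.

```python
# Hypothetical data: A and B share a workshop ruling practice; C does not.
mss = {
    "A": {"ruling": "plummet", "height_mm": 250, "folios": 150},
    "B": {"ruling": "plummet", "height_mm": 330, "folios": 300},
    "C": {"ruling": "ink", "height_mm": 255, "folios": 160},
}

def mixed_distance(x, y, features, data):
    """Mean per-feature dissimilarity; numbers scaled by their range across data."""
    parts = []
    for f in features:
        a, b = x[f], y[f]
        if isinstance(a, str):  # categorical feature: match or mismatch
            parts.append(0.0 if a == b else 1.0)
        else:  # numeric feature: absolute difference divided by the column range
            values = [row[f] for row in data.values()]
            span = max(values) - min(values)
            parts.append(abs(a - b) / span if span else 0.0)
    return sum(parts) / len(parts)

for features in (["ruling"], ["ruling", "height_mm", "folios"]):
    d_ab = mixed_distance(mss["A"], mss["B"], features, mss)
    d_ac = mixed_distance(mss["A"], mss["C"], features, mss)
    nearest = "B" if d_ab < d_ac else "C"
    print(f"{features}: d(A,B)={d_ab:.2f}, d(A,C)={d_ac:.2f} -> A closest to {nearest}")
```

With ruling alone, A is closest to B; once page height and folio count are added, A appears closest to C, even though the hypothetical workshop connection has not changed.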
With such varied approaches to conceptualizing similarity, manuscript
researchers can learn from other disciplines, many of which have employed
similarity metrics for decades. In the following section, we offer a brief
overview of some salient methodological developments and debates.
Lessons from Other Fields: Applications and Limitations
of Similarity Measurement
Researchers across disciplines continue to investigate different approaches
to quantifying similarity, as well as to debate how the resulting values should
be used to answer questions. One area of continuing research focuses on how
resistant various metrics are to errors introduced by analyzing just a sample
of a larger population.24 Researchers in biology and chemistry have recently
23 Many programming languages already feature implementations of the necessary equations
for similarity measurement. See Ladd, “Understanding and Using Common Similarity Measures
for Text Analysis.”
24 Stephen A. Bloom, “Similarity Indices in Community Studies: Potential Pitfalls,” Marine
Ecology Progress Series 5, no. 2 (1981): 125–28; Henk Wolda, “Similarity Indices, Sample Size
and Diversity,” Oecologia 50, no. 3 (1981): 296–302; Anne Chao, Robin L. Chazdon, Robert K.
developed methods for quantifying the similarities of larger sets of data that
may share entities and features.25 Theoretical advances in understanding
the mathematical relationships between similarity indices and distance
metrics may further revolutionize the field.26 Critics of similarity metrics
have demonstrated they may not yield clear results in practice because they
discard or compress information that other methods preserve.27 Manuscript
researchers should be aware of these other uses for similarity metrics and
their limitations.
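One of the sampling problems raised in this literature can be reproduced with a short simulation. In the sketch below, which uses an invented population of fifty motif types with skewed frequencies, the two samples are always drawn from the same population, so any dissimilarity between them is produced by sampling alone; the average Jaccard index between the samples nevertheless rises as the samples grow.

```python
import random

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean_sample_jaccard(n, trials=200, seed=1):
    """Average Jaccard between two samples of size n from the SAME population."""
    rng = random.Random(seed)
    types = list(range(50))
    # Hypothetical skewed frequencies: a few common motifs, many rare ones.
    weights = [1 / (rank + 1) for rank in types]
    total = 0.0
    for _ in range(trials):
        s1 = set(rng.choices(types, weights=weights, k=n))
        s2 = set(rng.choices(types, weights=weights, k=n))
        total += jaccard(s1, s2)
    return total / trials

for n in (10, 50, 250, 1000):
    print(n, round(mean_sample_jaccard(n), 2))
```

Because rare features are often missing from small samples, two small groups of manuscripts drawn from one shared culture can look more dissimilar than two large ones.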
One of the most relevant applications is in archaeology, where similarity
metrics are now a standard methodological tool for comparing assemblages
of artifacts excavated at multiple sites. Pioneering publications in the 1950s
and 1960s prompted widespread adoption of similarity metrics (first referred
to as indices or coefficients of “agreement”) to study cultural data.28 Over the
past two decades, archaeologists have embraced similarity metrics as the
basis for reconstructing cultural and socioeconomic networks between sites
of human activity.29 The primary evidence for such analyses is usually the
Colwell, and Tsung-Jen Shen, “Abundance-Based Similarity Indices and Their Estimation
When There Are Unseen Species in Samples,” Biometrics 62, no. 2 (2006): 361–71; Prignano,
Morer, and Diaz-Guilera, “Wiring the Past.”
25 Anne Chao, Lou Jost, S. C. Chiang, Y.-H. Jiang, and Robin L. Chazdon, “A Two-Stage
Probabilistic Approach to Multiple-Community Similarity Indices,” Biometrics 64, no. 4 (2008):
1178–86; Ulf G. Indahl, Tormod Næs, and Kristian Hovde Liland, “A Similarity Index for
Comparing Coupled Matrices,” Journal of Chemometrics 32, no. 10 (2018): e3049.
26 Ondřej Rozinek and Jan Mareš, “The Duality of Similarity and Metric Spaces,” Applied
Sciences 11, no. 4 (2021): 1910.
27 J. W. Johnston, Similarity Indices II: The Power of Goodall’s Significance Test for the
Simple Matching Coefficient (Richland, WA: Battelle Pacific Northwest Laboratories, December 1976); David I. Warton, Stephen T. Wright, and Yi Wang, “Distance-Based Multivariate
Analyses Confound Location and Dispersion Effects,” Methods in Ecology and Evolution 3,
no. 1 (2012): 89–101.
28 W. S. Robinson, “A Method for Chronologically Ordering Archaeological Deposits,”
American Antiquity 16, no. 4 (1951): 293–301; George W. Brainerd, “The Place of Chronological
Ordering in Archaeological Analysis,” American Antiquity 16, no. 4 (1951): 301–13; George
L. Cowgill, “Archaeological Applications of Factor, Cluster, and Proximity Analysis,” American
Antiquity 33, no. 3 (1968): 367–75.
29 Per Östborn and Henrik Gerding, “Network Analysis of Archaeological Data: A Systematic
Approach,” Journal of Archaeological Science 46 (June 2014): 75–88; Anna Collar, Fiona Coward,
Brey and Doyle, Beyond Comparative Analysis | 245
composition of ceramic assemblages discovered at various sites, but some
researchers have focused on the physical properties of fired bricks or iconographic patterns in shell art.30 The relationship between similarities in
material culture and other forms of social, political, or economic interaction
remains undertheorized, leading archaeologist Matthew Peeples to advocate
that “archaeologists and ethnographers [need] to more directly collaborate
on projects explicitly focused on tracking how formally defined social networks (as reckoned by people themselves) relate to patterns of material similarity, production, and consumption at various scales.”31 Manuscript studies
likewise foregrounds the intersection of material culture and human networks, so developments in this area of archaeological research may have
significant ramifications for the field.
Researchers in fields beyond manuscript studies have also investigated
the limitations of similarity measurement. Archaeologists caution that
reshaping or omitting information to fit the constraints of a data structure
entails a sacrifice of qualitative complexities.32 Manuscript researchers may
therefore wish to adopt these approaches as a complement to, rather than a
replacement for, existing forms of description and analysis. In the 1970s, as
ecologists began adopting similarity metrics to quantify environmental
degradation, it became clear that some tests based on similarity metrics fail
to discern whether two samples of different species come from the same
Tom Brughmans, and Barbara J. Mills, “Networks in Archaeology: Phenomena, Abstraction,
Representation,” Journal of Archaeological Method and Theory 22, no. 1 (March 2015): 2–3;
Barbara J. Mills, “Social Network Analysis in Archaeology,” Annual Review of Anthropology 46,
no. 1 (2017): 387.
30 Per Östborn and Henrik Gerding, “The Diffusion of Fired Bricks in Hellenistic Europe:
A Similarity Network Analysis,” Journal of Archaeological Method and Theory 22, no. 1 (2015):
306–44; Jacob Lulewicz and Adam B. Coker, “The Structure of the Mississippian World: A
Social Network Approach to the Organization of Sociopolitical Interactions,” Journal of
Anthropological Archaeology 50 (June 2018): 113–27.
31 Matthew A. Peeples, “Finding a Place for Networks in Archaeology,” Journal of Archaeological Research 27, no. 4 (December 2019): 477–78, 482.
32 Piraye Hacigüzeller, James Stuart Taylor, and Sara Perry, “On the Emerging Supremacy
of Structured Digital Data in Archaeology: A Preliminary Assessment of Information,
Knowledge and Wisdom Left Behind,” Open Archaeology 7, no. 1 (2021): 1710–11.
246 | Manuscript Studies
[192.42.89.170] Project MUSE (2024-04-30 16:33 GMT) Wellesley College Library
underlying ecological distribution.33 Since then, researchers have proposed
new ways to approach this problem, confronting the confounding effects of
spatial and temporal trends that ecologists, archaeologists, and historians of
manuscripts all encounter.34 These methods also merit consideration,
although they are beyond the scope of this article.
This cursory overview has introduced just a small fraction of the developments in the usage of similarity metrics. Manuscript researchers who delve
into similarity metrics would benefit from conversations and collaborations
with experts in fields like biostatistics, paleontology, and archaeology and
from engaging with the substantial body of research on the topic. Critiques
and alternatives may indeed lead them to refine their methods. Depending
on the kind of contextual records that survive, similarity-based approaches
to manuscripts may also offer theoretical insights of interest to those in
adjacent fields like archaeology.
Case Study: Similarity of Marginal Iconography in Medieval European Manuscripts, 1250–1350
In the remainder of this essay, we discuss examples from an ongoing study
to illustrate some ways to apply similarity metrics to a concrete dataset. Our
project investigates patterns within the fashion for figural decoration in
manuscript margins, a phenomenon that emerged in the regions around the
English Channel in the mid-thirteenth century, by analyzing iconographic
similarity in the marginal images in later medieval manuscripts (fig. 1). We
work with legacy data from Lilian M. C. Randall’s Images in the Margins of
Gothic Manuscripts, an index describing 13,234 images in 237 manuscripts,
most of which were produced in France, Flanders, and England between
1250 and 1350.35 As part of the Manuscript Connections project, our larger,
ongoing study of illuminated manuscripts utilizing computational techniques,
we use similarity metrics and network analysis together to investigate similarity in marginal iconography across manuscripts.36 Examples from this project
demonstrate different approaches to quantifying similarity and model the
types of questions that similarity metrics can help researchers answer.
Figure 1. Flies in the margins of three manuscripts, one example of iconographic overlap.
Left: A man attacks a fly with a spear. The Gorleston Psalter, ca. 1310–24, London, British
Library, Add. MS 49622, fol. 7v. Center: Two flies. The Maastricht Hours, ca. 1310–20(?).
London, British Library, Stowe MS 17, fol. 64r. These two images reproduced with
permission from the © British Library Board. Right: A fly pursued by a swallow. The
Rothschild Canticles, ca. 1295–1300. New Haven, CT, Yale University, Beinecke Rare
Book and Manuscript Library, MS 404, fol. 157r.
33 J. W. Johnston, Similarity Indices I: What Do They Measure? (Richland, WA: Battelle
Pacific Northwest Laboratories, November 1976); Johnston, Similarity Indices II.
34 See Legendre and Legendre, Numerical Ecology, 17–21.
35 Lilian M. C. Randall, Images in the Margins of Gothic Manuscripts (Berkeley: University
of California Press, 1966). For further research on marginal illumination, see Michael Camille,
Image on the Edge: The Margins of Medieval Art (Cambridge, MA: Harvard University Press,
1992); Lucy Freeman Sandler, “The Study of Marginal Imagery: Past, Present, and Future,”
Studies in Iconography 18 (1997): 1–49; Laura Kendrick, “Making Sense of Marginalized Images
in Manuscripts and Religious Architecture,” in A Companion to Medieval Art: Romanesque and
Gothic in Northern Europe, ed. Conrad Rudolph (Oxford: Blackwell, 2006), 274–94; Jean Wirth,
Les marges à drôleries des manuscrits gothiques (1250–1350) (Geneva: Droz, 2008); Kathryn A.
Smith, “Margin,” in “Medieval Art History Today: Critical Terms,” ed. Nina Rowe, special
issue, Studies in Iconography 33 (2012): 29–44.
36 Manuscript Connections, http://manuscriptconnections.org. This project has benefited
immensely from participation in the 2019–2021 Getty Advanced Workshop for Network Analysis
+ Digital Art History. A project description from 2018 can be found at https://sites.haa.pitt.edu/na-dah/.
Figure 2. Page 100 from Randall’s Images in the Margins with sample entries color-coded based on
their content. Dark red highlights the main theme, light red the subtheme, dark blue the manuscript
identifier, and light blue the location within the manuscript.
We elected to use Randall’s indexing system largely unchanged as the
basis for our data, in an experiment to see whether it could yield meaningful
quantitative results without revision.37 Randall’s index lists iconographic
themes in alphabetical order, dividing them into groups (which we call
“themes” or “main themes”) based on the key actors or objects represented,
and subgroups (which we call “subthemes”), often based on the relationship
or action in which actors are engaged. Randall’s main themes frequently
occur in multiple manuscripts and occasionally even occur multiple times
within a single manuscript, while her subthemes are often limited to a single
instance across her corpus (see fig. 2 for a sample page). For example, marginalia that Randall categorized under the theme “Fly” appear in three
manuscripts: once without a qualifying subtheme, once with the subtheme
“and swallow,” and once with the subtheme “attacked by man with spear.”
Although we included these subthemes in our digitization of the data, our
analysis throughout this article is based solely on Randall’s main themes.
37 Some of Randall’s themes are more similar than others, although this is arguably true of
any system of iconographic categorization. Alternative systems like Iconclass offer affordances
such as a deep hierarchy of categories that permit researchers to automatically move from more
granular to more general characterizations of images. They provide a fruitful starting point for
researchers generating iconographic datasets from scratch. Researchers working with other
types of manuscript data may find existing systems for systematically recording categorical and
quantitative features meet their needs, or they may need to create their own.
Figure 3. A screenshot of part of the spreadsheet produced by digitizing and subdividing the
entries in Randall’s index of marginalia. Entries with no subtheme contain the code NA (not
applicable) to remove potential ambiguity about why the cell lacks a value. Only the main theme
and manuscript identifier columns were used to create the manuscript-theme matrix from which
similarity metrics were calculated.
Translating Randall’s index into a format amenable to computational
analysis involved several steps. First, we scanned the text and used Adobe
Acrobat to perform automatic optical character recognition (OCR). Then
we entered each instance of marginalia into a separate row in a spreadsheet
(fig. 3). We divided the information in each entry into several columns, which
included the main theme and subtheme for each instance of marginalia, the
identifier for the manuscript in which it occurs, and the folio on which it
appears. Because the OCR introduced typographic errors into the text, we
then used the free, open-source software tool OpenRefine to identify and
correct these errors. Next, we used a short script written in the statistical
programming language R to transform this spreadsheet into a matrix in
which each row represents a manuscript (237 manuscripts/rows) and each
column represents one of the main iconographic themes (2,002 distinct
themes/columns). The values in the cells of this manuscript-theme matrix
consist of the total number of instances of each main theme in a given
manuscript. Because many themes occur in only a few manuscripts, like the
“Fly” example above, and most manuscripts contain only a few instances of
marginalia, just 8,184, or 1.7 percent, of the cells in this manuscript-theme
matrix contain nonzero values indicating the presence of one or more
instances of a theme in a given manuscript.38 Finally, we used open-source
packages (prewritten sets of functions) to calculate the manuscript-manuscript
similarity for each possible pair of manuscripts with a variety of different
similarity metrics (27,966 total pairwise comparisons).39 As we show in the
sections that follow, researchers can use these comparisons to make not just
observations but also arguments on the basis of similarity between
manuscripts.
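The authors carried out the spreadsheet-to-matrix step with a short R script. The Python sketch below is a rough analogue, not their code. It assumes that each indexed image has been reduced to a hypothetical `(manuscript_id, main_theme)` pair, and it stores each manuscript's row as a `Counter`, so the many absent themes remain implicit zeros. The function name `build_matrix` and the toy rows are illustrative only. The last two lines check the arithmetic behind the corpus figures given above: 237 manuscripts yield 27,966 unordered pairs, and 8,184 nonzero cells out of 237 × 2,002 is about 1.7 percent.

```python
from collections import Counter, defaultdict
from math import comb

def build_matrix(records):
    """Pivot (manuscript, main_theme) records into a sparse count matrix.

    records: iterable of (manuscript_id, main_theme) pairs, one per indexed image.
    Returns {manuscript_id: Counter({main_theme: count})}; a theme missing from a
    Counter is an implicit zero, which keeps a very sparse matrix compact.
    """
    matrix = defaultdict(Counter)
    for ms, theme in records:
        matrix[ms][theme] += 1
    return dict(matrix)

# Hypothetical toy rows standing in for the digitized index (not Randall's data).
rows = [
    ("MS A", "Fly"), ("MS A", "Fly"), ("MS A", "Ape"),
    ("MS B", "Fly"), ("MS C", "Hare"),
]
m = build_matrix(rows)
themes = sorted({t for counts in m.values() for t in counts})
nonzero = sum(len(counts) for counts in m.values())   # occupied cells
density = nonzero / (len(m) * len(themes))            # share of nonzero cells

# Arithmetic check of the corpus-level figures reported in the text.
pairs = comb(237, 2)                          # unordered manuscript pairs
share = round(100 * 8184 / (237 * 2002), 1)   # percent of nonzero cells
```

In the toy data, `Counter` returns 0 for a theme a manuscript lacks. For example, `m["MS B"]["Ape"]` is 0, so the sparse structure can still be read as a full matrix.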
Our experience exemplifies the opportunities and challenges that arise
when using legacy data. Randall’s publication defined the parameters for the
study of marginal iconography in the later twentieth century and remains
an essential resource for researchers today. The wealth of structured data it
provides presents an extraordinary opportunity for computer-assisted scholarship. However, Randall’s index is organized for human readers, with headings
of varying specificity and redundant listings. The index is also necessarily
incomplete, constrained by exigencies of time and print media. Randall
explicitly prioritized depictions of figures in action, although the index also
contains numerous actionless figures.40 Even within these parameters, the
index could not be comprehensive: in a survey of a random subset of twenty
manuscripts, we found that, on average, Randall indexed about 36 percent
of the figural marginalia that we observed in these manuscripts. Given the
substantial amount of data the index contains, it may provide an adequate
sample to reflect the larger population of marginalia that we seek to analyze.41
Still, we must understand that our findings do not represent a direct reflection
of marginal iconography but rather estimates based on one researcher’s
descriptions. Accordingly, we take care at each stage to check our computational findings against recent publications and our own observations of
manuscript images whenever possible.
38 Both the number of themes in each manuscript and the number of marginalia in each
theme have severely right-skewed distributions, perhaps even logarithmic distributions, a subject
that merits further investigation beyond the scope of this article.
39 The basic R language includes a function called dist() that can calculate several different
types of distance metrics. Additional distance/similarity metrics may be calculated using
imported packages available in the CRAN (Comprehensive R Archive Network) repository.
See, for example, Mark van der Loo and David Turner, “Gower: Gower’s Distance,” accessed
2 February 2023, https://CRAN.R-project.org/package=gower. It is also possible to write
custom functions to calculate distance or similarity metrics.
40 “With these considerations in mind, the present iconographic selection consists of scenes
depicting humans, animals, or hybrids in some sort of activity. These constitute the essence
of marginal subject matter and provide the most valuable insight into contemporary mores and
ideas.” Randall, Images in the Margins, 15.
41 The extent to which surviving iconographic themes reflect the full set of images that
must have existed during this period is yet a separate question, but one that is beyond the
scope of this article. For a sophisticated discussion of this problem as it applies to the preservation of trecento songs in manuscripts dated 1380–1415, see Michael Scott Cuthbert, “Trecento
Fragments and Polyphony beyond the Codex” (PhD diss., Harvard University, 2006), 44–86;
Michael Scott Cuthbert, “Monks, Manuscripts, and Other Peer-to-Peer Song Sharing
Networks of the Middle Ages,” in Cantus Scriptus: Technologies of Medieval Song: Proceedings
of the 3rd Annual Lawrence J. Schoenberg Symposium on Manuscript Studies in the Digital Age,
November 20–21, 2010, ed. Lynn Ransom and Emma Dillon (Piscataway, NJ: Gorgias Press,
2012), 110–22.
Choosing and Comparing Similarity Metrics
Our experience with the marginal iconography data indicated that choosing
a similarity metric is not just a technical but an interpretive decision, entailing
careful consideration of the nature of the data, the research question, and
the contextualization of individual comparisons. Researchers should experiment with different similarity metrics to find one that fits their data and
understanding of similarity. We tested several metrics to determine which
could identify similarities meaningful to our research questions before
ultimately selecting the Gower distance metric.
The Type of Similarity Matters
Similarity metrics can be divided, broadly speaking, into different families
based on how they interpret overlaps in data.42 Some of the simplest focus
on absolute overlap. One example is the intersection metric, which simply
adds up the minimum number of shared traits in a given comparison (e.g.,
the minimum number of images for each theme shared between two
manuscripts).43 This approach contrasts with others that produce proportional
or relative values, like the Brainerd-Robinson similarity metric (see appendix
for a comparison of calculations using the intersection metric, the Brainerd-Robinson similarity metric, and Gower’s distance). Developed to analyze
archaeological assemblages, this metric compares the proportions of the
overall set of features (the relative frequency distributions) shared by any
particular pair of objects.44 Other metrics take an entirely different approach:
neither absolute nor proportional. Cosine similarity, widely used to find
similarities between textual documents, positions objects based on the
features they contain, then measures their orientation (that is, their angle)
relative to one another within an abstract, multidimensional space.45 The
Brainerd-Robinson and cosine metrics are specifically designed to ignore the
absolute magnitude of the feature sets they analyze, allowing them to assess
the similarity of objects that have very different quantities of features.
Researchers should consider with care which kind of similarity metric best
fits their questions and data.
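To make the three families concrete, the Python sketch below implements the metrics in their common textbook forms. The intersection metric is the sum of the smaller count for each theme. The Brainerd-Robinson coefficient is 200 minus the summed absolute differences between the two percentage profiles. Cosine similarity is the dot product divided by the product of the vector lengths. These are not necessarily the exact implementations behind the authors' appendix. The theme counts are hypothetical, chosen to show how the metrics treat magnitude.

```python
from math import sqrt

def intersection(x, y):
    """Absolute overlap: the smaller count for each theme, summed."""
    return sum(min(a, b) for a, b in zip(x, y))

def brainerd_robinson(x, y):
    """200 minus the summed absolute differences of the two percentage profiles.

    Ranges from 0 (no proportional overlap) to 200 (identical proportions).
    """
    tx, ty = sum(x), sum(y)
    return 200 - sum(abs(100 * a / tx - 100 * b / ty) for a, b in zip(x, y))

def cosine(x, y):
    """Cosine of the angle between two count vectors (0 to 1 for counts)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

# Hypothetical theme counts over the same four themes (not real manuscripts).
rich = [10, 4, 6, 0]    # 20 images
half = [5, 2, 3, 0]     # same proportions, half as many images
sparse = [1, 0, 0, 0]   # a single image, in rich's most common theme

for name, other in [("half", half), ("sparse", sparse)]:
    print(name, intersection(rich, other),
          round(brainerd_robinson(rich, other), 1), round(cosine(rich, other), 3))
```

Brainerd-Robinson and cosine both treat `rich` and `half` as identical, because both metrics ignore magnitude. The single-image `sparse` vector still receives a fairly high cosine value (about 0.81), even though the intersection metric registers an overlap of only one image.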
Testing these options on our data, we found that metrics that ignored
the absolute magnitude (the total number of shared images) ran counter to
our art-historical understanding that the presence of abundant marginalia
itself constitutes a meaningful similarity: books with abundant marginalia
are often more similar to each other than they are to books with few images.
Both cosine and Brainerd-Robinson metrics overstated the similarity of
sparsely illuminated manuscripts to other books. For instance, the English
Vaux Psalter (London, Lambeth Palace Library, MS 233) and a Netherlandish
psalter (London, British Library, Yates Thompson MS 42) produced one of
the highest cosine similarity metrics in our dataset. The two manuscripts
share only one image in common, but since this is the sole image recorded
for the Netherlandish psalter in our data, the cosine metric interpreted it as
highly significant. As art historians, when we considered the single image
from the Netherlandish psalter alongside the fifty-nine rich and varied
illuminations recorded from the Vaux Psalter, the books appeared to us very
different. Although the Brainerd-Robinson and cosine metrics produce results
that differ from our intuitive understanding of our data and our analytical
goals, they may still be useful for other types of manuscript research when
substantial differences in the magnitudes of features obscure real parallels.
42 For a more comprehensive overview of similarity metrics, see above, n10.
43 This metric is so basic it is not even discussed in most overviews of similarity metrics,
but it is the basis for the more complex Steinhaus metric and Bray and Curtis’s percentage
difference equation. Legendre and Legendre, Numerical Ecology, 285, 311.
44 Brainerd, “The Place of Chronological Ordering in Archaeological Analysis”; Robinson,
“A Method for Chronologically Ordering Archaeological Deposits”; Habiba et al., “Social
Networks and Similarity of Site Assemblages,” 64.
45 Han, Kamber, and Pei, Data Mining, 77; Legendre and Legendre, Numerical Ecology,
301–2 (related approaches).
After ruling out certain types of similarity metrics, it can be instructive
to compare the remaining options to learn more about how they work and
what conception of similarity underpins their algorithm (see appendix). Our
comparisons of the intersection metric, discussed above, and Gower’s distance
metric led us ultimately to conclude that Gower’s metric better reflects our
intuitive understanding of similarity between manuscripts. Gower’s metric
is similar to the intersection metric in that it starts with the absolute quantities
of each feature in the two objects compared, rather than with proportions. It goes a step further by normalizing each feature
differently based on whether there is wide variation in its abundance across
a dataset.46 One consequence of this normalization is that features that occur
consistently impact the final similarity more than features that vary dramatically in quantity. The Gower index also counts matches in the absence of
images toward its similarity measurement. Thus, pairs of manuscripts that
both avoid certain popular images (such as “Obscaena,” “Fables,” or “Jesus
Christ, life of”) may also have a positive similarity score, even if they share
no images. This property of the Gower metric may produce some counterintuitive results for our data, because researchers typically conceive of
comparisons based on the features present in one or both objects but may
struggle to account comprehensively for features wholly absent within the
comparison. This surprising feature of the Gower metric illustrates the
importance of understanding the specific concept of similarity encoded within
a given metric.
46 This normalization or scaling based on the differing ranges of values within each feature
accomplishes something similar to other methods such as term frequency–inverse document
frequency (TF-IDF) scaling or feature scaling, although each approach produces slightly different results. Gower, “A General Coefficient of Similarity and Some of Its Properties”; Legendre
and Legendre, Numerical Ecology, 278–80.
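The range normalization and the handling of double absences can be sketched in a few lines of Python. This follows Gower's treatment of quantitative variables: each theme's absolute difference is divided by that theme's range across all manuscripts, and the results are averaged over themes. It is an illustration only, not the R `gower` package the authors cite. Skipping zero-range themes is an assumption of this sketch, and the three-manuscript matrix is hypothetical. A similarity, where needed, is 1 minus the distance.

```python
def gower_distances(rows):
    """Pairwise Gower distances for a matrix of non-negative theme counts.

    Each theme's absolute difference is divided by that theme's range across
    all manuscripts and the results are averaged over themes. Two manuscripts
    that both lack a theme contribute a difference of 0, i.e. they agree.
    Themes with zero range are skipped (an assumption of this sketch).
    """
    n_cols = len(rows[0])
    ranges = [max(r[k] for r in rows) - min(r[k] for r in rows) for k in range(n_cols)]
    used = [k for k in range(n_cols) if ranges[k] > 0]
    if not used:
        raise ValueError("every theme has zero range; distances are undefined")
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sum(abs(rows[i][k] - rows[j][k]) / ranges[k] for k in used) / len(used)
            dist[i][j] = dist[j][i] = d
    return dist

# Hypothetical counts: rows are manuscripts, columns are three themes.
counts = [
    [2, 0, 0],
    [0, 0, 1],
    [4, 3, 1],
]
D = gower_distances(counts)
```

In this example, the first two manuscripts share no images. The theme that both lack (the middle column) still contributes zero distance, which pulls their score toward similarity. This is the property discussed above.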
Random Baseline
In our experience, similarity metrics are most useful when compared to a
baseline. This baseline may represent either a simple random scenario (the
approach we adopt here) or the output of a more sophisticated model or
simulation for how the data were generated. Because art historians currently
lack a mathematical model for artists’ use of iconographic themes in marginalia, our baseline represents an alternative past in which illuminators selected
iconographic themes at random.47 Our simulated manuscripts contain the
same quantities and frequencies of images as our real manuscripts (between
1 and 480 images in each manuscript), but we randomly assigned iconographic
themes to each manuscript. Imagine iconographic speed dating, where
manuscripts remain seated at their tables with anywhere from 1 to 480 chairs,
and marginal themes get shuffled randomly among the tables. The random
scenarios (known as permutations) preserve both the number of individual
images and the frequency of repeated themes recorded in each manuscript.
Thus, extending the metaphor, if a given theme was observed five times in
a manuscript, five chairs at its table would be assigned as a bloc to a new
iconographic theme in each random scenario. In this way, the randomizations
reflect the unequal distribution of themes observed within manuscript
production. This baseline allowed us to interpret the observed iconographic
similarity of a pair of manuscripts depending on whether it is substantially
higher or lower than the similarity in our simulated random dataset.
47 Specifically, we used permutations like those employed in statistical tests where researchers
do not wish to make assumptions about the underlying distributions of their data. See David
Spiegelhalter, The Art of Statistics: How to Learn from Data (New York: Basic Books, 2021),
261. The use of stochastic simulations as a baseline for interpreting similarity indices was
established (and debated) in ecological studies as early as the late 1960s, although it has not
been widely adopted in other areas like document retrieval or archaeological assemblage
comparisons. David W. Goodall, “A Probabilistic Similarity Index,” Nature 203, no. 4949
(September 1964): 1098; David M. Raup and Rex E. Crick, “Measurement of Faunal Similarity
in Paleontology,” Journal of Paleontology 53, no. 5 (1979): 1213–27; James F. Heltshe, “Jackknife
Estimate of the Matching Coefficient of Similarity,” Biometrics 44, no. 2 (1988): 447–60. Some
researchers have expressed reservations about this approach, perhaps most forcefully articulated
in J. W. Johnston’s dismissal of Goodall’s probabilistic index, Similarity Indices I, 51–53. Simulations based on ecological data suggest that stochastic permutation is underpowered as a test
to detect significant associations between observations, but it may still be used descriptively.
Legendre and Legendre, Numerical Ecology, 294.
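The "speed dating" permutation might be sketched as follows. This is one reading of the description, not the authors' code. Each observed (manuscript, theme, count) entry is a bloc. The bloc sizes stay with their manuscript, and the theme labels are shuffled across all blocs in the corpus. The text does not say what happens when a manuscript draws the same theme twice, so summing such collisions is an explicit assumption here. The corpus, the seed, and the function name `permute_themes` are hypothetical.

```python
import random
from collections import Counter

def permute_themes(corpus, rng):
    """Return one randomized copy of a manuscript-theme corpus.

    corpus maps a manuscript id to a Counter of {theme: count}. Every
    (manuscript, theme, count) entry is a "bloc": bloc sizes stay with their
    manuscript, while the theme labels are shuffled across all blocs.
    """
    blocs = [(ms, n) for ms, themes in corpus.items() for n in themes.values()]
    labels = [theme for themes in corpus.values() for theme in themes]
    rng.shuffle(labels)
    randomized = {ms: Counter() for ms in corpus}
    for (ms, n), theme in zip(blocs, labels):
        # Assumption: if one manuscript draws the same theme twice, counts are summed.
        randomized[ms][theme] += n
    return randomized

# Hypothetical toy corpus, not Randall's data.
corpus = {
    "MS A": Counter({"Fly": 5, "Ape": 1}),
    "MS B": Counter({"Hare": 2}),
    "MS C": Counter({"Fly": 1, "Snail": 3}),
}
rng = random.Random(42)   # seeded so runs are reproducible
simulations = [permute_themes(corpus, rng) for _ in range(5000)]
```

Every simulated dataset keeps each manuscript's total number of images. A five-image bloc always moves as a unit, which reflects the "five chairs assigned as a bloc" in the metaphor.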
Using this approach, we created five thousand simulated datasets and
calculated the similarity metrics for each pair of manuscripts.48 Because each
of these simulations can produce a different value, we took the mean for each
pair of manuscripts to get a sense of an expected or typical value under the
assumption of random theme selection. We then subtracted the mean values
of these simulations from the observed similarity values to normalize the
observed measurements, revealing the degree to which the actual metrics
are higher or lower than we would expect to see by chance.49 Of course, since
no manuscript illustrator picked themes at random, we would anticipate that
our observed values diverge from these randomized similarity metrics.
However, the observed divergence is relatively small: the mean of the random
simulations never even reaches 50 percent above or below the observed similarity. In the degree of divergence, whether greater or lesser, we found intriguing patterns of
similarity among the manuscripts in our dataset.
48 The number of permutations is somewhat arbitrary, and the pragmatic approach to
determining the necessary number is simply to track the point at which the statistic(s) of
interest stabilize. The less variation there is within the observations, the fewer permutations
are necessary to produce a sufficiently varied set of simulations. Ecologists, for example, recommend preliminary tests with five hundred to one thousand simulations, with a more stringent
requirement of ten thousand simulations for publishable results. Legendre and Legendre,
Numerical Ecology, 31.
49 While one can guess whether any given pair of manuscripts will have a high or low Gower
distance based solely on their average number of marginalia, this correlation is broken when
the metric is normalized by the mean of the randomized simulations. In addition to considering
the difference from the mean of the simulated values, researchers may also wish to calculate a
z score (divide the difference from the mean by the standard deviation of the simulated distribution) or create a p value (the percentage of simulated values that are greater or less than the observed
value). We are grateful to John Ladd for suggesting these alternatives.
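The normalization described above subtracts the mean of the simulated values from the observed value. Note 49 mentions two alternatives: a z score and an empirical p value. The Python sketch below computes all three for one manuscript pair. The input numbers are hypothetical, and the function name `compare_to_baseline` is illustrative. In practice, `simulated` would hold one metric value per permutation (five thousand in this study).

```python
from statistics import mean, stdev

def compare_to_baseline(observed, simulated):
    """Compare one observed similarity value with its permutation baseline.

    Returns the difference from the simulated mean (the normalization used in
    the text), a z score, and the shares of simulated values at or above and
    at or below the observed value (empirical one-sided p values).
    """
    mu = mean(simulated)
    sd = stdev(simulated)
    diff = observed - mu
    z = diff / sd if sd > 0 else float("nan")
    p_upper = sum(s >= observed for s in simulated) / len(simulated)
    p_lower = sum(s <= observed for s in simulated) / len(simulated)
    return diff, z, p_upper, p_lower

# Hypothetical values for one manuscript pair.
diff, z, p_upper, p_lower = compare_to_baseline(5, [1, 2, 3, 4, 5])
```

For a similarity score such as the intersection metric, `p_upper` is the relevant tail. For a distance such as Gower's, where lower means more similar, `p_lower` is the relevant one.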
One important consequence of incorporating these probabilistic simulations is that, when considered solely on the basis of iconographic categories,
the actual similarity metrics of manuscripts with few marginalia closely
resemble mean values from our random simulation. By contrast, pairs with
many marginalia may diverge significantly from the values expected by
chance. Manuscripts in particularly high- or low-similarity pairs therefore
tend to have abundant marginalia, and these books also tend to be well studied.50
Archaeologists typically deal with this problem of small, “noisy” samples
even before calculating similarity metrics, excluding sites below a certain
threshold of observations. Manuscript researchers may wish to do the same,
as comparable inconsistencies and lacunae characterize both manuscript data and
archaeological data.
50 The Maastricht Hours (London, British Library, Stowe MS 17), the Breviary of Renaud
de Bar (London, British Library, Yates Thompson MS 8), and the breviary portion of the
Aspremont-Kievraing Prayer Book (Melbourne, National Gallery of Victoria, inv. 1254-3) are
among the most similar manuscript pairs identified by both the intersection and the Gower
metrics. Other well-studied books with high intersection similarity include the Gorleston
Psalter (London, British Library, Add. MS 49622), the Hours of Jeanne d’Evreux (New York,
Metropolitan Museum of Art, Cloisters Collection, 54.1.2), and the Aspremont-Kievraing
Psalter (Oxford, Bodleian Library, MS Douce 118). Volumes of the Ghent Psalter (Oxford,
Bodleian Library, MS Douce 5) and the Belleville Breviary (Paris, Bibliothèque nationale de
France, MS lat. 10483), the Pontifical of Renaud de Bar (Cambridge, Fitzwilliam Museum,
MS 298), and both volumes of the Arthurian miscellany attributed to the Dampierre group
(Paris, Bibliothèque nationale de France, MS fr. 95; and New Haven, CT, Yale University,
Beinecke Rare Book and Manuscript Library, MS 229) have high similarity according to the
Gower metric. Bibliography for most of these books can be found in Stones, Gothic Manuscripts.
For the others, see Margot McIlwain Nishimura, “The Gorleston Psalter: A Study of the
Marginal in the Visual Culture of Fourteenth-Century England” (PhD diss., New York
University, 1999); Kyunghee Pyun and Anna D. Russakoff, eds., Jean Pucelle: Innovation and
Collaboration in Manuscript Painting (London: Harvey Miller, 2013), especially the essays by
Barbara D. Boehm and Pascale Charron.
Different Similarity Metrics in Practice
Case studies can demonstrate how different similarity metrics function in
practice and clarify how metrics diverge. The Luttrell Psalter (London,
British Library, Add. MS 42130; 343 marginal images recorded) and the
Taymouth Hours (London, British Library, Yates Thompson MS 13; 279
marginal images recorded) have seventy-one instances of iconographic
overlap between them. Both manuscripts contain many marginal images,
but, when contextualized against a random baseline, the intersection metric
and Gower metric disagree about whether they overlap more or less than
expected by chance. According to the intersection metric (.44), these manuscripts are substantially more similar than we would expect: in our five
thousand random scenarios, the highest overlap was forty-four images,
while the actual observed overlap is significantly higher, at seventy-one.
Because Gower’s metric measures distance, lower or negative numbers
indicate proximity or similarity, while higher numbers indicate distance or
dissimilarity. In contrast to the intersection metric, the adjusted Gower
metric (.0009) indicates they approximate values from our random scenarios,
based not just on the images they share (uniformly weighted so that less
common themes contribute just as much as more common ones) but also
on the images each artist excluded.
To illustrate the different types of similarity that these two metrics
measure, consider a Venn diagram. The intersection metric describes only
the overlapping center of a Venn diagram, focusing exclusively on a pair’s
shared features. Thus, the high intersection metric for the Luttrell Psalter
and the Taymouth Hours reflects shared themes such as images from the
life of Christ, the life of the Virgin, and representations of saints. By contrast,
the Gower metric incorporates every part of the Venn diagram: the individual
circles, their overlap in the center, and everything outside the circles (every
iconographic theme in the dataset absent from both manuscripts). As a result,
the Gower metric better reflects the stark variation in this manuscript pair’s
marginal imagery overall. While the Taymouth Hours presents literary and
religious narratives in marginal cycles, the Luttrell Psalter’s much more
eclectic imagery includes imaginative scenes of hybrid figures, agricultural
labors, courtly pastimes, animals, and more.51 By de-emphasizing a few highly
overlapping categories of images, Gower’s metric counterbalances the intersection metric’s tendency to allow a few areas of intense overlap to overshadow
broader variations. The Luttrell Psalter and the Taymouth Hours illustrate
the implications of Gower’s more comprehensive interpretation of
similarity.
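The Venn-diagram contrast can be written out directly. The sketch below assumes that each manuscript is coded as a vector of theme counts or 0/1 presences, and that each theme is scaled by its range across the corpus, following Gower’s treatment of quantitative variables. Under that assumption the intersection view reads only the center of the diagram, while the Gower distance averages over every theme, including those absent from both books. The vectors are toy data and the helper names are hypothetical.

```python
def venn_counts(x, y):
    """Count the four Venn regions for two equal-length 0/1 theme vectors."""
    pairs = list(zip(x, y))
    a = sum(1 for i, j in pairs if i and j)          # shared themes (center)
    b = sum(1 for i, j in pairs if i and not j)      # only in the first manuscript
    c = sum(1 for i, j in pairs if j and not i)      # only in the second manuscript
    d = sum(1 for i, j in pairs if not i and not j)  # absent from both (outside the circles)
    return a, b, c, d


def gower_distance(x, y, ranges):
    """Mean over themes of |x_k - y_k| / range_k, skipping themes with zero range.

    `ranges` holds each theme's range across the whole corpus. Joint absences
    (0, 0) contribute a distance of 0, so they pull two manuscripts closer.
    """
    terms = [abs(i - j) / r for i, j, r in zip(x, y, ranges) if r > 0]
    return sum(terms) / len(terms)


# Toy presence/absence vectors for ten hypothetical themes.
x = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
a, b, c, d = venn_counts(x, y)              # (1, 2, 1, 6)
intersection = a                            # the intersection view sees only the center
gower = gower_distance(x, y, [1] * len(x))  # (b + c) / (a + b + c + d) = 0.3

# Two manuscripts with no shared theme: adding themes that neither uses
# leaves the intersection at 0 but shrinks the Gower distance.
p, q = [1, 1, 0, 0], [0, 0, 1, 1]
far = gower_distance(p, q, [1] * 4)                           # 1.0
near = gower_distance(p + [0] * 96, q + [0] * 96, [1] * 100)  # 0.04
```

For 0/1 data every range equals 1, so the Gower distance reduces to the share of themes on which the two manuscripts differ. The second pair of vectors shows the direction of the shared-absence effect discussed below, though not the probabilistic adjustment.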
The metrics also diverge in their measurement of similarity between the
Maastricht Hours (London, British Library, Stowe MS 17; 344 marginal
images recorded) and the winter volume of the Breviary of Renaud de Bar
(London, British Library, Yates Thompson MS 8; 253 marginal images
recorded), which share seventy-four images (figs. 4 and 5). In this case, these
two manuscripts are considerably more similar than we would expect to see
based on the number and types of marginal themes they include. The
probabilistic Gower metric (−.0278) is the lowest (most similar) of any pair.
The intersection metric (.4846) also reflects greater similarity than we would
expect to see in a random dataset; however, it is unexceptional compared to
other manuscript pairs in the corpus. From an art-historical perspective, the
higher-than-expected similarity of the Maastricht Hours and the Breviary
of Renaud de Bar is less surprising than that of the Luttrell Psalter and the
Taymouth Hours, given the shared visual emphasis in the former pair on
satirical imagery in the margins throughout each book.52 Here again the
Gower metric matches art-historical expectations.
51 For thorough analyses of the marginal imagery in these two manuscripts, see Michael
Camille, Mirror in Parchment: The Luttrell Psalter and the Making of Medieval England (Chicago:
University of Chicago Press, 1998); Kathryn A. Smith, The Taymouth Hours: Stories and the
Construction of the Self in Late Medieval England (London: British Library, 2012).
52 For the Maastricht Hours, see Judith Oliver, Gothic Manuscript Illumination in the Diocese
of Liège (c. 1250–c. 1330) (Leuven: Peeters, 1988), 1:28–30. For the Breviary of Renaud de
Bar, see Patrick M. de Winter, “Une réalisation exceptionnelle d’enlumineurs français et anglais
vers 1300: Le bréviaire de Renaud de Bar, évêque de Metz,” in La Lorraine: Études archéologiques,
Actes du 103e congrès national des Sociétés savantes (Nancy-Metz, 1978), Section d’archéologie et
d’histoire de l’art (Paris: Bibliothèque nationale de France, 1980), 27–62; Kay Davenport, The
Bar Books: Manuscripts Illuminated for Renaud de Bar, Bishop of Metz (1303–1316) (Turnhout:
Brepols, 2017).
Figure 4. Marginal illumination of a buffeting game (frog in
the middle). The Maastricht Hours, ca. 1310–20(?). London,
British Library, Stowe MS 17, fol. 142v. © British Library Board.
Metrics diverge in their measurement of similarity between the
Maastricht Hours and the winter volume of the Breviary of
Renaud de Bar (fig. 5), which share seventy-four images.
While the Gower metric is more intuitive in these instances, there are
other cases where it veers from art-historical understandings of similarity.
Including shared absences as well as shared motifs leads Gower distance to
ascribe greater-than-expected similarity to manuscripts that have no overlap.
Figure 5. Detail of a buffeting game (frog in the middle). Breviary of Renaud de Bar
(winter volume), 1302–3. London, British Library, Yates Thompson MS 8, fol. 222v. ©
British Library Board.
The Winter Breviary of Renaud de Bar (fig. 5) and a copy of Guillaume de
Lorris and Jean de Meun’s Romance of the Rose (Tournai,
Bibliothèque de la Ville, Cod. 101; twenty-six images recorded) epitomize
this counterintuitive feature of the metric. The intersection metric of the
pair (−.0108) signals slightly lower similarity than expected by chance, yet
the Gower metric (−.0094) indicates that the pair slightly exceeds expected
similarity. Not only do these manuscripts share no images in our dataset;
their marginal illuminations draw from entirely different sources and serve
contrasting roles in the text. The Breviary of Renaud de Bar combines
assertions of its clerical owner’s aristocratic identity (heraldry, hunting, and
erotically charged games like frog in the middle; fig. 5) with parodic, often
violent topsy-turvy imagery (including its infamous “killer rabbits”) and
explicitly spiritual content.53 The marginal images in the Romance of the Rose,
in contrast, illustrate specific passages from the literary work, featuring
characters and allegorical personifications absent from the typical marginal
repertoire.54 These two decorative programs differ both in their complete lack of
iconographic overlap and in their radically different functions for marginal
illumination. Ultimately, researchers may demur from asserting similarity
based solely on shared exclusions. The similarity by omission of the Tournai
Romance of the Rose and the Winter Breviary of Renaud de Bar reminds
researchers that it is important to understand how metrics are calculated
and to account for this when analyzing results.
53 See Davenport, The Bar Books, 671–94, for a comprehensive index of marginal subjects
in the manuscripts for Renaud de Bar; see also Eleanor Jackson, “Medieval Killer Rabbits:
When Bunnies Strike Back,” British Library Medieval Manuscripts Blog, 16 June 2021,
https://blogs.bl.uk/digitisedmanuscripts/2021/06/killer-rabbits.html. On frog in the middle and other
games in the margins, see Richard H. Randall, “Frog in the Middle,” Metropolitan Museum of
Art Bulletin 16, no. 10 (1958): 269–75; Lilian M. C. Randall, “Games and the Passion in
Pucelle’s Hours of Jeanne d’Évreux,” Speculum 47, no. 2 (1972): 246–57; Madeline H. Caviness,
“Patron or Matron? A Capetian Bride and a Vade Mecum for Her Marriage Bed,” Speculum
68, no. 2 (1993): 339. Although Martine Meuwese does not address the inspiration for the
killer rabbit in Monty Python and the Holy Grail, she identifies Randall’s Images in the Margins
of Gothic Manuscripts as a main source for Terry Gilliam’s animations for the film in “The
Animation of Marginal Decorations in ‘Monty Python and the Holy Grail,’” Arthuriana 14,
no. 4 (2004): 45–58.
54 Lucien Fourez, “Le roman de la rose de la Bibliothèque de la ville de Tournai,” Scriptorium
1, no. 2 (1946): 216.
No metric perfectly matches the flexible usage of similarity in art-historical discussions
of iconography, but we find that a probabilistic version of Gower’s metric produces
satisfactory results because it considers the iconographic data holistically: the images
a pair of manuscripts share, the images unique to each, and the images both eschew.
By including all themes but weighting them based on their variation within
the corpus, Gower’s metric also avoids disproportionately emphasizing some
themes over others, the distortion the intersection metric produced in the comparison
of the Luttrell Psalter and the Taymouth Hours discussed above. Finally, contextualizing
the results of a similarity metric in terms of a random, simulated baseline produces a
measurement not of direct similarity but rather of how the observed similarity
diverges from what we would expect to see by chance.
Using Similarity Metrics
The variety of measurements of similarity reflects the numerous ways historians
conceptualize these relationships. Our discussion thus far demonstrates
the nuances that distinguish different similarity and distance measurements, despite
the high degree of overlap between them. Once a researcher has selected a
suitable similarity measurement, they can use it to answer questions about
their dataset. In this section, we highlight four approaches for analyzing the
results of similarity metrics that we have found especially effective for our project:
working directly with similarity metrics, hunting for outliers among concise
numerical representations known as summary statistics, statistical tests like
Analysis of Similarity, and exploratory approaches such as clustering.
Working Directly with Similarity Metrics
Researchers can work directly with similarity metrics to answer questions
about specific pairings within their dataset.55 Since our corpus includes several
multivolume manuscripts, we were able to pose questions about the iconography of manuscript volumes or fragments created together. How much
iconographic consistency (or repetition) characterizes multivolume works?
Although artists and patrons likely conceived these multivolume or fragmentary manuscripts as single projects, our data distinguish between volumes
in such a way that they function, as far as the computational analyses go, as
separate books. Here, we used these sometimes arbitrary divisions to demonstrate how similarity metrics illuminate the iconographic relationship
between two sections or parts of a single work.
The question is this: Taking the pairs of multivolume manuscripts in
isolation, is their similarity significantly higher than we would expect to
see by chance? Thirty of the 237 total manuscripts in our dataset represent
either discrete volumes or substantial fragments of fourteen complete works.
Most of these multivolume manuscripts are split into two volumes or fragments, but two works are bound in three volumes. Having calculated similarity measurements for every manuscript pair represented in our dataset, we
filtered the results to show only the manuscript pairs that comprise these
multipart works.
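Filtering a table of pairwise results down to within-work pairs is a simple lookup. The sketch below assumes that pairwise scores are stored in a dictionary keyed by unordered manuscript pairs, and that a hypothetical `work_of` mapping records which work each volume belongs to. The identifiers and scores are toy placeholders. It also checks the bookkeeping implied above: twelve two-part works and two three-part works account for the thirty manuscripts and yield 12 × 1 + 2 × 3 = 18 within-work pairs.

```python
from itertools import combinations
from math import comb


def within_work_pairs(scores, work_of):
    """Keep only the pairs whose two manuscripts belong to the same multipart work.

    scores:  dict mapping frozenset({ms_a, ms_b}) -> similarity or distance
    work_of: dict mapping manuscript ID -> work ID (manuscripts not listed
             are single-volume books and are ignored)
    """
    kept = {}
    for pair, value in scores.items():
        ms_a, ms_b = sorted(pair)
        work_a, work_b = work_of.get(ms_a), work_of.get(ms_b)
        if work_a is not None and work_a == work_b:
            kept[pair] = value
    return kept


def expected_pair_count(volumes_per_work):
    """Number of within-work pairs: C(k, 2) summed over works with k parts."""
    return sum(comb(k, 2) for k in volumes_per_work)


# Toy example: a two-volume work, a three-volume work, and one single volume.
work_of = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "B3": "B"}
manuscripts = ["A1", "A2", "B1", "B2", "B3", "C"]
scores = {frozenset(p): 0.0 for p in combinations(manuscripts, 2)}  # placeholder scores
kept = within_work_pairs(scores, work_of)  # 1 + 3 = 4 pairs
```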
55 Because of the large number of comparisons produced by a similarity analysis, researchers
should exercise some caution when interpreting individual results. Specifically, more stringent
criteria for significance may be appropriate. For a brief discussion of this issue, see Spiegelhalter,
The Art of Statistics, 278–80.
The similarity metrics within this small set of multivolume manuscripts
reveal patterns in artists’ working practices that shifted based on the density
of illumination they created. Books with dense or abundant marginal imagery
tend to have iconographically similar volumes or fragments, showing artists
revisiting certain iconographic motifs throughout the work (fig. 6). In
contrast, iconographic similarity is less pronounced in books with sparser
marginal illumination, reflecting both the close match between observed
and simulated metrics for sparsely illuminated manuscripts and a
preference for greater variety in manuscripts with fewer illuminations.
Figure 6. Scatterplot of pairs of multivolume manuscripts, showing the average marginal
image count per pair against the probabilistic Gower distance. Manuscript pairs with more
marginal images have lower distances (higher similarity) than manuscript pairs with fewer
marginalia, which have distances closer to the random baseline (the 0 grid line on the y
axis). Based on metrics calculated from the entries in Randall, Images in the Margins of
Gothic Manuscripts.
Densely illustrated multivolume manuscripts show how iconographic
themes tend to repeat across volumes, resulting in higher similarity scores
than expected by chance. As figure 6 shows, one manuscript pair with high
similarity measurements is a two-volume psalter for Ghent use, today in
the Bodleian Library, Oxford (MS Douce 5 and 6, figs. 7–9).56 The two
volumes of this work were executed by two different artists, both with
wide-ranging careers and distinct approaches to the planning and design
of their marginal illumination.57 Despite these differences, the manuscripts
share an overlap of 63 images between the 158 recorded for MS Douce 5
and the 279 for MS Douce 6. The two volumes’ iconographic coherence
results in higher-than-expected measurements of similarity (−.02). Coordination between the patron and the two artists accounts for the high
measurement of similarity within this multivolume manuscript: both artists
of the Ghent Psalter were working toward the same thematic specifications.
Their higher-than-expected similarity also demonstrates the role of repetition in densely illuminated books, as artists often chose to reprise important
themes throughout.
Among these pairs of manuscripts, measured similarity decreases substantially once the average image count per pair falls below
forty-five.58 This value reflects a threshold above which artists tended to
begin repeating themselves but below which they maintained iconographic
variation akin to that expected in our random scenarios. The three volumes
of the Beaupré Antiphonary (Baltimore, Walters Art Museum, MS W.759–761)
have only between thirteen and sixteen images recorded in each volume and
overlaps between zero and two (fig. 10).59 The similarity measurements for
the Beaupré volumes (between −.0031 and −.0023) are effectively indistinguishable from random overlap. Whereas multivolume manuscripts with a
profusion of marginal images seem to draw repeatedly on the same marginal
motifs, creating a coherent and measurably similar collection of images,
manuscripts with fewer marginal images contain more limited repetitions,
making their discrete parts no more similar than we would expect to see by
chance. Whether this quantitative threshold of marginal images reveals a
trend in manuscript artists’ approaches to large-scale book design, and what
it might mean, remain open questions for future work. Similarity measurement
provides a means to investigate such large-scale patterns in artists’
working practices across manuscripts.
56 Stones, Gothic Manuscripts, part 1, 2:344–54; Elizabeth Solopova, Latin Liturgical Psalters
in the Bodleian Library: A Select Catalogue (Oxford: Bodleian Library, 2013), 379–87.
57 Stones, Gothic Manuscripts, part 1, 2:351–52.
58 By contrast, this threshold is closer to an average of one hundred instances of marginalia
for the probabilistic intersection metric, suggesting that the probabilistic Gower metric performs
substantially better when analyzing more sparsely illustrated manuscripts.
59 Lilian M. C. Randall, Medieval and Renaissance Manuscripts in the Walters Art Gallery,
vol. 3, pt. 1, Belgium, 1250–1530 (Baltimore: Johns Hopkins University Press, 1997), 25–56;
Stones, Gothic Manuscripts, part 1, 2:384–97.
Figure 7. Marginal illuminations of a beggar with a basket on his back
containing an ape (right) and a bestiary representation of a unicorn hunt
(below). Psalter for Ghent use, ca. 1315–25. Oxford, Bodleian Library, MS
Douce 5, fol. 74r. Reproduced according to the terms and conditions of the
CC-BY-NC 4.0 license. The manuscript pair Douce 5 and 6 (see figs. 8 and 9),
made by different artists for the same commission, shows higher-than-expected
measurements of similarity, reflecting the coordination between the artists
and the patron and demonstrating the role of repetition of key iconographic
themes in extensively illuminated manuscript projects.
Figure 8. Detail of a beggar with a basket
on his back containing an ape. Psalter for
Ghent use, ca. 1315–25. Oxford, Bodleian
Library, MS Douce 6, fol. 153r.
Reproduced according to the terms and
conditions of the CC-BY-NC 4.0 license.
Figure 9. Detail of a bestiary representation of a unicorn hunt. Psalter for Ghent use, ca. 1315–25.
Oxford, Bodleian Library, MS Douce 6, fol. 39r. Reproduced according to the terms and conditions
of the CC-BY-NC 4.0 license.
Figure 10. Composite image of sequential openings from the Beaupré Antiphonary,
including one page with marginal illuminations of a fishmonger, a man holding a scroll,
and a winged hybrid, 1289–90. Baltimore, Walters Art Museum, MS W.759, fols. 105v–113r
(marginalia on fol. 108r). Reproduced according to the terms and conditions of the CC0
license.
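The density threshold discussed above can be checked by grouping within-work pairs by their average image count. The sketch below is a hypothetical helper, not our analysis code. It assumes that each pair is recorded as the image counts of its two parts plus a baseline-adjusted distance, where 0 means what chance predicts, and it uses the forty-five-image threshold from the text as its default. The Douce 5 and 6 figures are those reported above; the two sparse pairs are invented toy values.

```python
from statistics import mean


def compare_by_density(pairs, threshold=45):
    """Split within-work pairs by their average marginal-image count.

    pairs: list of (images_in_part_1, images_in_part_2, adjusted_distance)
    Returns the mean adjusted distance for dense and for sparse pairs
    (None when a group is empty).
    """
    dense = [dist for n1, n2, dist in pairs if (n1 + n2) / 2 >= threshold]
    sparse = [dist for n1, n2, dist in pairs if (n1 + n2) / 2 < threshold]
    return {
        "dense_mean": mean(dense) if dense else None,
        "sparse_mean": mean(sparse) if sparse else None,
    }


pairs = [
    (158, 279, -0.02),  # MS Douce 5 and 6, as reported in the text
    (20, 30, -0.001),   # toy values
    (35, 40, 0.002),    # toy values
]
summary = compare_by_density(pairs)
```

Varying `threshold` and watching where the sparse group's mean settles near zero is one simple way to locate such a cutoff. With only a handful of multivolume works, any such threshold remains descriptive rather than inferential.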
Summary Statistics and Outliers: Average Similarity
We can quickly discern whether a manuscript is typical or unusual in its
iconographic themes by calculating summary statistics like the average of
all its similarity metrics. This approach can identify specific outliers at each
extreme to serve as case studies for further research.
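Averaging is straightforward once pairwise scores are stored by unordered pair. The sketch below, with hypothetical helper names and toy scores, averages every score involving each manuscript and returns the lowest and highest averages. With a distance such as Gower’s, the lowest averages mark the manuscripts most similar to the rest of the corpus and the highest mark the least similar; with a similarity score the reading is reversed.

```python
from collections import defaultdict
from statistics import mean


def average_scores(scores):
    """Average every pairwise score that involves each manuscript.

    scores: dict mapping frozenset({ms_a, ms_b}) -> similarity or distance
    """
    per_ms = defaultdict(list)
    for pair, value in scores.items():
        for ms in pair:
            per_ms[ms].append(value)
    return {ms: mean(values) for ms, values in per_ms.items()}


def extremes(averages, n=3):
    """Return the n lowest and the n highest averages as (manuscript, value) lists."""
    ranked = sorted(averages.items(), key=lambda item: item[1])
    return ranked[:n], ranked[::-1][:n]


# Toy distances for three hypothetical manuscripts.
scores = {
    frozenset({"A", "B"}): 0.1,
    frozenset({"A", "C"}): 0.3,
    frozenset({"B", "C"}): 0.5,
}
averages = average_scores(scores)          # A: 0.2, B: 0.3, C: 0.4 (up to rounding)
lowest, highest = extremes(averages, n=1)  # lowest is A, highest is C
```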
Manuscripts that use marginal images as a kind of subject-specific illustration tend to have quite low average similarity, such as the Veil rentier
d’Audenarde (The rent-book of Audenarde, Brussels, Bibliothèque royale de
Belgique, MS 1175), a manuscript unique in our dataset that lists and
illustrates the landholdings of a lord in eastern Flanders.60 Devotional works
that contain lengthy cycles of marginal illustrations, often depicting unusual
biblical episodes in sequence, also have low average similarity but, in some
cases, high pairwise similarity to each other.61 Manuscripts with numerous,
specialized marginalia that exclude many common themes make up the bulk
of both categories. As heavily illustrated outliers, these low-similarity manuscripts have often attracted scholarly attention for their unusual approach to
the use of images.
60 Léo Verriest, Le Polyptyque illustré dit “Veil rentier” de Messire Jehan de Pamele-Audenarde
(vers 1275) (Brussels: printed by the author, 1950); Margaret Goehring, “Signs of the City:
Seigniorial Power and Vernacular Visual Culture in Two Northern French Rent-Books,” Studies
in Iconography 41 (2020): 1–29; see also Stones, Gothic Manuscripts, part 1, 2:297–98. Other
unusual manuscripts with low average Gower similarity include Frederick II, De arte venandi
cum avibus (Paris, Bibliothèque nationale de France, MS fr. 12400) and the Bird Psalter
(Cambridge, Fitzwilliam Museum, 2-1954). Interestingly, two manuscripts with relatively high
average intersection similarity have low average Gower similarity: the Copenhagen Psalter
(Copenhagen, Det Kongelige Bibliotek, MS G.K.S. 3384.8°) and the Hours of Jeanne d’Évreux
(New York, Metropolitan Museum of Art, Cloisters Collection, 54.1.2).
61 The Gower distance metric in particular assigned manuscripts with extensive narrative
cycles low average similarity. These included the Taymouth Hours (London, British Library,
Yates Thompson MS 13), the Isabella Psalter (Munich, Bayerische Staatsbibliothek, MS Cod.
gall. 16), the Tickhill Psalter (New York, New York Public Library, MS Spencer 26), and the
Queen Mary Psalter (London, British Library, Royal MS 2 B VII). The Gower metric measured
the Isabella Psalter as more similar to the Queen Mary Psalter (−.011) than expected by chance
but slightly more dissimilar to the Taymouth Hours (.007).
Manuscripts with the highest average similarity primarily consist of
personal devotional books (psalters, hours, and breviaries), such as the winter
volume of the Breviary of Renaud de Bar, the Maastricht Hours, and one
volume of the psalter for Ghent use (MS Douce 5; figs. 4, 5, and 7). These
manuscripts use marginalia as a signifier of courtliness, alternating between
moralizing parodies, allusive commentaries on or references to the adjacent
text, and decorative fillers. Their eclectic approach stands in contrast to the
narrative or illustrative cycles that characterize highly illuminated manuscripts with low average similarity.
Three manuscripts with particularly high average similarity depart from
these standard devotional texts: two volumes from a set of Arthurian
romances from the late thirteenth century (Paris, Bibliothèque nationale de
France, MS fr. 95; and New Haven, CT, Yale University, Beinecke Rare Book
and Manuscript Library, MS 229) and a mid-fourteenth-century copy of the
Romance of the Rose (Paris, Bibliothèque nationale de France, MS fr. 25526).
The Arthurian volumes belong to the so-called Dampierre Group of luxury
manuscripts, illuminated for members of the courtly circle of Guy of Dampierre, count of Flanders (d. 1305), largely by two artists working in the region
of Thérouanne.62 These artists also produced five other manuscripts included
in our dataset, four of which are devotional manuscripts.63 Crossover in their
repertoire of marginal images and their patrons’ desire for prolific marginalia
may help explain the surprisingly high average similarity of these Arthurian
manuscripts. The Romance of the Rose comes from the Paris workshop of
married manuscript makers Jeanne and Richard de Montbaston, who appear
to have specialized in vernacular literature, serving both courtly and more
modest patrons.64 The nineteen copies of the Rose attributed to the Montbastons
reflect the wide economic range of their patrons; this copy is the most
extensively illuminated of the group.65
62 Alison Stones, “The Illustrations of BN, Fr. 95 and Yale 229: Prolegomena to a Comparative
Analysis,” in Word and Image in Arthurian Literature, ed. Keith Busby (New York: Garland,
1996), 203–83; Elizabeth Moore Hunt, Illuminating the Borders of Northern French and Flemish
Manuscripts, 1270–1310 (New York: Routledge, 2007), 6, 79–110; Stones, Gothic Manuscripts,
part 1, 2:550–75; Emily R. Shartrand, “Sexual Warfare in the Margins of Two Late-Thirteenth-Century
Franco-Flemish Arthurian Romance Manuscripts” (PhD diss., University of Delaware,
2020). On the Dampierre Group more broadly, see Hunt, Illuminating the Borders; Kerstin
Carlvant, Manuscript Painting in Thirteenth-Century Flanders: Bruges, Ghent and the Circle of
the Counts (London: Harvey Miller, 2012), 117–35.
63 The Psalter of Guy of Dampierre (Brussels, Bibliothèque royale de Belgique, MS 10607);
Guillaume de Tyre, Histoire de la guerre sainte (Paris, Bibliothèque nationale de France, MS
fr. 2754); the Margaret the Black Psalter (private collection; formerly Christie’s, Arcana
Collection, lot 29; Sotheby’s 21.vi.88, lot 37; Kraus 75/88); and the two-volume Franciscan
Psalter-Hours (Paris, Bibliothèque nationale de France, MS lat. 1076 and Marseille,
Bibliothèque municipale, MS 111).
64 Rouse and Rouse, Manuscripts and Their Makers, 1:234–60; see also Camille, Image on
the Edge, 147–49; Sylvia Huot, The Romance of the Rose and Its Medieval Readers: Interpretation,
Reception, Manuscript Transmission (Cambridge: Cambridge University Press, 1993),
273–322.
65 Rouse and Rouse, Manuscripts and Their Makers, 1:242–43.
These devotional and literary volumes possess high average iconographic
similarity, but they are not “typical” manuscripts as such; far from it. They
contain many more instances of marginalia and many more iconographic
themes than the average manuscript. Contextualizing these manuscripts
using summary statistics, we may understand them as extreme examples
from the center of their manuscript culture.
Analysis of Similarity
Manuscript researchers can use similarity metrics to test assumptions about
a priori groups with a specific statistical test known as Analysis of Similarity.
This type of statistical hypothesis testing is designed to see whether the
observed difference in a sample is likely to be reflected in the population
from which it is drawn. This is especially helpful for understanding how
surviving manuscripts might reflect the much larger corpus of manuscripts
lost to time. For example, art historians may wish to ask whether two regions
had distinct or interconnected manuscript cultures, especially when researchers
have traditionally studied those manuscripts within nationalist frameworks that
separate objects based on modern or historical geopolitical
borders. Our example dataset consists of 237 manuscripts produced on either
side of the English Channel, with 59 localized to England and 178 localized
to continental Europe, predominantly France and Flanders. Given the iconographic
data discussed above, we can ask, Were manuscripts produced on
one side of the English Channel more iconographically similar to each other
than they were to manuscripts produced on the other side? This is precisely
the type of question that can be answered with an Analysis of Similarity
test, which evaluates whether the similarity of all pairs within each group
is significantly greater than the similarity of all pairs across groups (that is,
greater than we would expect to see by chance if there were no difference in
the larger corpus from which extant manuscripts are preserved).66
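The mechanics of the test can be sketched in Python. The code below computes the statistic R as Clarke (1993) defined it for Analysis of Similarity. All pairwise distances are ranked, and R compares the mean rank of between-group pairs with the mean rank of within-group pairs, scaled so that it runs from −1 to 1. Significance comes from shuffling the group labels. This is a minimal, unoptimized illustration with hypothetical function names and toy one-dimensional distances, not the software used for the analysis reported below.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2 + 1  # 1-based mean rank
        start = end + 1
    return ranks


def r_statistic(pairs, ranks, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    within, between = [], []
    for (i, j), rank in zip(pairs, ranks):
        (within if groups[i] == groups[j] else between).append(rank)
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


def anosim(dist, groups, permutations=999, seed=0):
    """Return (R, p) for a square distance matrix and a list of group labels."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])  # smallest distance = rank 1
    observed = r_statistic(pairs, ranks, groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if r_statistic(pairs, ranks, labels) >= observed:
            hits += 1
    # Count the observed labeling as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


# Toy data: six points on a line in two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
r, p = anosim(dist, ["A", "A", "A", "B", "B", "B"])  # r == 1.0
```

In this toy case R reaches its maximum of 1, yet p stays near 0.1. Only twenty distinct ways exist to split six items into two groups of three, and two of them reproduce the original split, so small samples cannot yield small p-values. With 237 manuscripts that limitation disappears, and the permutation distribution becomes the reference for judging a value such as the one reported below.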
Performing an Analysis of Similarity test on our data, using Gower’s
metric, we found that there is no clear difference between the marginalia of
trans-Channel pairs of manuscripts and cis-Channel pairs of manuscripts.
The Analysis of Similarity test produces a result between −1 and 1. Higher
in-group similarities yield a number closer to 1, and higher between-group
similarities produce a number closer to −1. In the case of our analysis, the
test statistic is −.02941, indicating that the difference between in-group and
between-group similarity is very small. This number is also one we would
expect to see in a random scenario: 73 percent of ten thousand random simulations
with no underlying difference between groups produced a result at
least as extreme as this one, so this tiny divergence between the in-group
and between-group similarity is quite plausibly a product of chance.67 A few
motifs exhibit clear regional patterns. Elephants and the miracles of the
Virgin, for example, are more common in English manuscripts, whereas
bandyball and fables are more common in French and Flemish manuscripts.
On the whole, however, the test shows no evidence of articulated distinctions
between manuscripts on opposing sides of the Channel, but neither does it
preclude their existence. Analysis of Similarity indicates that the regional
origins of a manuscript pair can predict almost nothing about their iconographic
similarity, underscoring long-standing critiques of nationalist
approaches to manuscript studies and challenging the continued use of such
frameworks in major catalogs and funding initiatives.68
66 An Analysis of Similarity test could also be a first step in extending Madeline H. Caviness’s
inquiry into the gendered imagery of the margins in fourteenth-century prayer books made
for elite women and men. Caviness, “Hedging in Men and Women: The Margins as an Agent
of Gender Construction,” in Reframing Medieval Art: Difference, Margins, Boundaries
(self-published, 2001), ch. 3, http://dca.lib.tufts.edu/caviness/chapter3.html. Ecologists often use
this approach to compare the impact of different variables on ecological communities. Legendre
and Legendre, Numerical Ecology, 603–11; Pier Luigi Buttigieg and Alban Ramette, “A Guide
to Statistical Analysis in Microbial Ecology: A Community-Focused, Living Review of
Multivariate Data Analyses,” FEMS Microbiology Ecology 90, no. 3 (December 2014): 547.
67 In this analysis, we use the original version of Gower’s metric, with ten thousand permutations
to check the significance of our findings, that is, whether the observed results are distinguishable
from randomized datasets. The statistical significance is the probability of obtaining
test results at least as extreme as the results actually observed within these ten thousand
randomized permutations.
Clustering
While Analysis of Similarity measures the similarity of manuscripts within
pre-identified groups, other methods known as clustering or community
detection use indices of similarity as a quantitative basis for identifying groups
in a dataset. Many researchers calculate similarity measures principally as a
basis for clustering, rather than analyzing the resulting metrics directly.
Clustering methods assign entities to groups based on some set of criteria
related to their proximity or density.69 Some clustering tools automatically
determine an optimal number of groups based on predefined mathematical
criteria, while others require users to select a number.70 Clustering is an
open-ended, exploratory method that can provide researchers with a different
perspective on relationships within their dataset and prompt new questions.
When a set of clusters is well differentiated, different proximity-based clustering
algorithms will often identify very similar groups. By contrast, when the size
or even the presence of clusters is more ambiguous, as is often the case with
similarity indices based on many features, different methods can produce very
different results. Researchers must draw on established knowledge about their
subjects and their own expertise to determine whether the resulting clusters
can serve as proxies for some real historical phenomenon. Biologists often use
clustering to create taxonomies or to formulate preliminary hypotheses about
causal relationships that can be tested further.71 In manuscript studies, stemmatologists
already use distance measurements and clustering algorithms to
produce manuscript stemmata based on textual discrepancies.72 Given lacunae
in manuscript preservation, it may be especially helpful here to learn from
fields like paleontology, where the chance loss or preservation of fossils blurs
together groups that would otherwise be easily distinguished.73
68 Lucy Freeman Sandler, “Illuminated in the British Isles: French Influence and/or the
Englishness of English Art, 1285–1345,” Gesta 45, no. 2 (2006): 177–88; Sandler, Gothic
Manuscripts 1285–1385 (London: Harvey Miller, 1986), 16–23; Stones, Gothic Manuscripts,
part 1, 1:18–19. For nationalism and the historiography of medieval art more broadly, see
Jonathan J. G. Alexander, “Medieval Art and Modern Nationalism,” in Medieval Art: Recent
Perspectives: A Memorial Tribute to C. R. Dodwell, ed. Gale R. Owen-Crocker and Timothy
Graham (Manchester: Manchester University Press, 1998), 206–23; Richard Marks, “The
Englishness of English Gothic Art?,” in Gothic Art & Thought in the Later Medieval Period:
Essays in Honor of Willibald Sauerländer, ed. Colum Hourihane (Princeton, NJ: Index of
Christian Art, 2011), 64–89.
69 Han, Kamber, and Pei, Data Mining, 444.
70 Christian M. Hennig, Marina Meila, Fionn Murtagh, and Roberto Rocci, eds., Handbook
of Cluster Analysis (Boca Raton, FL: CRC Press, 2016), 14–15.
71 Historians of material culture and textual transmission have criticized biological metaphors,
yet these analogies continue to shape our models of historical processes. Bence Nanay, “George
Kubler and the Biological Metaphor of Art,” British Journal of Aesthetics 58, no. 4 (December
18, 2018): 424–26; Armin Hoenen, “Evolutionary Models in Other Disciplines,” in Roelli,
Handbook of Stemmatology, 534–86.
72 Teemu Roos, “Computational Construction of Trees,” in Roelli, Handbook of Stemmatology,
317–19.
73 Matthew J. Vavrek, “A Comparison of Clustering Methods for Biogeography with Fossil
Datasets,” PeerJ 4 (February 25, 2016): e1720. This article focuses on results from clustering
methods that classify observations into a predetermined number of clusters.
74 Ward’s method is one of several approaches to clustering that seek to minimize variance
within clusters, prioritizing this over other criteria. Legendre and Legendre, Numerical
Ecology, 360.
75 Existing approaches to clustering high-dimensional data, such as subspace clustering,
focus on reducing the number of dimensions while minimizing information loss. Ira Assent,
“Clustering High Dimensional Data,” WIREs Data Mining and Knowledge Discovery 2, no. 4
(2012): 340–50.
Figure 11. Dendrogram of the iconographic clusters of French, Flemish, and English
manuscripts with marginalia. Of the fifteen color-coded clusters, the two largest
(purple and magenta) contain manuscripts that are about as similar to each other as
expected by chance. The remaining sixty-nine manuscripts form smaller clusters of
iconographically similar manuscripts, with a few thematically unusual manuscripts in
clusters of one (visible in details A and B). Data and manuscript identifiers based on
Randall, Images in the Margins of Gothic Manuscripts. For modern shelfmarks and
locations, see nn. 77 and 78.
We applied an agglomerative hierarchical clustering algorithm known as
Ward’s method to the distance matrix based on the probabilistic Gower
index.74 Although this method, like many standard methods for clustering,
is not ideal for high-dimensional data like our iconographic themes, we
adopted this approach to illustrate clustering here because of its widespread
use on datasets with fewer features.75 We visualized the result using a dendrogram,
or tree diagram (fig. 11), showing the proximity of manuscripts to
each other, which we cut into fifteen clusters. The number of clusters here
is subjective. In this case, we determined it using the minimum number of
clusters that satisfied a stopping criterion: that no single cluster should
contain more than half of the manuscripts. Our goal in setting this stopping
criterion was to create a usable visualization of the data, rather than a definitive typology.
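The clustering step and stopping criterion can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the pipeline used for figure 11: it substitutes the plain Gower distance for the probabilistic Gower index (whose details are not reproduced here), implements Ward’s method through the Lance-Williams update on squared dissimilarities, and then returns the smallest number of clusters in which no cluster holds more than half of the items. The function names and the six-item example are hypothetical. Ward’s method formally assumes Euclidean distances, so applying it to a Gower matrix, as here, is a pragmatic approximation.

```python
def gower_distance(x, y, ranges):
    """Plain Gower distance for numeric features: mean of |x_k - y_k| / R_k.

    Assumed stand-in for the probabilistic Gower index; a feature with
    zero range contributes no difference.
    """
    terms = [abs(a - b) / r if r else 0.0 for a, b, r in zip(x, y, ranges)]
    return sum(terms) / len(terms)


def ward_partitions(dist):
    """Agglomerative clustering with Ward's criterion (Lance-Williams update).

    dist: symmetric matrix (list of lists) of pairwise dissimilarities.
    Returns {k: partition} for every k from len(dist) down to 1; each
    partition is a sorted list of sorted lists of item indices.
    Note: Ward's method assumes Euclidean input; Gower input is an approximation.
    """
    n = len(dist)

    def pair(i, j):
        return (i, j) if i < j else (j, i)

    # The Lance-Williams form of Ward's method updates squared dissimilarities.
    d2 = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    members = {i: [i] for i in range(n)}
    partitions = {n: sorted([i] for i in range(n))}
    next_id = n
    while len(members) > 1:
        # Merge the closest pair of active clusters.
        (a, b), d_ab = min(d2.items(), key=lambda item: item[1])
        del d2[(a, b)]
        na, nb = len(members[a]), len(members[b])
        merged = members.pop(a) + members.pop(b)
        for c, items in members.items():
            nc = len(items)
            d_ac = d2.pop(pair(a, c))
            d_bc = d2.pop(pair(b, c))
            # New ids are always larger than existing ones, so (c, next_id) is ordered.
            d2[(c, next_id)] = ((na + nc) * d_ac + (nb + nc) * d_bc - nc * d_ab) / (na + nb + nc)
        members[next_id] = merged
        next_id += 1
        partitions[len(members)] = sorted(sorted(m) for m in members.values())
    return partitions


def smallest_k_with_max_share(partitions, n, max_share=0.5):
    """Smallest k whose largest cluster holds at most max_share of the n items."""
    for k in sorted(partitions):
        clusters = partitions[k]
        if max(len(c) for c in clusters) <= max_share * n:
            return k, clusters
    raise ValueError("no partition satisfies the stopping criterion")


# Hypothetical example: six items described by a single numeric feature.
features = [[0], [1], [2], [10], [11], [12]]
ranges = [max(col) - min(col) for col in zip(*features)]
dist = [[gower_distance(x, y, ranges) for y in features] for x in features]

parts = ward_partitions(dist)
k, clusters = smallest_k_with_max_share(parts, len(features))
print(k, clusters)
```

Because the criterion is checked from k = 1 upward, the first partition that passes is the coarsest one in which no cluster exceeds half the items; in this toy case that is the two-cluster split.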
The resulting diagram makes it possible to consider similarity as a basis for groupings. Of the fifteen color-coded clusters, the two largest (purple and magenta, containing 100 and 68 manuscripts, respectively) contain manuscripts that are about as similar to each other as expected by chance.76 The smaller clusters contain between 2 and 16 manuscripts that are more similar to each other than we would expect by chance, and 3 thematically unusual manuscripts each form a cluster of their own.77
Some of the smaller clusters appear to be linked by patronage or other aspects of production that were not directly factored into the calculation of the distance metrics or clusters. The concentration of five royal women’s manuscripts with an iconographic emphasis on narrative cycles within two closely related clusters corresponds with our findings regarding manuscripts with low average similarity and reinforces their status as a distinct group within the dataset, one ripe for further investigation (fig. 11, detail A).78
76 The manuscripts in these two clusters also tend to have fewer recorded marginal images than the manuscripts sorted elsewhere. Manuscripts in the two largest clusters have averages of 4.7 and 24.1 recorded instances of marginalia, compared with an average of 56.8 instances in the next-largest substantial cluster. These two large clusters thus tend to group manuscripts about which it is difficult to draw substantive conclusions solely on the basis of this iconographic data.
77 The Smithfield Decretals (London, British Library, Royal MS 10 E IV), the rent-book
of Audenarde (Brussels, Bibliothèque royale de Belgique, MS 1175), and the Tickhill Psalter
(New York, New York Public Library, MS Spencer 26) form their own clusters.
78 These manuscripts are the Hours of Jeanne de Navarre (Paris, Bibliothèque nationale de
France, MS nouv. acq. lat. 3145, ex-Yates Thompson MS 75), the Hours of Yolande of Flanders
(London, British Library, Yates Thompson MS 27), the Taymouth Hours (London, British
Library, Yates Thompson MS 13), the Queen Mary Psalter (London, British Library, Royal
MS 2 B VII), and the Isabella Psalter (Munich, Bayerische Staatsbibliothek, MS Cod. gall.
16). See above, n. 61. While many of these manuscripts have been thoroughly studied individually
or in the context of artist and workshop productions, we know of no work considering these early fourteenth-century manuscripts with narrative marginal imagery as a cross-Channel courtly phenomenon.
Clustering may also produce surprising juxtapositions, which can raise new questions about well-studied manuscripts. For instance, why was the East Anglian Luttrell Psalter clustered with the Belleville Breviary (Paris, Bibliothèque nationale de France, MS lat. 10483–4) and the Breviary of Charles V (Paris, Bibliothèque nationale de France, MS lat. 1052), from the Paris workshops of Jean Pucelle and his follower, Jean le Noir, respectively (fig. 11, detail B)? Because Jean le Noir copied elements of the Belleville Breviary’s marginal iconography in his breviary for Charles V, the close relationship between the two breviaries is unsurprising, which makes their association with the Luttrell Psalter all the more unexpected. Although the Luttrell Psalter artist did not intentionally reference the iconography of Pucelle’s breviary as Jean le Noir did, the appearance of all three manuscripts in one cluster raises interesting questions about the nature of the visual language of marginal iconography shared across the Channel.79
79 François Avril, Les fastes du Gothique: Le siècle de Charles V (Paris: Bibliothèque nationale de France, 1981), 295–96, 333–34; Joan A. Holladay, “Jean Pucelle and His Patrons,” in Jean Pucelle: Innovation and Collaboration, ed. Kyunghee Pyun and Anna D. Russakoff (London: Harvey Miller, 2013), 25.
Conclusion
The complexity of manuscripts as objects has inspired a rich tradition of structured description, which researchers and catalogers continue to expand. Similarity measurement and the analysis of its results permit researchers to use this wealth of structured data at scales that were previously too time-consuming to be practical. We hope that this article has demonstrated the utility of similarity metrics beyond specialized applications such as stemmatic analysis, and that they will continue to find wider use among manuscript researchers. Researchers can use similarity measurement to compare multifaceted historical objects, drawing on both existing sources of structured manuscript data and original research. Similarity metrics will not suit every situation, particularly cases where researchers compare objects with many features. When applied to appropriate research questions, however, they may provide new insights into datasets that are too time-consuming to analyze using traditional approaches.
Similarity metrics can find outliers, answer specific questions, and
highlight broad patterns within a set of manuscripts. When compared to
a random baseline, they can also provide a sense of whether observed
patterns may be attributed to chance or should instead be interpreted as
reflecting substantive historical phenomena. On their own, similarity
measurements allow researchers to compare pairs of objects and identify
exceptional cases based on summary statistics like average similarity.
Methods like Analysis of Similarity and clustering extend their usefulness,
permitting investigators to answer research questions about the comparison
of a priori categories, the identification of new groupings, and the centrality
of objects in a corpus. Many quantitative methods for comparing and
categorizing multifeature observations build on principles akin to those
employed in the calculation of similarity metrics, so experimenting with
these concepts can be a step toward other computational approaches to
analyzing manuscript data.
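One common way to compare observed groupings with a random baseline is a permutation test. The sketch below computes the standard Analysis of Similarity (ANOSIM) statistic R from ranked pairwise dissimilarities and estimates how often randomly reshuffled group labels produce an R at least as large. It is a minimal standard-library illustration, not the procedure used in this article; the function names, the 999 permutations, the fixed seed, and the toy data are all assumptions.

```python
import itertools
import random


def average_ranks(values):
    """Rank values from 1 to m; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def anosim_r(dist, labels):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(itertools.combinations(range(len(labels)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for (i, j), r in zip(pairs, ranks) if labels[i] == labels[j]]
    between = [r for (i, j), r in zip(pairs, ranks) if labels[i] != labels[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


def anosim(dist, labels, permutations=999, seed=0):
    """Compare the observed R with R under random relabelings (the chance baseline)."""
    rng = random.Random(seed)
    observed = anosim_r(dist, labels)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return observed, p_value


# Hypothetical example: six items on one numeric scale, in two a priori groups.
values = [0, 1, 2, 10, 11, 12]
dist = [[abs(x - y) for y in values] for x in values]
labels = ["A", "A", "A", "B", "B", "B"]
r_stat, p = anosim(dist, labels)
print(round(r_stat, 3), p)
```

Here R equals 1, its maximum, because every between-group dissimilarity exceeds every within-group one. The estimated p-value is nonetheless only about 0.1, since 2 of the 20 distinct ways of assigning three A and three B labels to six items separate the groups perfectly: with so few items, even a perfect separation cannot be confidently distinguished from chance.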
Quantitative metrics require explicit definitions of similarity, which
encourages researchers to reflect on conventional conceptions of similarity
and difference. In addition to inviting researchers to reevaluate and define
foundational disciplinary concepts, such transparency also encourages less
sweeping, more carefully circumscribed arguments. More than simply a tool,
similarity measurement, like other computational approaches, presents a
framework for reimagining the possibilities of humanistic inquiry.
As new forms and quantities of data about manuscripts become accessible
around the world, researchers seeking to see both the forest and the trees
must increasingly turn to such scalable, quantitative approaches if they wish
to discern macro-patterns and the place of individual objects within them.
While quantitative approaches suited to analyzing these macro-patterns have
traditionally been perceived as simplifications or reductions of nuance and
historical ambiguity, the work we have presented here demonstrates the
potential benefits of this trade-off. Similarity metrics let researchers step back
and consider the big picture, while capturing more of the complexity of
humanistic usage of “similarity” than one might expect. We see the profusion
of distinct quantitative metrics of similarity as a counterpart to, rather than a
foreclosure of, the subtleties inherent in conventional usage of the term “similarity.” If the criteria for comparing manuscripts rest in the eye of the beholder,
similarity metrics render them transparent, explicit, and expandable.
Appendix: Sample Calculations
Although general equations for calculating these metrics are readily accessible online, step-by-step demonstrations of their implementation are less common. Calculating the metrics by hand is unnecessary, since they are implemented in programming languages like R and Python, either as built-in functions or in free, open-source packages and libraries; we do so here, however, so that readers can develop an intuition for how different metrics may produce startlingly different results from the same data. Table 1 provides a toy dataset, which we use as the basis for calculating the metrics. Because Gower’s metric requires the range of each feature, we include a separate row giving the range of each column.
Table 1. A Toy Dataset with Feature Counts
                    Feature 1  Feature 2  Feature 3
MS 1                       10          5          0
MS 2                        0          5         10
MS 3                       20         10         80
Range of feature*          20          5         80
*Difference between the largest and smallest value
The Brainerd-Robinson similarity metric is calculated with percentages
rather than absolute values, so we convert the values to percentages for each
manuscript in table 2.
Table 2. Features as Percentages
        Feature 1 (%)  Feature 2 (%)  Feature 3 (%)
MS 1               67             33              0
MS 2                0             33             67
MS 3               18              9             73
Percentages are rounded to the nearest whole number.
Sample calculations follow for each metric, using the feature values for one
pair of manuscripts (MS 1 and MS 2).
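For reference, the worked calculations are instances of the following general forms, written for two manuscripts a and b described by p features. The notation is ours, introduced only for this summary: x_{ak} is the count of feature k in manuscript a, P_{ak} is that count as a percentage of the manuscript’s total, and R_k is the range of feature k across the dataset.

```latex
$$S_{\mathrm{int}}(a,b) = \sum_{k=1}^{p} \min\{x_{ak},\, x_{bk}\}$$

$$S_{\mathrm{BR}}(a,b) = 200 - \sum_{k=1}^{p} \lvert P_{ak} - P_{bk} \rvert$$

$$D_{\mathrm{Gower}}(a,b) = 1 - \frac{1}{p} \sum_{k=1}^{p} \left(1 - \frac{\lvert x_{ak} - x_{bk} \rvert}{R_k}\right)$$
```

With p = 3 and the values from tables 1 and 2, these reduce to the calculations for MS 1 and MS 2.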
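Intersection Similarity
$$\min\{10, 0\} + \min\{5, 5\} + \min\{0, 10\} = 5$$
Brainerd-Robinson Similarity
$$200 - \left(\lvert 67 - 0 \rvert + \lvert 33 - 33 \rvert + \lvert 0 - 67 \rvert\right) = 66$$
Gower’s Distance
$$1 - \frac{1}{3}\left[\left(1 - \frac{\lvert 10 - 0 \rvert}{20}\right) + \left(1 - \frac{\lvert 5 - 5 \rvert}{5}\right) + \left(1 - \frac{\lvert 0 - 10 \rvert}{80}\right)\right] \approx 0.21$$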
Calculating these values for the remaining two possible pairs of manuscripts results in table 3, in which an asterisk marks the most similar pair or pairs according to each metric. Note that Gower’s metric is a distance metric rather than a similarity metric, so a lower value indicates greater proximity or similarity.
Table 3. Pairwise Metrics
            Intersection  Brainerd-Robinson  Gower’s
            Similarity    Similarity         Distance
MS 1–MS 2   5             66                 0.21*
MS 1–MS 3   15*           54                 0.83
MS 2–MS 3   15*           152*               0.96
The intersection metric assigns MS 3 high similarity to the other two manuscripts based in part on the large total count of features recorded in that manuscript. By contrast, because the Brainerd-Robinson metric considers only percentages, it clearly differentiates between the similarity of MS 1–MS 3 and that of MS 2–MS 3. Gower’s distance down-weights features with large ranges (feature 3) and puts emphasis instead on features with small ranges (feature 2), so that under this metric the highest pairwise similarity goes to MS 1 and MS 2.
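The pairwise values in table 3 can be reproduced with a short script. The following is a minimal standard-library Python sketch, not the implementation behind the article’s analysis, and its function names are hypothetical. To follow the appendix, it rounds the percentages to whole numbers (as in table 2) before computing Brainerd-Robinson similarity, and it rounds Gower’s distance to two decimal places.

```python
from itertools import combinations

# Table 1: feature counts for the toy dataset.
counts = {
    "MS 1": [10, 5, 0],
    "MS 2": [0, 5, 10],
    "MS 3": [20, 10, 80],
}


def intersection_similarity(x, y):
    # Sum of the smaller count for each feature.
    return sum(min(a, b) for a, b in zip(x, y))


def to_percentages(x, rounded=True):
    total = sum(x)
    pct = [100 * a / total for a in x]
    # Table 2 rounds to whole percentages; rounded=False keeps exact values.
    return [round(v) for v in pct] if rounded else pct


def brainerd_robinson_similarity(x, y, rounded=True):
    # 200 minus the summed absolute differences between percentage profiles.
    p, q = to_percentages(x, rounded), to_percentages(y, rounded)
    return 200 - sum(abs(a - b) for a, b in zip(p, q))


def gower_distance(x, y, ranges):
    # 1 minus the mean per-feature similarity 1 - |x - y| / range;
    # a zero-range feature has identical values everywhere, so similarity 1.
    sims = [1 - abs(a - b) / r if r else 1.0 for a, b, r in zip(x, y, ranges)]
    return 1 - sum(sims) / len(sims)


# Range of each feature across all manuscripts (last row of table 1).
ranges = [max(col) - min(col) for col in zip(*counts.values())]

table3 = {}
for a, b in combinations(counts, 2):
    x, y = counts[a], counts[b]
    table3[f"{a}-{b}"] = (
        intersection_similarity(x, y),
        brainerd_robinson_similarity(x, y),
        round(gower_distance(x, y, ranges), 2),
    )

for pair, values in table3.items():
    print(pair, *values)
```

The script prints 5, 66, and 0.21 for MS 1–MS 2; 15, 54, and 0.83 for MS 1–MS 3; and 15, 152, and 0.96 for MS 2–MS 3. Passing rounded=False uses exact percentages instead, which shifts the Brainerd-Robinson values slightly (about 66.7 for MS 1–MS 2 and 54.5 for MS 1–MS 3), a reminder that even small preprocessing choices can nudge these metrics.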