Beyond Comparative Analysis: Making Arguments with Similarity Metrics and Structured Manuscript Data, with a Case Study in Marginal Iconography

Alexander Patrick Brey, Wellesley College
Maeve K. Doyle, Eastern Connecticut State University

Manuscript Studies: A Journal of the Schoenberg Institute for Manuscript Studies, Volume 8, Number 2, Fall 2023, pp. 232-281 (Article). Published by University of Pennsylvania Press. DOI: https://doi.org/10.1353/mns.2023.a916130

For additional information about this article: https://muse.jhu.edu/article/916130

[192.42.89.170] Project MUSE (2024-04-30 16:33 GMT) Wellesley College Library

Researchers specializing in medieval European manuscripts have a long tradition of systematically describing their objects of investigation. As objects produced through copying, manuscripts tend to share certain physical characteristics: pages within a gathering, lines on a page, measurements of page and text block, conventions of script. Furthermore, as manuscripts materialize conceptual systems (reflecting religious practices, intellectual frameworks, or literary taste), they lend themselves to a certain conceptual standardization. Since the late nineteenth century, the material for data-driven manuscript studies has been rich but siloed. Today, aggregating manuscript data creates opportunities for large-scale corpus analyses that were previously rare and laborious.

Alexander Patrick Brey and Maeve K. Doyle contributed equally to this work.
This project developed in part through our participation in the Getty Advanced Workshop in Network Analysis and Digital Art History (NA+DAH), and we are grateful to the participants and organizers for their feedback and suggestions, as well as John Ladd and Carolyn Anderson for their comments on drafts. Our research has been supported by Eastern Connecticut State University, the Getty Research Institute, and Wellesley College. We presented part of this project at the International Medieval Congress at the University of Leeds in 2021.

Manuscript scholars wishing to explore these new opportunities can learn from other disciplines’ data-driven approaches and adapt them to their own research questions. In this paper, we consider the application of one quantitative approach that manuscript historians could use to understand the large groups of manuscripts becoming available for study either as metadata or images: similarity measurement. This approach permeates the technological infrastructure behind academic, commercial, and even governmental systems.1 Our ongoing work with similarity measurement, and our research into its established use in ecological, archaeological, and linguistic research, has led us to our method of contextualizing results through comparison with random simulations. Measures of similarity, sometimes referred to as proximity or distance metrics, offer promising possibilities for manuscript research.
In this article we introduce and model the use of similarity metrics for manuscript scholars who are considering adopting such approaches or who may never even have heard of them.2 Although we frame this discussion in terms of an interdisciplinary community of manuscript scholars, we especially seek to address scholars in our own field of art history, which has lagged behind the other humanities in exploring the analytical possibilities of computational methods.3

1 For a general introduction, see Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques (Waltham, MA: Morgan Kaufmann, 2012), 65–79.
2 Strictly speaking, distance metrics must adhere to a set of axioms or requirements pertaining to spatial reasoning, while approaches to quantifying similarity that diverge from these axioms are usually referred to as “indices.” In this article, we will use the term “metric” throughout, even when discussing approaches that violate the axioms of metrics.
3 The contributions of Lev Manovich on methods from data science are particularly relevant to our argument. Manovich, “Data Science and Digital Art History,” International Journal for Digital Art History 1 (2015): 11–35; for a survey of recent developments in the field, see Kathryn Brown, ed., The Routledge Companion to Digital Humanities and Art History (New York: Routledge, 2020).

We argue that quantitative measurements of similarity can help researchers understand the large groups of manuscripts now available for study either as metadata or images. In adapting approaches from other fields, our experience has led us to conclude that researchers benefit the most from such measurements when they can compare them to simulations to contextualize observed data (what statisticians would call a “sample” from the historical “population”).
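The comparison between observed data and simulations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the motif labels, the pool of 40 possible motifs, the function names, and above all the null model, in which each manuscript draws its motifs uniformly at random from a shared pool. It should not be read as the procedure used in the case study, only as one simple way to ask how often chance alone would produce as much overlap as was observed.

```python
import random


def shared_count(a, b):
    """Number of features that two manuscripts have in common."""
    return len(set(a) & set(b))


def simulate_shared_counts(pool, size_a, size_b, trials=10_000, seed=0):
    """Simulate pairs of manuscripts whose features are drawn at random.

    Assumed null model: every feature in `pool` is equally likely, and the
    two simulated manuscripts keep the sizes of the observed ones.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        sim_a = rng.sample(pool, size_a)
        sim_b = rng.sample(pool, size_b)
        counts.append(shared_count(sim_a, sim_b))
    return counts


def share_at_least(simulated, observed):
    """Fraction of simulated pairs sharing at least `observed` features."""
    return sum(count >= observed for count in simulated) / len(simulated)


# Hypothetical data: a pool of 40 motifs and two manuscripts with 6 motifs each.
pool = [f"motif_{i}" for i in range(40)]
ms_a = {"motif_1", "motif_2", "motif_3", "motif_4", "motif_5", "motif_6"}
ms_b = {"motif_1", "motif_2", "motif_3", "motif_4", "motif_7", "motif_8"}

observed = shared_count(ms_a, ms_b)
simulated = simulate_shared_counts(pool, len(ms_a), len(ms_b))
fraction = share_at_least(simulated, observed)
```

If only a small fraction of the simulated pairs share as many motifs as the observed pair, the observed overlap is harder to explain as the kind of similarity expected by chance under the assumed model.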
This simulation-based approach roughly corresponds to the intuitive understanding that experienced researchers develop: whether similarities are simply expected within a shared manuscript culture or whether they might indicate a more specific connection.

This article introduces formal methods for quantifying the similarity of manuscripts based on a set of features, focusing on iconographic overlap as a case study. First, we argue that the value of such methods is increasing thanks to the deluge of digitized materials and manuscript metadata that is becoming available. Then, we introduce different approaches to quantifying similarity and discuss how manuscript researchers might learn from disciplines like archaeology, in which such approaches are widely employed.4 Next, we turn to a concrete case study: images in the margins of English, French, and Flemish Gothic manuscripts. We use this example to illustrate the process of selecting a relevant similarity metric. We then demonstrate some of the ways that researchers may use similarity metrics to answer research questions. We conclude that similarity measurement permits manuscript researchers to answer otherwise intractable questions while provoking exciting opportunities to revisit assumptions about key disciplinary concepts.

4 We restrict our discussion in this article to methodological issues related to such metrics but direct readers who wish to pursue technical implementation to the excellent tutorial in the Programming Historian: John R. Ladd, “Understanding and Using Common Similarity Measures for Text Analysis,” Programming Historian 9 (2020), https://doi.org/10.46430/phen0089.

Studying Manuscripts in the Age of Computers

Although the rapid adoption of computers and online databases has revolutionized how researchers find and access manuscripts, these technological advancements have played a more limited role in transforming the core methods of manuscript research. The rapid increase in digitization has allowed traditional manuscript research to move outside of the reading room, inspiring previously unimaginable depth and engagement in remote analyses.5 New approaches to digitization may offer users the ability to manipulate manuscripts in three dimensions, examine their internal physical structure, or experience how surfaces respond to changes in the direction and intensity of illumination.6 As literary historian Katarzyna Anna Kapitan has recently shown, shifting research agendas have prompted the manuscript community to reconsider which metadata to produce and share.7 In addition to these new forms of digital surrogates, researchers have begun to develop new methods in response to the creation of institutional and multi-institutional databases, innovations in computer vision, and their own increasing digital literacy.8 As researchers grapple with these new resources and approaches, techniques that help them understand large datasets will become increasingly useful, both to detect macroscale patterns in manuscript groups and to contextualize individual objects.

5 See, for example, the results of digitization-based studies in Benjamin Albritton, Georgia Henley, and Elaine M. Treharne, eds., Medieval Manuscripts in the Digital Age (London: Routledge, 2021).
6 Bill Endres, Digitizing Medieval Manuscripts: The St. Chad Gospels, Materiality, Recoveries, and Representation in 2D and 3D (Leeds: Arc Humanities Press, 2019); Jana Dambrogio, Amanda Ghassaei, Daniel Starza Smith, Holly Jackson, Martin L. Demaine, Graham Davis, David Mills, Rebekah Ahrendt, Nadine Akkerman, David van der Linden, and Erik D. Demaine, “Unlocking History through Automated Virtual Unfolding of Sealed Documents Imaged by X-Ray Microtomography,” Nature Communications 12, no. 1 (2 March 2021): 1184.
7 Katarzyna Anna Kapitan, “Perspectives on Digital Catalogs and Textual Networks of Old Norse Literature,” Manuscript Studies: A Journal of the Schoenberg Institute for Manuscript Studies 6, no. 1 (2021): 93–95.
8 L. W. Cornelis van Lit, Among Digitized Manuscripts: Philology, Codicology, Paleography in a Digital World (Leiden: Brill, 2019), 227.

As the number of features to compare across objects grows, computational methods become essential. Visualizing two features using simple graphical devices like a scatterplot is relatively intuitive, but intuition quickly fails as we increase the number of variables. For example, we can easily graph a comparison of page height and the number of lines per page in two dimensions, and even add a third dimension for the number of folios in a manuscript. But if we add a fourth variable, such as the number of scribes, visualizations grounded in spatial intuition start to break down. Similarity metrics become most helpful at this point because they have the capacity to compress the information from multiple variables or features into a single metric (a concept known as dimensionality reduction). We believe that similarity metrics will form a valuable part of the conceptual tool kit that researchers bring to any comparative study of manuscripts.

Similarity by the Numbers: Basics of Similarity Metrics

All similarity metrics compare pairs of objects or features. Calculating the similarity metrics for an entire set of objects therefore entails comparing every possible pair of objects in the set. One of the most straightforward ways to quantify similarity is in terms of distance. Distance metrics build on spatial reasoning: they quantify the distance between two points in multidimensional space. For a simple measurement between two points on a grid, the Pythagorean theorem defines the hypotenuse, or shortest path between them.
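As a minimal worked example of this calculation, the sketch below treats two manuscripts as points on a grid whose axes are page height and lines per page. The measurements are hypothetical and were chosen so that the differences form a 3-4-5 right triangle; the function name is likewise only illustrative. The function sums over however many coordinates it is given, so it is not limited to two axes.

```python
import math

# Hypothetical measurements: (page height in mm, lines per page).
ms_a = (300.0, 32.0)
ms_b = (297.0, 28.0)


def euclidean_distance(p, q):
    """Straight-line distance between two points with matching coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# The differences are 3 (height) and 4 (lines), so the hypotenuse is 5.
distance = euclidean_distance(ms_a, ms_b)
```

One caveat worth keeping in mind: when features are recorded in different units, the feature with the largest numerical spread dominates the result, so researchers often rescale features before measuring distance.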
We usually imagine the two axes of this grid as representing two dimensions of a plane in Euclidean space, but there is no reason they cannot represent something else. Each axis of the grid could instead stand for a quantitative feature of a manuscript: for example, the number of folios or distinct watermarks it contains. Likewise, there is no need to limit these dimensions to the usual two or three dimensions familiar from spatial reasoning, although the equations to calculate distance must change accordingly. We might position points along seven or twenty different axes or dimensions, each of which represents a different feature we deem relevant to our research question. Similarity metrics may simply invert distance metrics (observations with a low distance would have a high similarity index, and vice versa), or they may adopt other approaches to quantifying overlap that violate some of the axioms or requirements for distance metrics. Over the past two decades, a vibrant debate has emerged around the similarities and differences between physical and cultural distances.9

9 Ted Underwood and Richard Jean So, “Can We Map Culture?,” Journal of Cultural Analytics 6, no. 3 (June 17, 2021): 34.

There are as many different equations for calculating similarity and distance as there are ways of conceptualizing these terms.10 Some distance metrics consider sequence, measuring how many additions, subtractions, or substitutions are necessary to move from one string of features to another. This approach, known as edit distance, is employed by specialists in genetics and stemmatic analysis.11 Some applications of edit distance in manuscript research could include comparing the order of psalms in books of hours (one of the key methods for determining liturgical use) or the collation of different manuscripts.
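A minimal sketch of one common form of edit distance, the Levenshtein distance, shows how additions, subtractions, and substitutions are counted. The two sequences of psalm numbers are hypothetical and stand in for any ordered list of features; the function name is an illustrative choice.

```python
def edit_distance(seq_a, seq_b):
    """Levenshtein distance between two sequences of comparable items.

    Counts the fewest additions, subtractions, or substitutions needed
    to turn seq_a into seq_b, using a row-by-row dynamic programme.
    """
    # Distances from an empty prefix of seq_a to each prefix of seq_b.
    previous = list(range(len(seq_b) + 1))
    for i, item_a in enumerate(seq_a, start=1):
        current = [i]  # turning i items into an empty sequence takes i removals
        for j, item_b in enumerate(seq_b, start=1):
            substitution_cost = 0 if item_a == item_b else 1
            current.append(
                min(
                    previous[j] + 1,  # remove item_a
                    current[j - 1] + 1,  # add item_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


# Hypothetical orderings of psalms in two manuscripts.
order_a = [6, 31, 37, 50, 101]
order_b = [6, 37, 50, 101, 129]

# Removing 31 and adding 129 turns order_a into order_b: two edits.
steps = edit_distance(order_a, order_b)
```

Because the function compares items only for equality, the same code works on strings of letters, lists of feast days, or sequences of quire signatures.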
Edit distance would also be useful for comparing the lists of feast days in the liturgical calendars that frequently open Christian manuscripts, an approach that could, as Aaron Macks has proposed, automatically group and localize manuscripts.12 Other distance metrics ignore sequence entirely, comparing two sets of features regardless of the order in which they occur, as in the iconographic comparison case study we describe below. Data that simply record a presence or absence represent another special case for which researchers have developed tailored distance metrics. Similarity measures known as Q measures compare objects based on their associated features, while R measures assess the dependence between these associated features or descriptors.13 Take as an example a paleographic dataset that catalogs folios as the main objects and letterforms as associated features. A Q measure might quantify the similarity of two folios (objects) based on the presence or absence of specific letterforms (features). An R measure of the same dataset would quantify how “close” or “distant” two letterforms (features) are based on the frequency of their co-occurrence on various folios. All of the concrete examples of similarity measurement we discuss in this article fall into the category of Q measures.

10 For recent surveys of common metrics, see Sung-Hyuk Cha, “Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions,” International Journal of Mathematical Models and Methods in Applied Sciences 1, no. 4 (2007): 300–307; Louis Legendre and Pierre Legendre, Numerical Ecology (Boston: Elsevier, 2012), 265–335; Habiba, Jan C. Athenstädt, Barbara J. Mills, and Ulrik Brandes, “Social Networks and Similarity of Site Assemblages,” Journal of Archaeological Science 92 (April 2018): 63–72.
11 Joris van Zundert, Armin Hoenen, Sara Manafzadeh, Yannick M. Staedler, Teemu Roos, and Jean-Baptiste Guillaumin, “Computational Methods and Tools,” in Handbook of Stemmatology: History, Methodology, Digital Approaches, ed. Philipp Roelli (Berlin: De Gruyter, 2020), 308–9, 330, 343–44.
12 Aaron Macks, “Data Sanctorum: The Corpus Kalendarium Database of Devotional Calendars,” Manuscript Studies 6, no. 2 (Fall 2021): 348.
13 Legendre and Legendre, Numerical Ecology, 266.

Although these approaches may feel alien to some manuscript researchers, there is in fact a broad swathe of manuscript scholarship that deals with similarity in one form or another. In the following section, we offer a brief survey of several areas in which informal and formal analyses of similarity have shaped manuscript research, along with some guidelines for preparing observations about manuscripts with the goal of calculating similarity metrics.

Similarity Measurement and Manuscript Studies

Manuscript scholars use the term “similarity” (and its inverse, “difference,” or “dissimilarity”) in sophisticated but sometimes ambiguous ways. Manuscripts can be similar in terms of texts, iconography, script, style, layout, materials, or codicological makeup.14 Similarity might also refer to the contexts in which books were made or used: to define genres, model/copy relationships, shared structures of production or patronage, overlapping chains of provenance, or common functions. These assessments of similarity, often qualitative, work well for the narrow scope of most historical studies, such as analyses of specific texts or genres or the work of a given artist or workshop.15 The relatively small number of manuscripts typically considered in these studies encourages scholars to read closely and consider multiple types of similarity simultaneously.

14 Comparative paleographic studies abound, such as François Déroche, The Abbasid Tradition: Qurʾans of the 8th to 10th Centuries (New York: Nour Foundation in association with Azimuth Editions and Oxford University Press, 1992). For examples of comparative studies focused on materials, see Abigail Quandt, “The Purple Codices: A Report on Current and Future Research and Conservation Projects,” Care and Conservation of Manuscripts 16 (2018): 121–52; Maurizio Aceto et al., “Mythic Dyes or Mythic Colour? New Insight into the Use of Purple Dyes on Codices,” Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 215 (May 15, 2019): 133–41. For an argument in favor of comparative codicological analyses, see Malachi Beit-Arié, “Why Comparative Codicology?,” Gazette du livre médiéval 23, no. 1 (1993): 1–5.
15 For example, Persis Berlekamp, Wonder, Image, and Cosmos in Medieval Islam (New Haven, CT: Yale University Press, 2011); Kathryn M. Rudy, Piety in Pieces (Cambridge: Open Book Publishers, 2016), 223, on the refurbishment of Netherlandish manuscripts; Benjamin Anderson, Cosmos and Community in Early Medieval Art (New Haven, CT: Yale University Press, 2017), 77–79, on Carolingian cosmological manuscripts; William Noel, “The Utrecht Psalter in England: Continuity and Experiment,” in The Utrecht Psalter in Medieval Art: Picturing the Psalms of David, ed. Koert van der Horst, William Noel, and Wilhelmina C. M. Wüstefeld (Tuurdijk, Netherlands: HES, 1996), 120–65; Richard H. Rouse and Mary A. Rouse, Manuscripts and Their Makers: Commercial Book Producers in Medieval Paris, 1200–1500, 2 vols. (Turnhout: Harvey Miller, 2000); Richard H. Rouse and Mary A. Rouse, Renaissance Illuminators in Paris: Artists & Artisans 1500–1715 (London: Harvey Miller, 2019); Kathryn A. Smith, Art, Identity and Devotion in Fourteenth-Century England: Three Women and Their Books of Hours (London: British Library, 2003); Michael A. Michael, “Oxford, Cambridge and London: Towards a Theory for ‘Grouping’ Gothic Manuscripts,” Burlington Magazine 130, no. 1019 (1988): 107–15, on groups among English manuscripts.

In studies with larger scopes, manuscript scholars have long used structured data to facilitate comparison, focusing their analyses on a limited set of features for each object. These may take the form of standardized catalogs, tables, or vocabularies. Günther Haseloff’s 1938 Die Psalterillustration im 13. Jahrhundert includes twenty tables of psalter iconography; seventy-five years later, Alison Stones devoted nearly a whole volume of her Gothic Manuscripts catalog to iconographic tables for a variety of texts.16 Feature-based comparative analyses also form the basis for connoisseurial groupings or attributions.17 Detailed paleographic studies like François Déroche and Albert Derolez’s categorizations of early Abbasid and Gothic scripts, respectively, produced controlled vocabularies for describing handwriting in manuscript books.18 Indices, databases, and vocabularies developed at research centers such as the Index of Medieval Art (since 1919) and the Institut de recherche et d’histoire des textes (founded 1937) have also added to the wealth of structured data available for the study of manuscripts.19

16 Günther Haseloff, Die Psalterillustration im 13. Jahrhundert: Studien zur Geschichte der Buchmalerei in England, Frankreich, und den Niederlanden (Kiel: [n.p.], 1938), 100–123; Alison Stones, Gothic Manuscripts 1260–1320, 4 vols. (London: Harvey Miller, 2013–2014).
17 For example, Georg Vitzthum, Die Pariser Miniaturmalerei von der Zeit des hl. Ludwig bis zu Philipp von Valois und ihr Verhältnis zur Malerei in Nordwesteuropa (Leipzig: Quelle & Meyer, 1907), 88–111 (“Die Gruppe des ‘Romans de la Poire’”), to name just one example; Robert Branner, Manuscript Painting in Paris during the Reign of Saint Louis: A Study of Styles (Berkeley: University of California Press, 1977).
18 Déroche, The Abbasid Tradition; Albert Derolez, The Paleography of Gothic Manuscript Books: From the Twelfth to the Early Sixteenth Century (Cambridge: Cambridge University Press, 2003); Derolez, “Possibilités et limites d’une paléographie quantitative,” in Hommages à Carl Deroux, ed. Pol Defosse, vol. 5 (Brussels: Latomus, 2004), 98–102.
19 Colum P. Hourihane, “Classifying Subject Matter in Medieval Art: The Index of Christian Art at Princeton University,” Visual Resources 30, no. 3 (July 3, 2014): 255–62; Louis Holtz, “Les premières années de l’Institut de recherche et d’histoire des textes,” La revue pour l’histoire du CNRS, no. 2 (May 5, 2000), https://doi.org/10.4000/histoire-cnrs.2742.

Computational approaches to similarity build on these comparative-analytical traditions. The many available tools for assessing similarity among large sets of objects or texts with complex traits (such as manuscripts) empower humanistic researchers to use structured data on even larger scales. The timely mathematical calculation of similarity at such scales requires computers and therefore also computer-readable data. Some of the types of manuscript similarity discussed above are more amenable to quantification than others, but for the purpose of quantifying similarity, computer-readable data need not be restricted to numbers. Qualities that can be classified or cataloged (such as iconography, paleography, textual contents, artistic attributions, watermarks, or pigments) lend themselves well to calculating similarity, especially if they are cataloged with controlled vocabularies. Provenance involves discrete chains of ownership, which are remarkably well suited to similarity measurement if they can be reconstructed. Qualities that are more difficult to quantify include artistic style, although computer scientists have begun developing approaches that analyze brushstrokes.20 For this article, however, we will limit our discussion to features that manuscript specialists have traditionally categorized or quantified, to highlight existing manuscript data as a foundation for new digital analysis.

20 Fang Ji, Michael S. McMaster, Samuel Schwab, Gundeep Singh, Lauryn N. Smith, Shishir Adhikari, Márcio O’Dwyer, Farah Sayed, Anthony Ingrisano, Dean Yoder, et al., “Discerning the Painter’s Hand: Machine Learning on Surface Topography,” Heritage Science 9, no. 1 (November 12, 2021): 152.

Manuscript researchers may easily tackle small-scale analyses of this type without computers, but as the number of items to compare increases linearly, the number of possible pairwise comparisons increases quadratically: n items yield n(n − 1)/2 pairs. Analyzing these large numbers of combinations manually would take a prohibitively long time, but computers can do so in a matter of seconds or, in extreme cases, hours. Quantitative approaches to similarity allow manuscript researchers to study large-scale trends difficult or impossible to discern in smaller groups.

Scholars wishing to collect data for similarity analysis should organize their observations in a spreadsheet, placing the basic objects under investigation in separate rows and the features they wish to compare in separate columns.
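The row-and-column layout, and the speed at which the number of comparisons grows, can be sketched in a few lines. The shelfmarks, feature names, and values below are hypothetical; the only fact relied on is that n objects form n(n − 1)/2 unordered pairs.

```python
from itertools import combinations
from math import comb

# Each row is one manuscript; each key is one feature (a spreadsheet column).
rows = [
    {"shelfmark": "MS A", "folios": 200, "lines_per_page": 20, "illuminated": True},
    {"shelfmark": "MS B", "folios": 250, "lines_per_page": 22, "illuminated": False},
    {"shelfmark": "MS C", "folios": 300, "lines_per_page": 30, "illuminated": True},
]

# Every possible pair of manuscripts, each listed once.
pairs = list(combinations([row["shelfmark"] for row in rows], 2))


def pair_count(n):
    """Number of unordered pairs among n objects."""
    return n * (n - 1) // 2


# How the number of comparisons grows with the size of the corpus.
growth = {n: pair_count(n) for n in (3, 10, 100, 1000)}
```

Three manuscripts require three comparisons, but a corpus of one thousand already requires 499,500, which is why manual comparison quickly becomes impractical.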
For example, a project measuring similarity among a set of manuscripts should have a row for each manuscript and columns for a range of manuscript properties: number of pages, quires, or lines per page; dates of production or change in ownership; geographic coordinates; patrons’ or expected owners’ names; textual or visual contents; presence or absence of illumination or annotations; materials; and more. Some similarity metrics, such as Gower’s distance, can analyze both quantitative and qualitative features, so columnar data can be either numerical or categorical.21 As long as the columns represent comparable features, and as long as some repetition (similarity) exists within them, it will be possible to measure similarity.

21 J. C. Gower, “A General Coefficient of Similarity and Some of Its Properties,” Biometrics 27, no. 4 (1971): 857–71.

For the sake of computational analysis, traditional catalogs and published tables may be considered “legacy data,” that is, information stored in a format that is difficult to process with computers. Beyond the initial step of digitizing published data (or other legacy data), as in the project we describe below, researchers may have to standardize and reconfigure their data to facilitate similarity analysis.

Some datasets will be too varied or too small for similarity metrics to be applied meaningfully. Since similarity metrics measure the repetition of qualities or attributes within a group of objects or observations, a set of objects that lacks repeated features will appear uniformly dissimilar. As a consequence, although manuscript similarity studies do not require very large datasets (much less “big data”), the number of objects included in such studies may still feel large relative to the conventions of the field of manuscript research. Calculating these metrics may be helpful for producing accurate, systematic comparisons based on the quantitative and categorical features of as few as three objects.
Additional caution is merited, however, when considering whether arguments based on such small samples accurately represent the larger corpus or population from which they are drawn. Tools for evaluating the robustness of a quantitative method given the size of a dataset can help to determine whether computational analysis is likely to yield meaningful results.22

22 In addition to tests of statistical power, which can yield a minimum viable sample size, researchers may be interested in two particular approaches. One, known as simulation testing, is to simulate a “perfect” dataset and test a method on smaller and smaller samples to determine the point at which a given method consistently yields incorrect results. The other approach tests the robustness of a method by applying it to smaller and smaller samples from an existing dataset. For the theory of simulation testing, see Tim P. Morris, Ian R. White, and Michael J. Crowther, “Using Simulation Studies to Evaluate Statistical Methods,” Statistics in Medicine 38, no. 11 (2019): 2074–102. For applications in archaeology and digital humanities, see Luce Prignano, Ignacio Morer, and Albert Diaz-Guilera, “Wiring the Past: A Network Science Perspective on the Challenge of Archeological Similarity Networks,” Frontiers in Digital Humanities 4 (2017), https://doi.org/10.3389/fdigh.2017.00013; Yann C. Ryan and Sebastian E. Ahnert, “The Measure of the Archive: The Robustness of Network Analysis in Early Modern Correspondence,” Journal of Cultural Analytics 6, no. 3 (July 21, 2021), https://doi.org/10.22148/001c.25943.

Although a comprehensive comparison that includes numerous, varied features like the one described above may be tempting, researchers should focus selectively on limited comparisons of features (or variables) that answer research questions. That is to say, the features from which researchers calculate similarity metrics should be the closest possible proxies to the phenomenon they want to understand. A researcher answering questions about textual communities will choose different features to analyze than a researcher interested in patronage or workshop practices. If possible, the features under consideration should have direct, causal relationships with the subject of the research question. For instance, a research question about workshop practices might consider page-ruling techniques that vary from workshop to workshop but exclude data about page size (which is typically determined by supply chains and production techniques shared by multiple workshops). Including features in an analysis that are unrelated to the research question may seem harmless, but more data is not always better. If a feature has no logical reason to be connected to a research question, including it in similarity computations will introduce noise into the results, even to the extent of obscuring real patterns. Instead of lumping all these features into a single similarity metric, researchers may wish to keep them separate and perform two comparative analyses to identify where their phenomenon of interest diverges from other trends.

Once researchers have identified a research question and collected the necessary observations in a computationally tractable format, they can use a programming language (or collaborate with someone who can) to calculate similarity metrics quickly and systematically.23 With such varied approaches to conceptualizing similarity, manuscript researchers can learn from other disciplines, many of which have employed similarity metrics for decades. In the following section, we introduce a brief overview of some salient methodological developments and debates.
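To make the final step of this workflow concrete, the sketch below calculates a simplified form of Gower’s distance, the mixed-data metric mentioned earlier in this section, for every pair of manuscripts in a small table. The shelfmarks, feature names, and values are hypothetical. The simplifications are assumptions of this sketch rather than features of Gower’s full coefficient: all features carry equal weight, there are no missing values, and numerical ranges are taken from the table itself.

```python
from itertools import combinations

# Hypothetical table: one entry per manuscript, one key per feature.
manuscripts = {
    "MS A": {"folios": 200, "lines_per_page": 20, "script": "textualis", "illuminated": True},
    "MS B": {"folios": 250, "lines_per_page": 22, "script": "textualis", "illuminated": False},
    "MS C": {"folios": 300, "lines_per_page": 30, "script": "cursiva", "illuminated": True},
}
NUMERIC_FEATURES = ("folios", "lines_per_page")


def feature_ranges(table, numeric_features):
    """Range (max minus min) of each numerical feature across the table."""
    ranges = {}
    for feature in numeric_features:
        values = [row[feature] for row in table.values()]
        ranges[feature] = max(values) - min(values)
    return ranges


def gower_distance(a, b, ranges):
    """Simplified Gower distance between two rows with the same features.

    Numerical features score 1 minus the difference scaled by the range;
    categorical features score 1 for a match and 0 otherwise. The distance
    is 1 minus the mean score, so 0 means identical and 1 means maximally
    different on every feature.
    """
    scores = []
    for feature in a:
        if feature in ranges:
            span = ranges[feature]
            score = 1.0 if span == 0 else 1.0 - abs(a[feature] - b[feature]) / span
        else:
            score = 1.0 if a[feature] == b[feature] else 0.0
        scores.append(score)
    return 1.0 - sum(scores) / len(scores)


ranges = feature_ranges(manuscripts, NUMERIC_FEATURES)
distances = {
    (x, y): gower_distance(manuscripts[x], manuscripts[y], ranges)
    for x, y in combinations(sorted(manuscripts), 2)
}
```

In this toy table, MS A and MS B come out closest (about 0.425), while MS B and MS C are farthest apart (about 0.825). Dropping or adding a column changes these values, which illustrates why the choice of features should follow from the research question.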
Lessons from Other Fields: Applications and Limitations of Similarity Measurement Researchers across disciplines continue to investigate different approaches to quantifying similarity, as well as to debate how the resulting values should be used to answer questions. One area of continuing research focuses on how resistant various metrics are to errors introduced by analyzing just a sample of a larger population.24 Researchers in biology and chemistry have recently 23 Many programming languages already feature implementations of the necessary equations for similarity measurement. See Ladd, “Understanding and Using Common Similarity Measures for Text Analysis.” 24 Stephen A. Bloom, “Similarity Indices in Community Studies: Potential Pitfalls,” Marine Ecology Progress Series 5, no. 2 (1981): 125–28; Henk Wolda, “Similarity Indices, Sample Size and Diversity,” Oecologia 50, no. 3 (1981): 296–302; Anne Chao, Robin L. Chazdon, Robert K. 244 | Manuscript Studies developed methods for quantifying the similarities of larger sets of data that may share entities and features.25 Theoretical advances in understanding the mathematical relationships between similarity indices and distance metrics may further revolutionize the field.26 Critics of similarity metrics have demonstrated they may not yield clear results in practice because they discard or compress information that other methods preserve.27 Manuscript researchers should be aware of these other uses for similarity metrics and their limitations. One of the most relevant applications is in archaeology, where similarity metrics are now a standard methodological tool for comparing assemblages of artifacts excavated at multiple sites. 
Pioneering publications in the 1950s and 1960s prompted widespread adoption of similarity metrics (first referred to as indices or coefficients of “agreement”) to study cultural data.28 Over the past two decades, archaeologists have embraced similarity metrics as the basis for reconstructing cultural and socioeconomic networks between sites of human activity.29 The primary evidence for such analyses is usually the Colwell, and Tsung-Jen Shen, “Abundance-Based Similarity Indices and Their Estimation When There Are Unseen Species in Samples,” Biometrics 62, no. 2 (2006): 361–71; Prignano, Morer, and Diaz-Guilera, “Wiring the Past.” 25 Anne Chao, Lou Jost, S. C. Chiang, Y.-H. Jiang, and Robin L. Chazdon, “A Two-Stage Probabilistic Approach to Multiple-Community Similarity Indices,” Biometrics 64, no. 4 (2008): 1178–86; Ulf G. Indahl, Tormod Næs, and Kristian Hovde Liland, “A Similarity Index for Comparing Coupled Matrices,” Journal of Chemometrics 32, no. 10 (2018): e3049. 26 Ondřej Rozinek and Jan Mareš, “The Duality of Similarity and Metric Spaces,” Applied Sciences 11, no. 4 (2021): 1910. 27 J. W. Johnston, Similarity Indices II: The Power of Goodall’s Significance Test for the Simple Matching Coefficient (Richland, WA: Battelle Pacific Northwest Laboratories, December 1976); David I. Warton, Stephen T. Wright, and Yi Wang, “Distance-Based Multivariate Analyses Confound Location and Dispersion Effects,” Methods in Ecology and Evolution 3, no. 1 (2012): 89–101. 28 W. S. Robinson, “A Method for Chronologically Ordering Archaeological Deposits,” American Antiquity 16, no. 4 (1951): 293–301; George W. Brainerd, “The Place of Chronological Ordering in Archaeological Analysis,” American Antiquity 16, no. 4 (1951): 301–13; George L. Cowgill, “Archaeological Applications of Factor, Cluster, and Proximity Analysis,” American Antiquity 33, no. 3 (1968): 367–75. 
composition of ceramic assemblages discovered at various sites, but some researchers have focused on the physical properties of fired bricks or iconographic patterns in shell art.30 The relationship between similarities in material culture and other forms of social, political, or economic interaction remains undertheorized, leading archaeologist Matthew Peeples to advocate that “archaeologists and ethnographers [need] to more directly collaborate on projects explicitly focused on tracking how formally defined social networks (as reckoned by people themselves) relate to patterns of material similarity, production, and consumption at various scales.”31 Manuscript studies likewise foregrounds the intersection of material culture and human networks, so developments in this area of archaeological research may have significant ramifications for the field.

Researchers in fields beyond manuscript studies have also investigated the limitations of similarity measurement. Archaeologists caution that reshaping or omitting information to fit the constraints of a data structure entails a sacrifice of qualitative complexities.32 Manuscript researchers may therefore wish to adopt these approaches as a complement to, rather than a replacement for, existing forms of description and analysis. In the 1970s, as ecologists began adopting similarity metrics to quantify environmental degradation, it became clear that some tests based on similarity metrics fail to discern whether two samples of different species come from the same underlying ecological distribution.33 Since then, researchers have proposed new ways to approach this problem, confronting the confounding effects of spatial and temporal trends that ecologists, archaeologists, and historians of manuscripts all encounter.34 These methods also merit consideration, although they are beyond the scope of this article.

This cursory overview has introduced just a small fraction of the developments in the usage of similarity metrics. Manuscript researchers who delve into similarity metrics would benefit from conversations and collaborations with experts in fields like biostatistics, paleontology, and archaeology and from engaging with the substantial body of research on the topic. Critiques and alternatives may indeed lead them to refine their methods. Depending on the kind of contextual records that survive, similarity-based approaches to manuscripts may also offer theoretical insights of interest to those in adjacent fields like archaeology.

29 Per Östborn and Henrik Gerding, “Network Analysis of Archaeological Data: A Systematic Approach,” Journal of Archaeological Science 46 (June 2014): 75–88; Anna Collar, Fiona Coward, Tom Brughmans, and Barbara J. Mills, “Networks in Archaeology: Phenomena, Abstraction, Representation,” Journal of Archaeological Method and Theory 22, no. 1 (March 2015): 2–3; Barbara J. Mills, “Social Network Analysis in Archaeology,” Annual Review of Anthropology 46, no. 1 (2017): 387.
30 Per Östborn and Henrik Gerding, “The Diffusion of Fired Bricks in Hellenistic Europe: A Similarity Network Analysis,” Journal of Archaeological Method and Theory 22, no. 1 (2015): 306–44; Jacob Lulewicz and Adam B. Coker, “The Structure of the Mississippian World: A Social Network Approach to the Organization of Sociopolitical Interactions,” Journal of Anthropological Archaeology 50 (June 2018): 113–27.
31 Matthew A. Peeples, “Finding a Place for Networks in Archaeology,” Journal of Archaeological Research 27, no. 4 (December 2019): 477–78, 482.
32 Piraye Hacigüzeller, James Stuart Taylor, and Sara Perry, “On the Emerging Supremacy of Structured Digital Data in Archaeology: A Preliminary Assessment of Information, Knowledge and Wisdom Left Behind,” Open Archaeology 7, no. 1 (2021): 1710–11.
Case Study: Similarity of Marginal Iconography in Medieval European Manuscripts, 1250–1350

In the remainder of this essay, we discuss examples from an ongoing study to illustrate some ways to apply similarity metrics to a concrete dataset. Our project investigates patterns within the fashion for figural decoration in manuscript margins, a phenomenon that emerged in the regions around the English Channel in the mid-thirteenth century, by analyzing iconographic similarity in the marginal images in later medieval manuscripts (fig. 1). We work with legacy data from Lilian M. C. Randall’s Images in the Margins of Gothic Manuscripts, an index describing 13,234 images in 237 manuscripts, most of which were produced in France, Flanders, and England between 1250 and 1350.35 As part of the Manuscript Connections project, our larger, ongoing study of illuminated manuscripts utilizing computational techniques, we use similarity metrics and network analysis together to investigate similarity in marginal iconography across manuscripts.36 Examples from this project demonstrate different approaches to quantifying similarity and model the types of questions that similarity metrics can help researchers answer.

Figure 1. Flies in the margins of three manuscripts, one example of iconographic overlap. Left: A man attacks a fly with a spear. The Gorleston Psalter, ca. 1310–24, London, British Library, Add. MS 49622, fol. 7v. Center: Two flies. The Maastricht Hours, ca. 1310–20(?). London, British Library, Stowe MS 17, fol. 64r. These two images produced with permission from the © British Library Board. Right: A fly pursued by a swallow. The Rothschild Canticles, ca. 1295–1300. New Haven, CT, Yale University, Beinecke Rare Book and Manuscript Library, MS 404, fol. 157r.

Figure 2. Page 100 from Randall’s Images in the Margins with sample entries color-coded based on their content. Dark red highlights the main theme, light red the subtheme, dark blue the manuscript identifier, and light blue the location within the manuscript.

33 J. W. Johnston, Similarity Indices I: What Do They Measure? (Richland, WA: Battelle Pacific Northwest Laboratories, November 1976); Johnston, Similarity Indices II.
34 See Legendre and Legendre, Numerical Ecology, 17–21.
35 Lilian M. C. Randall, Images in the Margins of Gothic Manuscripts (Berkeley: University of California Press, 1966). For further research on marginal illumination, see Michael Camille, Image on the Edge: The Margins of Medieval Art (Cambridge, MA: Harvard University Press, 1992); Lucy Freeman Sandler, “The Study of Marginal Imagery: Past, Present, and Future,” Studies in Iconography 18 (1997): 1–49; Laura Kendrick, “Making Sense of Marginalized Images in Manuscripts and Religious Architecture,” in A Companion to Medieval Art: Romanesque and Gothic in Northern Europe, ed. Conrad Rudolph (Oxford: Blackwell, 2006), 274–94; Jean Wirth, Les marges à drôleries des manuscrits gothiques (1250–1350) (Geneva: Droz, 2008); Kathryn A. Smith, “Margin,” in “Medieval Art History Today: Critical Terms,” ed. Nina Rowe, special issue, Studies in Iconography 33 (2012): 29–44.
36 Manuscript Connections, http://manuscriptconnections.org. This project has benefited immensely from participation in the 2019–2021 Getty Advanced Workshop for Network Analysis + Digital Art History. A project description from 2018 can be found at https://sites.haa.pitt.edu/na-dah/.
We elected to use Randall’s indexing system largely unchanged as the basis for our data, in an experiment to see whether it could yield meaningful quantitative results without revision.37 Randall’s index lists iconographic themes in alphabetical order, dividing them into groups (which we call “themes” or “main themes”) based on the key actors or objects represented, and subgroups (which we call “subthemes”), often based on the relationship or action in which actors are engaged. Randall’s main themes frequently occur in multiple manuscripts and occasionally even occur multiple times within a single manuscript, while her subthemes are often limited to a single instance across her corpus (see fig. 2 for a sample page). For example, marginalia that Randall categorized under the theme “Fly” appear in three manuscripts: once without a qualifying subtheme, once with the subtheme “and swallow,” and once with the subtheme “attacked by man with spear.” Although we included these subthemes in our digitization of the data, our analysis throughout this article is based solely on Randall’s main themes.

37 Some of Randall’s themes are more similar than others, although this is arguably true of any system of iconographic categorization. Alternative systems like Iconclass offer affordances such as a deep hierarchy of categories that permit researchers to automatically move from more granular to more general characterizations of images. They provide a fruitful starting point for researchers generating iconographic datasets from scratch. Researchers working with other types of manuscript data may find existing systems for systematically recording categorical and quantitative features meet their needs, or they may need to create their own.

Figure 3. A screenshot of part of the spreadsheet produced by digitizing and subdividing the entries in Randall’s index of marginalia. Entries with no subtheme contain the code NA (not applicable) to remove potential ambiguity about why the cell lacks a value. Only the main theme and manuscript identifier columns were used to create the manuscript-theme matrix from which similarity metrics were calculated.

Translating Randall’s index into a format amenable to computational analysis involved several steps. First, we scanned the text and used Adobe Acrobat to perform automatic optical character recognition (OCR). Then we entered each instance of marginalia into a separate row in a spreadsheet (fig. 3). We divided the information in each entry into several columns, which included the major and minor theme for each instance of marginalia, the identifier for the manuscript in which it occurs, and the folio on which it appears. Because the OCR introduced typographic errors into the text, we then used the free, open-source software tool OpenRefine to identify and correct these errors. Next, we used a short script written in the statistical programming language R to transform this spreadsheet into a matrix in which each row represents a manuscript (237 manuscripts/rows) and each column represents one of the major iconographic themes (2,002 distinct themes/columns). The values in the cells of this manuscript-theme matrix consist of the total number of instances of each major theme in a given manuscript.
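The transformation from one-row-per-image spreadsheet to manuscript-theme matrix can be sketched in a few lines. The authors report using R; the version below is a Python illustration only, and the manuscript identifiers, folio numbers, the "Ape" rows, and the function name `build_matrix` are all hypothetical placeholders rather than values from Randall's index.

```python
from collections import Counter

# Toy rows shaped like the spreadsheet described above: one row per instance
# of marginalia. All identifiers and folios here are invented for illustration.
rows = [
    {"theme": "Fly", "subtheme": "NA", "manuscript": "MS_A", "folio": "1r"},
    {"theme": "Fly", "subtheme": "and swallow", "manuscript": "MS_B", "folio": "2r"},
    {"theme": "Fly", "subtheme": "attacked by man with spear", "manuscript": "MS_C", "folio": "3v"},
    {"theme": "Ape", "subtheme": "NA", "manuscript": "MS_A", "folio": "4r"},
    {"theme": "Ape", "subtheme": "NA", "manuscript": "MS_A", "folio": "5v"},
]


def build_matrix(rows):
    """Return (manuscripts, themes, matrix), where matrix[i][j] is the number
    of images of main theme j recorded in manuscript i. Subthemes and folios
    are ignored, matching the analysis described in the text."""
    manuscripts = sorted({r["manuscript"] for r in rows})
    themes = sorted({r["theme"] for r in rows})
    counts = Counter((r["manuscript"], r["theme"]) for r in rows)
    matrix = [[counts[(m, t)] for t in themes] for m in manuscripts]
    return manuscripts, themes, matrix


manuscripts, themes, matrix = build_matrix(rows)

# Share of cells holding a nonzero count (the sparsity figure the text reports
# for the real matrix is computed the same way).
nonzero = sum(1 for row in matrix for value in row if value > 0)
density = nonzero / (len(manuscripts) * len(themes))

for name, row in zip(manuscripts, matrix):
    print(name, dict(zip(themes, row)))
print("nonzero cells:", nonzero, "density:", round(density, 3))
```

Each row of `matrix` is the count vector that the similarity metrics discussed later take as input.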
Because many themes occur in only a few manuscripts, like the “Fly” example above, and most manuscripts contain only a few instances of marginalia, just 8,184, or 1.7 percent, of the cells in this manuscript-theme matrix contain nonzero values indicating the presence of one or more instances of a theme in a given manuscript.38 Finally, we used open-source packages (prewritten sets of functions) to calculate the manuscript-manuscript similarity for each possible pair of manuscripts with a variety of different similarity metrics (27,966 total pairwise comparisons).39 As we show in the sections that follow, researchers can use these comparisons to make not just observations but also arguments on the basis of similarity between manuscripts.

Our experience exemplifies the opportunities and challenges that arise when using legacy data. Randall’s publication defined the parameters for the study of marginal iconography in the later twentieth century and remains an essential resource for researchers today. The wealth of structured data it provides presents an extraordinary opportunity for computer-assisted scholarship. However, Randall’s index is organized for human readers, with headings of varying specificity and redundant listings. The index is also necessarily incomplete, constrained by exigencies of time and print media. Randall explicitly prioritized depictions of figures in action, although the index also contains numerous actionless figures.40 Even within these parameters, the index could not be comprehensive: in a survey of a random subset of twenty manuscripts, we found that, on average, Randall indexed about 36 percent of the figural marginalia that we observed in these manuscripts. Given the substantial amount of data the index contains, it may provide an adequate sample to reflect the larger population of marginalia that we seek to analyze.41 Still, we must understand that our findings do not represent a direct reflection of marginal iconography but rather estimates based on one researcher’s descriptions. Accordingly, we take care at each stage to check our computational findings against recent publications and our own observations of manuscript images whenever possible.

38 Both the number of themes in each manuscript and the number of marginalia in each theme have severely right-skewed distributions, perhaps even logarithmic distributions, a subject that merits further investigation beyond the scope of this article.
39 The basic R language includes a function called dist() that can calculate several different types of distance metrics. Additional distance/similarity metrics may be calculated using imported packages available in the CRAN (Comprehensive R Archive Network) repository. See, for example, Mark van der Loo and David Turner, “Gower: Gower’s Distance,” accessed 2 February 2023, https://CRAN.R-project.org/package=gower. It is also possible to write custom functions to calculate distance or similarity metrics.
40 “With these considerations in mind, the present iconographic selection consists of scenes depicting humans, animals, or hybrids in some sort of activity. These constitute the essence of marginal subject matter and provide the most valuable insight into contemporary mores and ideas.” Randall, Images in the Margins, 15.
41 The extent to which surviving iconographic themes reflect the full set of images that must have existed during this period is yet a separate question, but one that is beyond the scope of this article. For a sophisticated discussion of this problem as it applies to the preservation of trecento songs in manuscripts dated 1380–1415, see Michael Scott Cuthbert, “Trecento Fragments and Polyphony beyond the Codex” (PhD diss., Harvard University, 2006), 44–86; Michael Scott Cuthbert, “Monks, Manuscripts, and Other Peer-to-Peer Song Sharing Networks of the Middle Ages,” in Cantus Scriptus: Technologies of Medieval Song: Proceedings of the 3rd Annual Lawrence J. Schoenberg Symposium on Manuscript Studies in the Digital Age, November 20–21, 2010, ed. Lynn Ransom and Emma Dillon (Piscataway, NJ: Gorgias Press, 2012), 110–22.

Choosing and Comparing Similarity Metrics

Our experience with the marginal iconography data indicated that choosing a similarity metric is not just a technical but an interpretive decision, entailing careful consideration of the nature of the data, the research question, and the contextualization of individual comparisons. Researchers should experiment with different similarity metrics to find one that fits their data and understanding of similarity. We tested several metrics to determine which could identify similarities meaningful to our research questions before ultimately selecting the Gower distance metric.

The Type of Similarity Matters

Similarity metrics can be divided, broadly speaking, into different families based on how they interpret overlaps in data.42 Some of the simplest focus on absolute overlap. One example is the intersection metric, which simply adds up the minimum number of shared traits in a given comparison (e.g., the minimum number of images for each theme shared between two manuscripts).43 This approach contrasts with others that produce proportional or relative values, like the Brainerd-Robinson similarity metric (see appendix for a comparison of calculations using the intersection metric, the Brainerd-Robinson similarity metric, and Gower’s distance). Developed to analyze archaeological assemblages, this metric compares the proportions of the overall set of features (the relative frequency distributions) shared by any particular pair of objects.44 Other metrics take an entirely different approach, neither absolute nor proportional.
Cosine similarity, widely used to find similarities between textual documents, positions objects based on the features they contain, then measures their orientation (that is, their angle) relative to one another within an abstract, multidimensional space.45 The Brainerd-Robinson and cosine metrics are specifically designed to ignore the absolute magnitude of the feature sets they analyze, allowing them to assess the similarity of objects that have very different quantities of features. Researchers should consider with care which kind of similarity metric best fits their questions and data.

42 For a more comprehensive overview of similarity metrics, see above, n10.
43 This metric is so basic it is not even discussed in most overviews of similarity metrics, but it is the basis for the more complex Steinhaus metric and Bray and Curtis’s percentage difference equation. Legendre and Legendre, Numerical Ecology, 285, 311.
44 Brainerd, “The Place of Chronological Ordering in Archaeological Analysis”; Robinson, “A Method for Chronologically Ordering Archaeological Deposits”; Habiba et al., “Social Networks and Similarity of Site Assemblages,” 64.
45 Han, Kamber, and Pei, Data Mining, 77; Legendre and Legendre, Numerical Ecology, 301–2 (related approaches).

Testing these options on our data, we found that metrics that ignored the absolute magnitude (the total number of shared images) ran counter to our art-historical understanding that the presence of abundant marginalia itself constitutes a meaningful similarity: books with abundant marginalia are often more similar to each other than they are to books with few images. Both cosine and Brainerd-Robinson metrics overstated the similarity of sparsely illuminated manuscripts with other books.
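The three families described above can be compared on toy count vectors. The sketch below is illustrative Python, not the authors' R code: the vectors `rich`, `rich_peer`, and `sparse` are invented, and the Brainerd-Robinson value is rescaled here from its customary 0 to 200 range to 0 to 1 so that it sits on the same scale as cosine similarity.

```python
import math


def intersection(a, b):
    """Absolute overlap: for each theme, the smaller of the two counts, summed."""
    return sum(min(x, y) for x, y in zip(a, b))


def brainerd_robinson(a, b):
    """Proportional overlap between two relative frequency distributions.
    Rescaled to 0-1 (1 = identical proportions) as an illustrative choice."""
    total_a, total_b = sum(a), sum(b)
    if total_a == 0 or total_b == 0:
        return 0.0
    diff = sum(abs(x / total_a - y / total_b) for x, y in zip(a, b))
    return 1.0 - diff / 2.0


def cosine(a, b):
    """Cosine of the angle between two count vectors; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented counts over five themes: two richly decorated books and one book
# with a single recorded image.
rich = [3, 2, 1, 4, 2]
rich_peer = [2, 2, 0, 5, 1]
sparse = [0, 0, 0, 1, 0]

for name, other in [("rich_peer", rich_peer), ("sparse", sparse)]:
    print(
        name,
        "intersection:", intersection(rich, other),
        "brainerd_robinson:", round(brainerd_robinson(rich, other), 3),
        "cosine:", round(cosine(rich, other), 3),
    )
```

In this toy case the single-image book shares only one image with the rich book (intersection of 1, against 9 for the two rich books), yet its cosine similarity is still roughly 0.69, which is the kind of inflation for sparse manuscripts that the paragraph above describes.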
For instance, the English Vaux Psalter (London, Lambeth Palace Library, MS 233) and a Netherlandish psalter (London, British Library, Yates Thompson MS 42) produced one of the highest cosine similarity metrics in our dataset. The two manuscripts share only one image in common, but since this is the sole image recorded for the Netherlandish psalter in our data, the cosine metric interpreted it as highly significant. As art historians, when we considered the single image from the Netherlandish psalter alongside the fifty-nine rich and varied illuminations recorded from the Vaux Psalter, the books appeared to us very different. Although the Brainerd-Robinson and cosine metrics produce results that differ from our intuitive understanding of our data and our analytical goals, they may still be useful for other types of manuscript research when substantial differences in the magnitudes of features obscure real parallels. After ruling out certain types of similarity metrics, it can be instructive to compare the remaining options to learn more about how they work and what conception of similarity underpins their algorithm (see appendix). Our comparisons of the intersection metric, discussed above, and Gower’s distance metric, led us ultimately to conclude that Gower’s metric better reflects our intuitive understanding of similarity between manuscripts. Gower’s metric is similar to the intersection metric, in that it starts with the absolute overlap between two features. It goes a step further by normalizing each feature differently based on whether there is wide variation in its abundance across a dataset.46 One consequence of this normalization is that features that occur consistently impact the final similarity more than features that vary dramatically in quantity. The Gower index also counts matches in the absence of images toward its similarity measurement. 
Thus, pairs of manuscripts that both avoid certain popular images (such as “Obscaena,” “Fables,” or “Jesus Christ, life of”) may also have a positive similarity score, even if they share no images. This property of the Gower metric may produce some counterintuitive results for our data, because researchers typically conceive of comparisons based on the features present in one or both objects but may struggle to account comprehensively for features wholly absent within the comparison. This surprising feature of the Gower metric illustrates the importance of understanding the specific concept of similarity encoded within a given metric.

46 This normalization or scaling based on the differing ranges of values within each feature accomplishes something similar to other methods such as term frequency–inverse document frequency (TF-IDF) scaling or feature scaling, although each approach produces slightly different results. Gower, “A General Coefficient of Similarity and Some of Its Properties”; Legendre and Legendre, Numerical Ecology, 278–80.

Random Baseline

In our experience, similarity metrics are most useful when compared to a baseline. This baseline may represent either a simple random scenario (the approach we adopt here) or the output of a more sophisticated model or simulation for how the data were generated. Because art historians currently lack a mathematical model for artists’ use of iconographic themes in marginalia, our baseline represents an alternative past in which illuminators selected iconographic themes at random.47 Our simulated manuscripts contain the same quantities and frequencies of images as our real manuscripts (between 1 and 480 images in each manuscript), but we randomly assigned iconographic themes to each manuscript.

47 Specifically, we used permutations like those employed in statistical tests where researchers do not wish to make assumptions about the underlying distributions of their data. See David Spiegelhalter, The Art of Statistics: How to Learn from Data (New York: Basic Books, 2021), 261. The use of stochastic simulations as a baseline for interpreting similarity indices was established (and debated) in ecological studies as early as the late 1960s, although it has not been widely adopted in other areas like document retrieval or archaeological assemblage comparisons. David W. Goodall, “A Probabilistic Similarity Index,” Nature 203, no. 4949 (September 1964): 1098; David M. Raup and Rex E. Crick, “Measurement of Faunal Similarity in Paleontology,” Journal of Paleontology 53, no. 5 (1979): 1213–27; James F. Heltshe, “Jackknife Estimate of the Matching Coefficient of Similarity,” Biometrics 44, no. 2 (1988): 447–60. Some researchers have expressed reservations about this approach, perhaps most forcefully articulated in J. W. Johnston’s dismissal of Goodall’s probabilistic index, Similarity Indices I, 51–53. Simulations based on ecological data suggest that stochastic permutation is underpowered as a test to detect significant associations between observations, but it may still be used descriptively. Legendre and Legendre, Numerical Ecology, 294.

Imagine iconographic speed dating, where manuscripts remain seated at their tables with anywhere from 1 to 480 chairs, and marginal themes get shuffled randomly among the tables. The random scenarios (known as permutations) preserve both the number of individual images and the frequency of repeated themes recorded in each manuscript. Thus, extending the metaphor, if a given theme was observed five times in a manuscript, five chairs at its table would be assigned as a bloc to a new iconographic theme in each random scenario. In this way, the randomizations reflect the unequal distribution of themes observed within manuscript production.
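One way to read the permutation scheme described above is as a shuffle of each manuscript's row of counts across the theme columns, so that a count of five stays a bloc of five but lands on a randomly chosen theme. The Python sketch below adopts that reading as an assumption; the article does not spell out the exact procedure, the authors worked in R with five thousand simulations, and the vectors and function names here are hypothetical. It also computes the summary statistics the following paragraph and its notes mention (difference from the simulated mean, z score, and an empirical p value).

```python
import random
import statistics


def intersection(a, b):
    """Absolute overlap: for each theme, the smaller of the two counts, summed."""
    return sum(min(x, y) for x, y in zip(a, b))


def permute_row(row, rng):
    """Assumed randomization: shuffle one manuscript's counts across themes.
    The number of images and the sizes of repeated-theme blocs are preserved."""
    shuffled = list(row)
    rng.shuffle(shuffled)
    return shuffled


def simulate(a, b, metric, n_sims=1000, seed=0):
    """Metric values for n_sims independently randomized versions of a pair."""
    rng = random.Random(seed)
    return [metric(permute_row(a, rng), permute_row(b, rng)) for _ in range(n_sims)]


def summarize(observed, sims):
    """Compare an observed value with its simulated baseline."""
    mean = statistics.fmean(sims)
    sd = statistics.pstdev(sims)
    return {
        "adjusted": observed - mean,  # observed minus expected-by-chance
        "z": (observed - mean) / sd if sd > 0 else float("nan"),
        "p_upper": sum(s >= observed for s in sims) / len(sims),
    }


# Invented counts over ten themes for two manuscripts.
ms_a = [5, 3, 1, 0, 0, 0, 0, 0, 0, 0]
ms_b = [4, 2, 0, 1, 0, 0, 0, 0, 0, 0]

observed = intersection(ms_a, ms_b)
sims = simulate(ms_a, ms_b, intersection)
summary = summarize(observed, sims)
print("observed:", observed, "summary:", summary)
```

A positive `adjusted` value means the pair overlaps more than this particular random scenario would lead one to expect; a different randomization scheme, for example one weighted by how common each theme is across the corpus, would give a different baseline.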
This baseline allowed us to interpret the observed iconographic similarity of a pair of manuscripts depending on whether it is substantially higher or lower than the similarity in our simulated random dataset. Using this approach, we created five thousand simulated datasets and calculated the similarity metrics for each pair of manuscripts.48 Because each of these simulations can produce a different value, we took the mean for each pair of manuscripts to get a sense of an expected or typical value under the assumption of random theme selection. We then subtracted the mean values of these simulations from the observed similarity values to normalize the observed measurements, revealing the degree to which the actual metrics are higher or lower than we would expect to see by chance.49 Of course, since no manuscript illustrator picked themes at random, we would anticipate that our observed values diverge from these randomized similarity metrics. However, the observed divergence is relatively small: the mean of the random simulations never even reaches 50 percent above or below the observed similarity. In the greater or lesser divergence, we found intriguing patterns of similarity among the manuscripts in our dataset.

One important consequence of incorporating these probabilistic simulations is that, when considered solely on the basis of iconographic categories, the actual similarity metrics of manuscripts with few marginalia closely resemble mean values from our random simulation. By contrast, pairs with many marginalia may diverge significantly from the values expected by chance. Because manuscripts in particularly high- or low-similarity pairs tend to have abundant marginalia, these books also tend to be well studied.50 Archaeologists typically deal with this problem of small, “noisy” samples even before calculating similarity metrics, excluding sites below a certain threshold of observations. Manuscript researchers may wish to do the same, as comparable inconsistencies and lacunae characterize manuscript data and archaeological data.

48 The number of permutations is somewhat arbitrary, and the pragmatic approach to determining the necessary number is simply to track the point at which the statistic(s) of interest stabilize. The less variation there is within the observations, the fewer permutations are necessary to produce a sufficiently varied set of simulations. Ecologists, for example, recommend preliminary tests with five hundred to one thousand simulations, with a more stringent requirement of ten thousand simulations for publishable results. Legendre and Legendre, Numerical Ecology, 31.
49 While one can guess whether any given pair of manuscripts will have a high or low Gower distance based solely on their average number of marginalia, this correlation is broken when the metric is normalized by the mean of the randomized simulations. In addition to considering the difference from the mean of the simulated values, researchers may also wish to calculate a z score (divide the difference from the mean by the standard deviation of the simulated distribution) or compute a p value (the percentage of the simulated values that are greater or less than the observed value). We are grateful to John Ladd for suggesting these alternatives.
50 The Maastricht Hours (London, British Library, Stowe MS 17), the Breviary of Renaud de Bar (London, British Library, Yates Thompson MS 8), and the breviary portion of the Aspremont-Kievraing Prayer Book (Melbourne, National Gallery of Victoria, inv. 1254-3) are among the most similar manuscript pairs identified by both the intersection and the Gower metrics. Other well-studied books with high intersection similarity include the Gorleston Psalter (London, British Library, Add. MS 49622), the Hours of Jeanne d’Evreux (New York, Metropolitan Museum of Art, Cloisters Collection, 54.1.2), and the Aspremont-Kievraing Psalter (Oxford, Bodleian Library, MS Douce 118). Volumes of the Ghent Psalter (Oxford, Bodleian Library, MS Douce 5) and the Belleville Breviary (Paris, Bibliothèque nationale de France, MS lat. 10483), the Pontifical of Renaud de Bar (Cambridge, Fitzwilliam Museum, MS 298), and both volumes of the Arthurian miscellany attributed to the Dampierre group (Paris, Bibliothèque nationale de France, MS fr. 95; and New Haven, CT, Yale University, Beinecke Rare Book and Manuscript Library, MS 229) have high similarity according to the Gower metric. Bibliography for most of these books can be found in Stones, Gothic Manuscripts. For the others, see Margot McIlwain Nishimura, “The Gorleston Psalter: A Study of the Marginal in the Visual Culture of Fourteenth-Century England” (PhD diss., New York University, 1999); Kyunghee Pyun and Anna D. Russakoff, eds., Jean Pucelle: Innovation and Collaboration in Manuscript Painting (London: Harvey Miller, 2013), especially the essays by Barbara D. Boehm and Pascale Charron.

Different Similarity Metrics in Practice

Case studies can demonstrate how different similarity metrics function in practice and clarify how metrics diverge. The Luttrell Psalter (London, British Library, Add. MS 42130; 343 marginal images recorded) and the Taymouth Hours (London, British Library, Yates Thompson MS 13; 279 marginal images recorded) have seventy-one instances of iconographic overlap between them. Both manuscripts contain many marginal images, but, when contextualized against a random baseline, the intersection metric and Gower metric disagree about whether they overlap more or less than expected by chance.
According to the intersection metric (.44), these manuscripts are substantially more similar than we would expect: in our five thousand random scenarios, the highest overlap was forty-four images, while the actual observed overlap is significantly higher, at seventy-one. Because Gower’s metric measures distance, lower or negative numbers indicate proximity or similarity, while higher numbers indicate distance or dissimilarity. In contrast to the intersection metric, the adjusted Gower metric (.0009) indicates they approximate values from our random scenarios, based not just on the images they share (uniformly weighted so that less common themes contribute just as much as more common ones) but also on the images each artist excluded. To illustrate the different types of similarity that these two metrics measure, consider a Venn diagram. The intersection metric describes only the overlapping center of a Venn diagram, focusing exclusively on a pair’s shared features. Thus, the high intersection metric for the Luttrell Psalter and the Taymouth Hours reflects shared themes such as images from the life of Christ, the life of the Virgin, and representations of saints. By contrast, the Gower metric incorporates every part of the Venn diagram: the individual circles, their overlap in the center, and everything outside the circles (every iconographic theme in the dataset absent from both manuscripts). As a result, the Gower metric better reflects the stark variation in this manuscript pair’s marginal imagery overall. 
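The Venn-diagram contrast can be made concrete with a small sketch. The Python below implements the range-scaled form of Gower's distance for quantitative features (the mean, over themes, of the absolute difference in counts divided by that theme's range across the dataset); it is an illustration rather than the authors' R code, the four count vectors are invented, and skipping themes whose range is zero is an assumption made here to avoid division by zero.

```python
def intersection(a, b):
    """Overlap only: the center of the Venn diagram."""
    return sum(min(x, y) for x, y in zip(a, b))


def gower_distance(matrix, i, j):
    """Mean range-scaled absolute difference between rows i and j.
    Themes absent from both rows contribute a difference of zero, so shared
    absences pull the pair closer together."""
    total, used = 0.0, 0
    for k in range(len(matrix[0])):
        column = [row[k] for row in matrix]
        value_range = max(column) - min(column)
        if value_range == 0:
            continue  # assumption: a theme with no variation is skipped
        total += abs(matrix[i][k] - matrix[j][k]) / value_range
        used += 1
    return total / used if used else 0.0


# Invented counts for four manuscripts over six themes.
matrix = [
    [3, 1, 0, 0, 0, 0],  # A
    [0, 0, 2, 1, 0, 0],  # B: shares no themes with A
    [2, 1, 0, 0, 1, 0],  # C: overlaps heavily with A
    [0, 0, 0, 0, 0, 4],  # D
]

print("A vs B intersection:", intersection(matrix[0], matrix[1]))
print("A vs B Gower distance:", round(gower_distance(matrix, 0, 1), 3))
print("A vs C Gower distance:", round(gower_distance(matrix, 0, 2), 3))
```

Manuscripts A and B share no images, so their intersection is zero, yet their Gower distance is below 1 because both lack the last two themes. This is the shared-absence effect the text describes.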
While the Taymouth Hours presents literary and religious narratives in marginal cycles, the Luttrell Psalter’s much more eclectic imagery includes imaginative scenes of hybrid figures, agricultural labors, courtly pastimes, animals, and more.51 By de-emphasizing a few highly overlapping categories of images, Gower’s metric counterbalances the intersection metric’s tendency to allow a few areas of intense overlap to overshadow broader variations. The Luttrell Psalter and the Taymouth Hours illustrate the implications of Gower’s more comprehensive interpretation of similarity.
The metrics also diverge in their measurement of similarity between the Maastricht Hours (London, British Library, Stowe MS 17; 344 marginal images recorded) and the winter volume of the Breviary of Renaud de Bar (London, British Library, Yates Thompson MS 8; 253 marginal images recorded), which share seventy-four images (figs. 4 and 5). In this case, these two manuscripts are considerably more similar than we would expect to see based on the number and types of marginal themes they include. The probabilistic Gower metric (−.0278) is the lowest (most similar) of any pair. The intersection metric (.4846) also reflects greater similarity than we would expect to see in a random dataset; however, it is unexceptional compared to other manuscript pairs in the corpus. From an art-historical perspective, the higher-than-expected similarity of the Maastricht Hours and the Breviary of Renaud de Bar is less surprising than that of the Luttrell Psalter and the Taymouth Hours, given the shared visual emphasis in the former pair on satirical imagery in the margins throughout each book.52 Here again the Gower metric matches art-historical expectations.
51 For thorough analyses of the marginal imagery in these two manuscripts, see Michael Camille, Mirror in Parchment: The Luttrell Psalter and the Making of Medieval England (Chicago: University of Chicago Press, 1998); Kathryn A. Smith, The Taymouth Hours: Stories and the Construction of the Self in Late Medieval England (London: British Library, 2012).
52 For the Maastricht Hours, see Judith Oliver, Gothic Manuscript Illumination in the Diocese of Liège (c. 1250–c. 1330) (Leuven: Peeters, 1988), 1:28–30. For the Breviary of Renaud de Bar, see Patrick M. de Winter, “Une réalisation exceptionnelle d’enlumineurs français et anglais vers 1300: Le bréviaire de Renaud de Bar, évêque de Metz,” in La Lorraine: Études archéologiques, Actes du 103e congrès national des Sociétés savantes (Nancy-Metz, 1978), Section d’archéologie et d’histoire de l’art (Paris: Bibliothèque nationale de France, 1980), 27–62; Kay Davenport, The Bar Books: Manuscripts Illuminated for Renaud de Bar, Bishop of Metz (1303–1316) (Turnhout: Brepols, 2017).
Figure 4. Marginal illumination of a buffeting game (frog in the middle). The Maastricht Hours, ca. 1310–20(?). London, British Library, Stowe MS 17, fol. 142v. © British Library Board. Metrics diverge in their measurement of similarity between the Maastricht Hours and the winter volume of the Breviary of Renaud de Bar (fig. 5), which share seventy-four images.
Figure 5. Detail of a buffeting game (frog in the middle). Breviary of Renaud de Bar (winter volume), 1302–3. London, British Library, Yates Thompson MS 8, fol. 222v. © British Library Board.
While the Gower metric is more intuitive in these instances, there are other cases where it veers from art-historical understandings of similarity. Including shared absences as well as shared motifs leads Gower distance to ascribe greater-than-expected similarity to manuscripts that have no overlap. The Winter Breviary of Renaud de Bar (fig. 5) and a copy of a literary work, the Romance of the Rose, by Guillaume de Lorris and Jean de Meun (Tournai, Bibliothèque de la Ville, Cod.
101; twenty-six images recorded) epitomize this counterintuitive feature of the metric. The intersection metric of the pair (−.0108) signals slightly lower similarity than expected by chance, yet the Gower metric (−.0094) indicates that the pair slightly exceeds expected similarity. Not only do these manuscripts share no images in our dataset; their marginal illuminations draw from entirely different sources and serve contrasting roles in the text. The Breviary of Renaud de Bar combines assertions of its clerical owner’s aristocratic identity (heraldry, hunting, and erotically charged games like frog in the middle; fig. 5) with parodic, often violent topsy-turvy imagery (including its infamous “killer rabbits”) and explicitly spiritual content.53 The marginal images in the Romance of the Rose, in contrast, illustrate specific passages from the literary work, featuring characters and allegorical personifications absent from the typical marginal repertoire.54 These two decorative programs differ in their complete lack of iconographic overlap and their radically different functions for marginal illumination. Ultimately, researchers may demur from asserting similarity based solely on shared exclusions. The similarity by omission of the Tournai Romance of the Rose and the Winter Breviary of Renaud de Bar reminds researchers that it is important to understand how metrics are calculated and to account for this when analyzing results.
No metric perfectly matches the flexible usage of similarity in art-historical discussions of iconography, but we find that a probabilistic version of Gower’s metric produces satisfactory results due to its holistic consideration of the iconographic data, considering the images a pair of manuscripts share as well as the images that are unique to each and the images that both eschew. By including all themes but weighting them based on their variation within the corpus, Gower’s metric also avoids disproportionately emphasizing some themes over others, as in the comparison of the Luttrell Psalter and the Taymouth Hours discussed above. Finally, contextualizing the results of a similarity metric in terms of a random, simulated baseline produces a measurement not of direct similarity but rather of how the observed similarity diverges from what we would expect to see by chance.
53 See Davenport, The Bar Books, 671–94, for a comprehensive index of marginal subjects in the manuscripts for Renaud de Bar; see also Eleanor Jackson, “Medieval Killer Rabbits: When Bunnies Strike Back,” British Library Medieval Manuscripts Blog, 16 June 2021, https://blogs.bl.uk/digitisedmanuscripts/2021/06/killer-rabbits.html. On frog in the middle and other games in the margins, see Richard H. Randall, “Frog in the Middle,” Metropolitan Museum of Art Bulletin 16, no. 10 (1958): 269–75; Lilian M. C. Randall, “Games and the Passion in Pucelle’s Hours of Jeanne d’Évreux,” Speculum 47, no. 2 (1972): 246–57; Madeline H. Caviness, “Patron or Matron? A Capetian Bride and a Vade Mecum for Her Marriage Bed,” Speculum 68, no. 2 (1993): 339. Although Martine Meuwese does not address the inspiration for the killer rabbit in Monty Python and the Holy Grail, she identifies Randall’s Images in the Margins of Gothic Manuscripts as a main source for Terry Gilliam’s animations for the film in “The Animation of Marginal Decorations in ‘Monty Python and the Holy Grail,’” Arthuriana 14, no. 4 (2004): 45–58.
54 Lucien Fourez, “Le roman de la rose de la Bibliothèque de la ville de Tournai,” Scriptorium 1, no. 2 (1946): 216.
Using Similarity Metrics
The variety of measurements of similarity reflects the numerous ways historians conceptualize these relationships. Our discussion thus far demonstrates the nuances between different similarity or distance measurements, despite the high degree of overlap between them. Once a researcher has selected a
suitable similarity measurement, they can use it to answer questions about their dataset. In this section, we highlight four approaches for analyzing the results of similarity metrics that we have found especially effective for our project: working directly with similarity metrics, hunting for outliers among concise numerical representations known as summary statistics, applying statistical tests like Analysis of Similarity, and using exploratory approaches such as clustering.
Working Directly with Similarity Metrics
Researchers can work directly with similarity metrics to answer questions about specific pairings within their dataset.55 Since our corpus includes several multivolume manuscripts, we were able to pose questions about the iconography of manuscript volumes or fragments created together. How much iconographic consistency (or repetition) characterizes multivolume works? Although artists and patrons likely conceived these multivolume or fragmentary manuscripts as single projects, our data distinguish between volumes in such a way that they function, as far as the computational analyses go, as separate books. Here, we used these sometimes arbitrary divisions to demonstrate how similarity metrics illuminate the iconographic relationship between two sections or parts of a single work. The question is this: Taking the pairs of multivolume manuscripts in isolation, is their similarity significantly higher than we would expect to see by chance? Thirty of the 237 total manuscripts in our dataset represent either discrete volumes or substantial fragments of fourteen complete works. Most of these multivolume manuscripts are split into two volumes or fragments, but two works are bound in three volumes. Having calculated similarity measurements for every manuscript pair represented in our dataset, we filtered the results to show only the manuscript pairs that comprise these multipart works.
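The filtering step described above might look something like the following. This is a sketch only: the manuscript identifiers, the `work_of` lookup, and the placeholder scores are hypothetical and do not come from the article's dataset or code.

```python
from itertools import combinations


def same_work_pairs(pair_scores, work_of):
    """Keep only pairs whose two manuscripts belong to the same multipart work."""
    kept = {}
    for (ms_a, ms_b), score in pair_scores.items():
        work_a, work_b = work_of.get(ms_a), work_of.get(ms_b)
        if work_a is not None and work_a == work_b:
            kept[(ms_a, ms_b)] = score
    return kept


# Hypothetical lookup: volume identifier -> the work it belongs to
# (None marks a manuscript that is not part of a multipart work).
work_of = {
    "A_vol1": "Work A", "A_vol2": "Work A",
    "B_vol1": "Work B", "B_vol2": "Work B", "B_vol3": "Work B",
    "C": None,
}

# Placeholder scores for every pair; real values would come from the metric.
pair_scores = {pair: 0.0 for pair in combinations(sorted(work_of), 2)}

multivolume = same_work_pairs(pair_scores, work_of)
```

A two-volume work contributes one pair and a three-volume work contributes three, so the filtered table is far smaller than the full set of pairwise comparisons.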
55 Because of the large number of comparisons produced by a similarity analysis, researchers should exercise some caution when interpreting individual results. Specifically, more stringent criteria for significance may be appropriate. For a brief discussion of this issue, see Spiegelhalter, The Art of Statistics, 278–80.
The similarity metrics within this small set of multivolume manuscripts reveal patterns in artists’ working practices that shifted based on the density of illumination they created. Books with dense or abundant marginal imagery tend to have iconographically similar volumes or fragments, showing artists revisiting certain iconographic motifs throughout the work (fig. 6). In contrast, iconographic similarity is less pronounced in books with more sparse marginal illumination, reflecting both the close matches between expected and simulated metrics for sparse-marginalia manuscripts and a preference for greater variety in manuscripts with fewer illuminations.
Densely illustrated multivolume manuscripts show how iconographic themes tend to repeat across volumes, resulting in higher similarity scores than expected by chance. As figure 6 shows, one manuscript pair with high similarity measurements is a two-volume psalter for Ghent use, today in the Bodleian Library, Oxford (MS Douce 5 and 6, figs. 7–9).56 The two volumes of this work were each executed by different artists, both with wide-ranging careers and distinct approaches to the planning and design of their marginal illumination.57 Despite these differences, the manuscripts share an overlap of 63 images between the 158 recorded for MS Douce 5 and the 279 for MS Douce 6. The two volumes’ iconographic coherence results in higher-than-expected measurements of similarity (−.02). Coordination between the patron and the two artists precipitated the high measurement of similarity within this multivolume manuscript: both artists of the Ghent Psalter were working toward the same thematic specifications. Their higher-than-expected similarity also demonstrates the role of repetition in densely illuminated books, as artists often chose to reprise important themes throughout.
56 Stones, Gothic Manuscripts, part 1, 2:344–54; Elizabeth Solopova, Latin Liturgical Psalters in the Bodleian Library: A Select Catalogue (Oxford: Bodleian Library, 2013), 379–87.
57 Stones, Gothic Manuscripts, part 1, 2:351–52.
58 By contrast, this threshold is closer to an average of one hundred instances of marginalia for the probabilistic intersection metric, suggesting that the probabilistic Gower metric performs substantially better when analyzing more sparsely illustrated manuscripts.
59 Lilian M. C. Randall, Medieval and Renaissance Manuscripts in the Walters Art Gallery, vol. 3, pt. 1, Belgium, 1250–1530 (Baltimore: Johns Hopkins University Press, 1997), 25–56; Stones, Gothic Manuscripts, part 1, 2:384–97.
Figure 6. Scatterplot of pairs of multivolume manuscripts, showing the average marginal image count per pair against the probabilistic Gower distance. Manuscript pairs with more marginal images have lower distances (higher similarity) than manuscript pairs with fewer marginalia, which have distances closer to the random baseline (the 0 grid line on the y axis). Based on metrics calculated from the entries in Randall, Images in the Margins of Gothic Manuscripts.
Figure 7. Marginal illuminations of a beggar with a basket on his back containing an ape (right) and a bestiary representation of a unicorn hunt (below). Psalter for Ghent use, ca. 1315–25. Oxford, Bodleian Library, MS Douce 5, fol. 74r. Reproduced according to the terms and conditions of the CC-BY-NC 4.0 license. The manuscript pair Douce 5 and 6 (see figs. 8 and 9), made by different artists for the same commission, shows higher-than-expected measurements of similarity, reflecting the coordination between the artists and the patron and demonstrating the role of repetition of key iconographic themes in extensively illuminated manuscript projects.
Figure 8. Detail of a beggar with a basket on his back containing an ape. Psalter for Ghent use, ca. 1315–25. Oxford, Bodleian Library, MS Douce 6, fol. 153r. Reproduced according to the terms and conditions of the CC-BY-NC 4.0 license.
Figure 9. Detail of a bestiary representation of a unicorn hunt. Psalter for Ghent use, ca. 1315–25. Oxford, Bodleian Library, MS Douce 6, fol. 39r. Reproduced according to the terms and conditions of the CC-BY-NC 4.0 license.
Among these pairs of manuscripts, measured similarity decreases substantially once a pair’s average image count falls below forty-five.58 This value reflects a threshold above which artists tended to begin repeating themselves but below which they maintained iconographic variation akin to that expected in our random scenario. The three volumes of the Beaupré Antiphonary (Baltimore, Walters Art Museum, MS W.759–761) have only between thirteen and sixteen images recorded in each volume and overlaps between zero and two (fig. 10).59 The similarity measurements for the Beaupré volumes (between −.0031 and −.0023) are effectively indistinguishable from random overlap. Whereas multivolume manuscripts with a profusion of marginal images seem to draw repeatedly on the same marginal motifs, creating a coherent and measurably similar collection of images, manuscripts with fewer marginal images contain more limited repetitions, making their discrete parts no more similar than we would expect to see by chance.
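To make "what we would expect to see by chance" concrete, the sketch below simulates a random baseline for the overlap between two sparsely illustrated volumes. Everything in it is an assumption for illustration: the pool of 200 themes, the volume sizes, the observed overlap, and above all the null model (themes drawn uniformly at random without replacement), which may well differ from the simulation procedure used for the article's results.

```python
import random


def simulate_overlaps(size_a, size_b, n_themes, n_sims, seed=0):
    """Overlap counts for two volumes whose themes are drawn uniformly at random."""
    rng = random.Random(seed)
    themes = range(n_themes)
    overlaps = []
    for _ in range(n_sims):
        volume_a = set(rng.sample(themes, size_a))
        volume_b = set(rng.sample(themes, size_b))
        overlaps.append(len(volume_a & volume_b))
    return overlaps


# Hypothetical numbers, chosen only to illustrate the procedure.
observed_overlap = 1
overlaps = simulate_overlaps(size_a=15, size_b=15, n_themes=200, n_sims=5000)

mean_simulated = sum(overlaps) / len(overlaps)
# Share of random scenarios with an overlap at least as large as the observed one.
share_at_least = sum(o >= observed_overlap for o in overlaps) / len(overlaps)
# Baseline-adjusted value: how far the observation sits from the random mean.
adjusted = observed_overlap - mean_simulated
```

When `share_at_least` is large and `adjusted` is close to zero, the observed overlap is the kind of value random sampling produces routinely, which is the sense in which sparse volumes are "no more similar than we would expect to see by chance."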
Whether this quantitative threshold of marginal images reveals a trend in manuscript artists’ approaches to large-scale book design and what it might mean remain open questions for future work. Similarity measurement provides a means to investigate such large-scale patterns in artists’ working practices across manuscripts.
Figure 10. Composite image of sequential openings from the Beaupré Antiphonary, including one page with marginal illuminations of a fishmonger, a man holding a scroll, and a winged hybrid, 1289–90. Baltimore, Walters Art Museum, MS W.759, fols. 105v–113r (marginalia on fol. 108r). Reproduced according to the terms and conditions of the CC0 license.
Summary Statistics and Outliers: Average Similarity
We can quickly discern whether a manuscript is typical or unusual in its iconographic themes by calculating summary statistics like the average of all its similarity metrics. This approach can identify specific outliers at each extreme to serve as case studies for further research. Manuscripts that use marginal images as a kind of subject-specific illustration tend to have quite low average similarity, such as the Veil rentier d’Audenarde (The rent-book of Audenarde, Brussels, Bibliothèque royale de Belgique, MS 1175), a manuscript unique in our dataset that lists and illustrates the landholdings of a lord in eastern Flanders.60 Devotional works that contain lengthy cycles of marginal illustrations, often depicting unusual biblical episodes in sequence, also have low average similarity but, in some cases, high pairwise similarity to each other.61 Manuscripts with numerous, specialized marginalia that exclude many common themes make up the bulk of both categories. As heavily illustrated outliers, these low-similarity manuscripts have often attracted scholarly attention for their unusual approach to the use of images.
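As a rough illustration of this summary statistic, the sketch below averages each manuscript's distances to all others and ranks the results. The four manuscripts and their pairwise distances are invented placeholders, and the pair table keyed by `frozenset` is simply one convenient representation, not the structure used in the article's analysis.

```python
from statistics import mean

# Hypothetical pairwise distances (lower = more similar).
distances = {
    frozenset({"A", "B"}): 0.1,
    frozenset({"A", "C"}): 0.3,
    frozenset({"A", "D"}): 0.9,
    frozenset({"B", "C"}): 0.1,
    frozenset({"B", "D"}): 0.8,
    frozenset({"C", "D"}): 0.7,
}


def average_distances(distances):
    """Mean distance from each manuscript to every other manuscript."""
    manuscripts = sorted(set().union(*distances))
    return {
        ms: mean(d for pair, d in distances.items() if ms in pair)
        for ms in manuscripts
    }


averages = average_distances(distances)
# Rank from most typical (lowest mean distance) to most unusual (highest).
ranked = sorted(averages.items(), key=lambda item: item[1])
most_typical, most_unusual = ranked[0][0], ranked[-1][0]
```

The manuscripts at either end of `ranked` are the candidates for closer study: one end holds books that resemble many others, the other holds books that resemble few.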
60 Léo Verriest, Le Polyptyque illustré dit “Veil rentier” de Messire Jehan de Pamele-Audenarde (vers 1275) (Brussels: printed by the author, 1950); Margaret Goehring, “Signs of the City: Seigniorial Power and Vernacular Visual Culture in Two Northern French Rent-Books,” Studies in Iconography 41 (2020): 1–29; see also Stones, Gothic Manuscripts, part 1, 2:297–98. Other unusual manuscripts with low average Gower similarity include Frederick II, De arte venandi cum avibus (Paris, Bibliothèque nationale de France, MS fr. 12400) and the Bird Psalter (Cambridge, Fitzwilliam Museum, 2-1954). Interestingly, two manuscripts with relatively high average intersection similarity have low average Gower similarity: the Copenhagen Psalter (Copenhagen, Det Kongelige Bibliotek, MS G.K.S. 3384.8°) and the Hours of Jeanne d’Évreux (New York, Metropolitan Museum of Art, Cloisters Collection, 54.1.2).
61 The Gower distance metric in particular assigned manuscripts with extensive narrative cycles low average similarity. These included the Taymouth Hours (London, British Library, Yates Thompson MS 13), the Isabella Psalter (Munich, Bayerische Staatsbibliothek, MS Cod. gall. 16), the Tickhill Psalter (New York, New York Public Library, MS Spencer 26), and the Queen Mary Psalter (London, British Library, Royal MS 2 B VII). The Gower metric measured the Isabella Psalter as more similar to the Queen Mary Psalter (−.011) than expected by chance but slightly more dissimilar to the Taymouth Hours (.007).
Manuscripts with the highest average similarity primarily consist of personal devotional books (psalters, hours, and breviaries), such as the winter volume of the Breviary of Renaud de Bar, the Maastricht Hours, and one volume of the psalter for Ghent use (MS Douce 5; figs. 4, 5, and 7).
These manuscripts use marginalia as a signifier of courtliness, alternating between moralizing parodies, allusive commentaries on or references to the adjacent text, and decorative fillers. Their eclectic approach stands in contrast to the narrative or illustrative cycles that characterize highly illuminated manuscripts with low average similarity.
Three manuscripts with particularly high average similarity depart from these standard devotional texts: two volumes from a set of Arthurian romances from the late thirteenth century (Paris, Bibliothèque nationale de France MS fr. 95; and New Haven, CT, Yale University, Beinecke Rare Book and Manuscript Library, MS 229) and a mid-fourteenth-century copy of the Romance of the Rose (Paris, Bibliothèque nationale de France, MS fr. 25526). The Arthurian volumes belong to the so-called Dampierre Group of luxury manuscripts, illuminated for members of the courtly circle of Guy of Dampierre, count of Flanders (d. 1305), largely by two artists working in the region of Thérouanne.62 These artists also produced five other manuscripts included in our dataset, four of which are devotional manuscripts.63 Crossover in their repertoire of marginal images and their patrons’ desire for prolific marginalia may help explain the surprisingly high average similarity of these Arthurian manuscripts. The Romance of the Rose comes from the Paris workshop of married manuscript makers Jeanne and Richard de Montbaston, who appear to have specialized in vernacular literature, serving both courtly and more modest patrons.64 The nineteen copies of the Rose attributed to the Montbastons reflect the wide economic range of their patrons; this is the most extensively illuminated of the group.65 These devotional and literary volumes possess high average iconographic similarity, but they are not “typical” manuscripts as such; far from it. They contain many more instances of marginalia and many more iconographic themes than the average manuscript. Contextualizing these manuscripts using summary statistics, we may understand them as extreme examples from the center of their manuscript culture.
62 Alison Stones, “The Illustrations of BN, Fr. 95 and Yale 229: Prolegomena to a Comparative Analysis,” in Word and Image in Arthurian Literature, ed. Keith Busby (New York: Garland, 1996), 203–83; Elizabeth Moore Hunt, Illuminating the Borders of Northern French and Flemish Manuscripts, 1270–1310 (New York: Routledge, 2007), 6, 79–110; Stones, Gothic Manuscripts, part 1, 2:550–75; Emily R. Shartrand, “Sexual Warfare in the Margins of Two Late-Thirteenth-Century Franco-Flemish Arthurian Romance Manuscripts” (PhD diss., University of Delaware, 2020). On the Dampierre Group more broadly, see Hunt, Illuminating the Borders; Kerstin Carlvant, Manuscript Painting in Thirteenth-Century Flanders: Bruges, Ghent and the Circle of the Counts (London: Harvey Miller, 2012), 117–35.
63 The Psalter of Guy of Dampierre (Brussels, Bibliothèque royale de Belgique, MS 10607); Guillaume de Tyre, Histoire de la guerre sainte (Paris, Bibliothèque nationale de France, MS fr. 2754); the Margaret the Black Psalter (private collection; formerly Christie’s, Arcana Collection, lot 29; Sotheby’s 21.vi.88, lot 37; Kraus 75/88); and the two-volume Franciscan Psalter-Hours (Paris, Bibliothèque nationale de France, MS lat. 1076 and Marseille, Bibliothèque municipale, MS 111).
Analysis of Similarity
Manuscript researchers can use similarity metrics to test assumptions about a priori groups with a specific statistical test known as Analysis of Similarity. This type of statistical hypothesis testing is designed to see whether the observed difference in a sample is likely to be reflected in the population from which it is drawn.
This is especially helpful for understanding how surviving manuscripts might reflect the much larger corpus of manuscripts lost to time. For example, art historians may wish to ask whether two regions had distinct or interconnected manuscript cultures, especially when researchers have traditionally studied those manuscripts within nationalist frameworks that separate objects based on modern or historical geopolitical borders.
64 Rouse and Rouse, Manuscripts and Their Makers, 1:234–60; see also Camille, Image on the Edge, 147–49; Sylvia Huot, The Romance of the Rose and Its Medieval Readers: Interpretation, Reception, Manuscript Transmission (Cambridge: Cambridge University Press, 1993), 273–322.
65 Rouse and Rouse, Manuscripts and Their Makers, 1:242–43.
Our example dataset consists of 237 manuscripts produced on either side of the English Channel, with 59 localized to England and 178 localized to continental Europe, predominantly France and Flanders. Given the iconographic data discussed above, we can ask, Were manuscripts produced on one side of the English Channel more iconographically similar to each other than they were to manuscripts produced on the other side? This is precisely the type of question that can be answered with an Analysis of Similarity test, which evaluates whether the similarity of all pairs within each group is significantly greater than the similarity of all pairs across groups (that is, greater than we would expect to see by chance if there were no difference in the larger corpus from which extant manuscripts are preserved).66 Performing an Analysis of Similarity test on our data, using Gower’s metric, we found that there is no clear difference between the marginalia of trans-Channel pairs of manuscripts and cis-Channel pairs of manuscripts. The Analysis of Similarity test produces a result between −1 and 1.
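For readers who want to see the mechanics, the following is a minimal sketch of the Analysis of Similarity statistic in its commonly described rank-based form, R = (mean between-group rank − mean within-group rank) / (M/2), where M is the number of pairs, together with a label-shuffling permutation test. The six-item distance matrix and its group labels are toy values, the function names are our own, and the one-sided count of permutations (results at least as large as the observed R) is an assumption; it is not the article's code or data.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def anosim_r(dist, labels):
    """R statistic: near 1 when within-group pairs are the most similar."""
    pairs = list(combinations(range(len(labels)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim_test(dist, labels, n_perm=999, seed=0):
    """Observed R plus the share of label shufflings with R at least as large."""
    rng = random.Random(seed)
    observed = anosim_r(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two perfectly separated groups of three hypothetical manuscripts.
labels = ["England"] * 3 + ["Continent"] * 3
dist = [
    [0.0 if i == j else (0.1 if labels[i] == labels[j] else 0.9) for j in range(6)]
    for i in range(6)
]
r_statistic, p_value = anosim_test(dist, labels)
```

In this deliberately extreme toy case every within-group pair is closer than every between-group pair, so R reaches its maximum; a result near zero would instead mean that group membership says little about which pairs are similar.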
Higher in-group similarities yield a number closer to 1, and higher between-group similarities produce a number closer to −1. In the case of our analysis, the test statistic is −.02941, indicating that the difference between in-group and between-group similarity is very small. This number is also one we would expect to see in a random scenario: 73 percent of ten thousand random simulations with no underlying difference between groups produced a result at least as extreme as this one, so this tiny divergence between the in-group and between-group similarity is quite plausibly a product of chance.67
A few motifs exhibit clear regional patterns. Elephants and the miracles of the Virgin, for example, are more common in English manuscripts, whereas bandyball and fables are more common in French and Flemish manuscripts. On the whole, however, the test shows no evidence of articulated distinctions between manuscripts on opposing sides of the Channel, although it does not preclude their existence. Analysis of Similarity indicates that the regional origins of a manuscript pair can predict almost nothing about their iconographic similarity, underscoring long-standing critiques of nationalist approaches to manuscript studies and challenging the continued use of such frameworks in major catalogs and funding initiatives.68
66 An Analysis of Similarity test could also be a first step in extending Madeline H. Caviness’s inquiry into the gendered imagery of the margins in fourteenth-century prayer books made for elite women and men. Caviness, “Hedging in Men and Women: The Margins as an Agent of Gender Construction,” in Reframing Medieval Art: Difference, Margins, Boundaries (self-published, 2001), ch. 3, http://dca.lib.tufts.edu/caviness/chapter3.html. Ecologists often use this approach to compare the impact of different variables on ecological communities. Legendre and Legendre, Numerical Ecology, 603–11; Pier Luigi Buttigieg and Alban Ramette, “A Guide to Statistical Analysis in Microbial Ecology: A Community-Focused, Living Review of Multivariate Data Analyses,” FEMS Microbiology Ecology 90, no. 3 (December 2014): 547.
67 In this analysis, we use the original version of Gower’s metric, with ten thousand permutations to check the significance of our findings, that is, whether the observed results are distinguishable from randomized datasets. The statistical significance is the probability of obtaining test results at least as extreme as the results actually observed within these ten thousand randomized permutations.
68 Lucy Freeman Sandler, “Illuminated in the British Isles: French Influence and/or the Englishness of English Art, 1285–1345,” Gesta 45, no. 2 (2006): 177–88; Sandler, Gothic Manuscripts 1285–1385 (London: Harvey Miller, 1986), 16–23; Stones, Gothic Manuscripts, part 1, 1:18–19. For nationalism and the historiography of medieval art more broadly, see Jonathan J. G. Alexander, “Medieval Art and Modern Nationalism,” in Medieval Art: Recent Perspectives: A Memorial Tribute to C. R. Dodwell, ed. Gale R. Owen-Crocker and Timothy Graham (Manchester: Manchester University Press, 1998), 206–23; Richard Marks, “The Englishness of English Gothic Art?,” in Gothic Art & Thought in the Later Medieval Period: Essays in Honor of Willibald Sauerländer, ed. Colum Hourihane (Princeton, NJ: Index of Christian Art, 2011), 64–89.
Clustering
While Analysis of Similarity measures the similarity of manuscripts within pre-identified groups, other methods known as clustering or community detection use indices of similarity as a quantitative basis for identifying groups in a dataset. Many researchers calculate similarity measures principally as grounds for clustering, rather than analyzing the resulting metrics directly. Clustering methods assign entities to groups based on some set of criteria related to their proximity or density.69 Some clustering tools automatically determine an optimal number of groups based on set mathematical criteria, while others require users to select a number.70 Clustering is an open-ended, exploratory method that can provide researchers with a different perspective on relationships within their dataset and prompt new questions. When a set of clusters is well differentiated, different proximity-based clustering algorithms will often identify very similar groups. By contrast, when the size or even the presence of clusters is more ambiguous, as is often the case with similarity indices based on many features, different methods can produce very different results. Researchers must draw on established knowledge about their subjects and their own expertise to determine whether the resulting clusters can serve as proxies for some real historical phenomenon.
69 Han, Kamber, and Pei, Data Mining, 444.
70 Christian M. Hennig, Marina Meila, Fionn Murtagh, and Roberto Rocci, eds., Handbook of Cluster Analysis (Boca Raton, FL: CRC Press, 2016), 14–15.
Biologists often use clustering to create taxonomies or to formulate preliminary hypotheses about causal relationships that can be tested further.71 In manuscript studies, stemmatologists already use distance measurements and clustering algorithms to produce manuscript stemmata based on textual discrepancies.72 Given lacunae in manuscript preservation, it may be especially helpful here to learn from fields like paleontology, where the chance loss or preservation of fossils blurs together groups that would otherwise be easily distinguished.73
71 Historians of material culture and textual transmission have criticized biological metaphors, yet these analogies continue to shape our models of historical processes. Bence Nanay, “George Kubler and the Biological Metaphor of Art,” British Journal of Aesthetics 58, no. 4 (December 18, 2018): 424–26; Armin Hoenen, “Evolutionary Models in Other Disciplines,” in Roelli, Handbook of Stemmatology, 534–86.
72 Teemu Roos, “Computational Construction of Trees,” in Roelli, Handbook of Stemmatology, 317–19.
73 Matthew J. Vavrek, “A Comparison of Clustering Methods for Biogeography with Fossil Datasets,” PeerJ 4 (February 25, 2016): e1720. This article focuses on results from clustering methods that classify observations into a predetermined number of clusters.
74 Ward’s method is one of several approaches to clustering that seeks to minimize variance within clusters, prioritizing this over other criteria. Legendre and Legendre, Numerical Ecology, 360.
75 Existing approaches to clustering high-dimensional data, such as subspace clustering, focus on reducing the number of dimensions while minimizing information loss. Ira Assent, “Clustering High Dimensional Data,” WIREs Data Mining and Knowledge Discovery 2, no. 4 (2012): 340–50.
Figure 11. Dendrogram of the iconographic clusters of French, Flemish, and English manuscripts with marginalia. Of the fifteen color-coded clusters, the two largest (purple and magenta) contain manuscripts that are about as similar to each other as expected by chance. The remaining sixty-nine manuscripts form smaller clusters of iconographically similar manuscripts, with a few thematically unusual manuscripts in clusters of one (visible in details A and B). Data and manuscript identifiers based on Randall, Images in the Margins of Gothic Manuscripts. For modern shelfmarks and locations, see nn 77 and 78.
We applied an agglomerative hierarchical clustering algorithm known as Ward’s method to the distance matrix based on the probabilistic Gower index.74 Although this method, like many standard methods for clustering, is not ideal for high-dimensional data like our iconographic themes, we adopted this approach to illustrate clustering here because of its widespread use on datasets with fewer features.75 We visualized the result using a dendrogram, or tree diagram (fig. 11), showing the proximity of manuscripts to each other, which we cut into fifteen clusters. The number of clusters here is subjective. In this case, we determined it using the minimum number of clusters that satisfied a stopping criterion: that no single cluster should contain more than half of the manuscripts. Our goal in setting this stopping criterion was to create a usable visualization of the data, rather than a definitive typology. The resulting diagram makes it possible to consider similarity as a basis for groupings.
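The stopping criterion can be sketched as follows. This is an illustrative, pure-Python version of Ward-style agglomerative clustering (using the standard Lance-Williams update on squared distances) followed by a search for the smallest number of clusters in which no cluster holds more than half of the items. The eight one-dimensional "manuscripts" are invented toy points, the function names are hypothetical, and the code is not the implementation behind figure 11; Ward's method also formally presumes Euclidean distances, which a Gower-based matrix need not satisfy.

```python
def ward_partitions(dist):
    """Agglomerate with Ward's criterion; return {number_of_clusters: partition}."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    # Squared distances between active clusters, keyed by (smaller_id, larger_id).
    d2 = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    partitions = {n: [list(c) for c in clusters.values()]}
    next_id = n
    while len(clusters) > 1:
        (a, b), d_ab = min(d2.items(), key=lambda item: item[1])
        size_a, size_b = len(clusters[a]), len(clusters[b])
        updated = {}
        for c in clusters:
            if c in (a, b):
                continue
            size_c = len(clusters[c])
            d_ac = d2[(min(a, c), max(a, c))]
            d_bc = d2[(min(b, c), max(b, c))]
            # Lance-Williams update for Ward's method on squared distances.
            updated[c] = (
                (size_a + size_c) * d_ac + (size_b + size_c) * d_bc - size_c * d_ab
            ) / (size_a + size_b + size_c)
        d2 = {key: v for key, v in d2.items() if a not in key and b not in key}
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        for c, value in updated.items():
            d2[(c, next_id)] = value  # next_id is larger than every existing id
        next_id += 1
        partitions[len(clusters)] = [sorted(c) for c in clusters.values()]
    return partitions


def smallest_balanced_cut(partitions, n_items):
    """Fewest clusters such that no cluster holds more than half of the items."""
    for k in sorted(partitions):
        if max(len(c) for c in partitions[k]) <= n_items / 2:
            return k, partitions[k]


# Toy data: hypothetical positions on a line; distance = absolute difference.
points = [0.0, 1.0, 2.5, 4.5, 7.0, 30.0, 32.0, 60.0]
dist = [[abs(p - q) for q in points] for p in points]

partitions = ward_partitions(dist)
k, parts = smallest_balanced_cut(partitions, len(points))
```

Because the criterion is about readability rather than statistical optimality, a different threshold (or a different linkage method) would yield a different number of clusters from the same distance matrix.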
Of the fifteen color-coded clusters, the two largest (purple and magenta, containing 100 and 68 manuscripts, respectively) contain manuscripts that are about as similar to each other as expected by chance.76 Smaller clusters contain between 2 and 16 manuscripts that are more similar to each other than we would expect to see by chance, and 3 thematically unusual manuscripts form their own clusters in isolation.77 Some of the smaller clusters appear to be linked by patronage or other aspects of production that were not directly factored into calculating the distance metrics or clusters. The concentration of five royal women’s manuscripts with an iconographic emphasis on narrative cycles within two closely related clusters corresponds with our findings regarding manuscripts with low average similarity and further affirms them as a distinct group within the dataset ripe for further investigation (fig. 11, detail A).78

76 The manuscripts in these two clusters also tend to have fewer recorded marginal images than the manuscripts sorted elsewhere. Manuscripts in the largest clusters have an average of 4.7 and 24.1 instances of marginalia recorded, compared with an average of 56.8 instances in the next-largest substantial cluster. These two large clusters tend to group manuscripts about which it is difficult to make substantive conclusions solely on the basis of this iconographic data.
77 The Smithfield Decretals (London, British Library, Royal MS 10 E IV), the rent-book of Audenarde (Brussels, Bibliothèque royale de Belgique, MS 1175), and the Tickhill Psalter (New York, New York Public Library, MS Spencer 26) form their own clusters.
78 These manuscripts are the Hours of Jeanne de Navarre (Paris, Bibliothèque nationale de France, MS nouv. acq. lat. 3145, ex-Yates Thompson MS 75), the Hours of Yolande of Flanders (London, British Library, Yates Thompson MS 27), the Taymouth Hours (London, British Library, Yates Thompson MS 13), the Queen Mary Psalter (London, British Library, Royal MS 2 B VII), and the Isabella Psalter (Munich, Bayerische Staatsbibliothek, MS Cod. gall. 16). See above, n61. While many of these manuscripts have been thoroughly studied individually or in the context of artist and workshop productions, we know of no work considering these early fourteenth-century manuscripts with narrative marginal imagery as a cross-Channel courtly phenomenon.

Clustering may also produce surprising juxtapositions, which can raise new questions about well-studied manuscripts. For instance, why was the East Anglian Luttrell Psalter clustered with the Belleville Breviary (Paris, Bibliothèque nationale de France, MS lat. 10483–4) and the Breviary of Charles V (Paris, Bibliothèque nationale de France, MS lat. 1052), from the Paris workshops of Jean Pucelle and his follower, Jean le Noir, respectively (fig. 11, detail B)? Because Jean le Noir copied elements of the Belleville Breviary’s marginal iconography in his breviary for Charles V, their close relationship is unsurprising, rendering their association with the Luttrell Psalter all the more unexpected. Although the Luttrell Psalter artist did not intentionally reference the iconography of Pucelle’s breviary as Jean le Noir did, their association in the cluster raises interesting questions about the nature of the visual language of marginal iconography shared across the Channel.79

79 François Avril, Les fastes du Gothique: Le siècle de Charles V (Paris: Bibliothèque nationale de France, 1981), 295–96, 333–34; Joan A. Holladay, “Jean Pucelle and His Patrons,” in Jean Pucelle: Innovation and Collaboration, ed. Kyunghee Pyun and Anna D. Russakoff (London: Harvey Miller, 2013), 25.

Conclusion

The complexity of manuscripts as objects has inspired a rich tradition of structured description, which researchers and catalogers continue to expand. Methods like similarity measurement and its analysis permit researchers to use this wealth of structured data at scales that were previously too time consuming to be practical. We hope that this article has demonstrated the utility of similarity metrics beyond specialized applications in areas such as stemmatic analysis, and that they will continue to find wider use among manuscript researchers. Researchers can use similarity measurement to compare multifaceted historical objects, drawing on both existing sources for structured manuscript data and original research. Similarity metrics will not suit every situation, particularly cases where researchers compare objects with many features. When applied to appropriate research questions, however, they may provide new insights into datasets that are too time consuming to analyze using traditional approaches. Similarity metrics can find outliers, answer specific questions, and highlight broad patterns within a set of manuscripts. When compared to a random baseline, they can also provide a sense of whether observed patterns may be attributed to chance or should instead be interpreted as reflecting substantive historical phenomena. On their own, similarity measurements allow researchers to compare pairs of objects and identify exceptional cases based on summary statistics like average similarity. Methods like Analysis of Similarity and clustering extend their usefulness, permitting investigators to answer research questions about the comparison of a priori categories, the identification of new groupings, and the centrality of objects in a corpus.
Many quantitative methods for comparing and categorizing multifeature observations build on principles akin to those employed in the calculation of similarity metrics, so experimenting with these concepts can be a step toward other computational approaches to analyzing manuscript data. Quantitative metrics require explicit definitions of similarity, which encourages researchers to reflect on conventional conceptions of similarity and difference. In addition to inviting researchers to reevaluate and define foundational disciplinary concepts, such transparency also encourages less sweeping, more carefully circumscribed arguments. More than simply a tool, similarity measurement, like other computational approaches, presents a framework for reimagining the possibilities of humanistic inquiry. As new forms and quantities of data about manuscripts become accessible around the world, researchers seeking to see both the forest and the trees must increasingly turn to such scalable, quantitative approaches if they wish to discern macro-patterns and the place of individual objects within them. While quantitative approaches suited to analyzing these macro-patterns have traditionally been perceived as simplifications or reductions of nuance and historical ambiguity, the work we have presented here demonstrates the potential benefits of this trade-off. Similarity metrics let researchers step back and consider the big picture, while capturing more of the complexity of humanistic usage of “similarity” than one might expect. We see the profusion of distinct quantitative metrics of similarity as a counterpart to, rather than foreclosure of, the subtleties inherent in conventional usage of the term similarity. If the criteria for comparing manuscripts rest in the eye of the beholder, similarity metrics render them transparent, explicit, and expandable.
Appendix: Sample Calculations

Although general equations for calculating metrics are readily accessible online, step-by-step demonstrations of their implementation are less common. Calculating these metrics by hand is not necessary, thanks to their implementation as functions in programming languages like R or Python and in free, open-source packages or libraries. We do so here, however, so that readers can gain an intuition for how different metrics may produce startlingly different results from the same data. In table 1, we provide a toy dataset, which we will use as the basis for calculating metrics. Because Gower’s metric requires the range of each feature, we include a separate row giving the range for each column.

Table 1. A Toy Dataset with Feature Counts

| | Feature 1 | Feature 2 | Feature 3 |
|---|---|---|---|
| MS 1 | 10 | 5 | 0 |
| MS 2 | 0 | 5 | 10 |
| MS 3 | 20 | 10 | 80 |
| Range of Feature (difference between largest and smallest value) | 20 | 5 | 80 |

The Brainerd-Robinson similarity metric is calculated with percentages rather than absolute values, so we convert the values to percentages for each manuscript in table 2.

Table 2. Features as Percentages

| | Feature 1 (%) | Feature 2 (%) | Feature 3 (%) |
|---|---|---|---|
| MS 1 | 67 | 33 | 0 |
| MS 2 | 0 | 33 | 67 |
| MS 3 | 18 | 9 | 73 |

Sample calculations follow for each metric, using the feature values for one pair of manuscripts (MS 1 and MS 2).

Intersection Similarity

$$\min\{10, 0\} + \min\{5, 5\} + \min\{0, 10\} = 5$$

Brainerd-Robinson Similarity

$$200 - \left( |67 - 0| + |33 - 33| + |0 - 67| \right) = 66$$

Gower’s Distance

$$1 - \frac{1}{3} \times \left[ \left( 1 - \frac{|10 - 0|}{20} \right) + \left( 1 - \frac{|5 - 5|}{5} \right) + \left( 1 - \frac{|0 - 10|}{80} \right) \right] = .21$$

Calculating these values for the remaining two possible pairs of manuscripts results in table 3, with the most similar pairs according to each metric highlighted in bold. Note that Gower’s metric is a distance metric rather than a similarity metric, so a lower value indicates greater proximity or similarity.

Table 3. Pairwise Metrics

| | Intersection Similarity | Brainerd-Robinson Similarity | Gower’s Distance |
|---|---|---|---|
| MS 1–MS 2 | 5 | 66 | **0.21** |
| MS 1–MS 3 | **15** | 54 | 0.83 |
| MS 2–MS 3 | **15** | **152** | 0.95 |

The intersection metric assigns MS 3 high similarity to the other two manuscripts based in part on the large number of features present in that manuscript overall. By contrast, because Brainerd-Robinson’s metric considers only percentages, it clearly differentiates between the similarity of MS 1–MS 3 and MS 2–MS 3. Gower’s distance down-weights features with large ranges (feature 3) and puts emphasis instead on features with small ranges (feature 2), resulting in the highest pairwise similarity going to MS 1 and MS 2 for that metric.
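The hand calculations in this appendix can also be reproduced in a few lines of code. The following Python sketch is offered as one possible illustration and uses only the toy counts from table 1. The function names are hypothetical, and percentages are rounded to whole numbers before the Brainerd-Robinson calculation on the assumption that this matches the rounded values in table 2.

```python
from itertools import combinations

# Toy feature counts from table 1 (one list of three feature counts per manuscript).
counts = {
    "MS 1": [10, 5, 0],
    "MS 2": [0, 5, 10],
    "MS 3": [20, 10, 80],
}


def intersection_similarity(a, b):
    """Sum, over all features, of the smaller of the two counts."""
    return sum(min(x, y) for x, y in zip(a, b))


def to_percentages(row):
    """Convert counts to whole-number percentages (rounded, as in table 2)."""
    total = sum(row)
    return [round(100 * value / total) for value in row]


def brainerd_robinson(a, b):
    """200 minus the summed absolute differences between percentages."""
    pa, pb = to_percentages(a), to_percentages(b)
    return 200 - sum(abs(x - y) for x, y in zip(pa, pb))


def feature_ranges(rows):
    """Largest minus smallest value for each feature (column)."""
    return [max(col) - min(col) for col in zip(*rows)]


def gower_distance(a, b, ranges):
    """One minus the mean range-scaled similarity across features.

    Assumes every feature has a nonzero range, as in the toy data.
    """
    sims = [1 - abs(x - y) / r for x, y, r in zip(a, b, ranges)]
    return 1 - sum(sims) / len(sims)


ranges = feature_ranges(list(counts.values()))
results = {}
for (name_a, a), (name_b, b) in combinations(counts.items(), 2):
    results[(name_a, name_b)] = (
        intersection_similarity(a, b),
        brainerd_robinson(a, b),
        gower_distance(a, b, ranges),
    )

for pair, (inter, br, gower) in results.items():
    print(pair, inter, br, f"{gower:.3f}")
```

Under these assumptions the sketch reproduces the intersection and Brainerd-Robinson values in table 3. The unrounded Gower distances are approximately 0.208, 0.833, and 0.958, and the last of these appears in table 3 as 0.95.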