Applying computer vision to digitised natural history collections for climate change research: temperature-size responses in British butterflies
Natural history collections (NHCs) are invaluable resources for understanding biotic responses to global change. Museums around the world are currently imaging specimens, capturing specimen data, and making both freely available online. In parallel with this digitisation effort, there have been great advances in computer vision (CV): the automated recognition, detection, and measurement of features in digital images by trained computer models. Applying CV to digitised NHCs has the potential to greatly accelerate their use in research on biotic responses to global change. In this paper, we apply CV to a very large digitised collection to test hypotheses in an established area of climate change research: temperature-size responses. We develop a CV pipeline (Mothra) and apply it to the NHM iCollections of British butterflies (>180,000 specimens). Mothra automatically detects the specimen in the image, sets the scale, measures wing features (e.g., forewing length), determines the orientation of the specimen (pinned ventrally or dorsally), and identifies the sex. We pair these measurements and metadata with temperature records to test how adult size varies with temperature during the immature stages of species and to assess patterns of sexual size dimorphism across species and families. Compared with manual baseline measurements, Mothra accurately determines the sex and forewing lengths of butterfly specimens. Females are the larger sex in most species, and an increase in adult body size with warmer monthly temperatures during the late larval stages is the most common temperature-size response. These results confirm suspected patterns and support hypotheses based on recent studies using a smaller dataset of manually measured specimens. We show that CV can be a powerful tool for efficiently and accurately extracting phenotypic data from a very large digitised NHC.
In the future, CV will be widely applied to digitised NHCs to advance ecological and evolutionary research and to accelerate their use in research on biotic responses to global change.