research-article
Free access
Just Accepted

Object Detection in Historical Images: Transfer Learning and Pseudo Labelling

Online AM: 15 October 2024

Abstract

The automatic analysis of images in the historical sciences often requires the identification of objects. Object identification is a well-researched problem for modern photographs; for historical material, however, annotations are often necessary. We present a solution for finding objects without manual work. The method consists of a style transfer of images from the COCO dataset into the target domain using CycleGAN, followed by training on items obtained through pseudo labelling of the original images together with the additional style-transferred COCO images. Different strategies for assembling the dataset are compared. The best method obtains an F1 score of 0.58 for 15 object types without any manual labelling.
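The two quantities the abstract relies on, pseudo labels and the F1 score, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Detection` structure, the `select_pseudo_labels` helper, and the confidence threshold of 0.8 are hypothetical and chosen only for illustration. The abstract does not state how pseudo labels were filtered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One predicted box. Hypothetical structure, not taken from the paper."""
    image_id: str
    label: str
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def select_pseudo_labels(detections, threshold=0.8):
    """Keep only confident predictions so they can be reused as annotations.

    The threshold value is an assumption made for this illustration.
    """
    return [d for d in detections if d.confidence >= threshold]


def f1_score(true_positives, false_positives, false_negatives):
    """Return F1 = 2PR / (P + R), or 0.0 when precision or recall is undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In a pipeline of this kind, a detector trained on modern photographs would produce the `Detection` objects for unlabelled historical images, and the retained predictions would be added to the training set in place of manual annotations.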


Published In

Journal on Computing and Cultural Heritage  Just Accepted
EISSN:1556-4711
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 15 October 2024
Accepted: 04 October 2024
Revised: 25 August 2024
Received: 28 February 2024

Author Tags

  1. Object Detection
  2. Image Processing
  3. Children’s Books
  4. Evaluation
  5. Distant Viewing
  6. Digital Humanities

Qualifiers

  • Research-article
