FAST CAT: Collaborative Data Entry and Curation for Semantic Interoperability in Digital Humanities

Published: 16 July 2021

Abstract

Descriptive and empirical sciences, such as History, collect, observe and describe phenomena in order to explain them and to draw interpretative conclusions about influences, driving forces and impacts under given circumstances. Spreadsheet software and relational database management systems are still the dominant tools for quantitative analysis and overall data management in these sciences, allowing researchers to directly analyse the gathered data and perform scholarly interpretation. However, this practice has several limitations: the collected data depend heavily on the initial research hypothesis and are therefore usually of little use for other research; the details from which the registered relations are inferred are not represented; and it is difficult to revisit the original data sources for verification, corrections or improvements. To cope with these problems, in this article we present FAST CAT, a collaborative system for assistive data entry and curation in Digital Humanities and similar forms of empirical research. We describe the related challenges and the overall methodology we follow for supporting semantic interoperability, and we discuss the use of FAST CAT in the context of a European (ERC) project of Maritime History, called SeaLiT, which examines the economic, social and demographic impacts of the introduction of steamboats in the Mediterranean area between the 1850s and the 1920s.

        Published In

        Journal on Computing and Cultural Heritage   Volume 14, Issue 4
        December 2021
        328 pages
        ISSN:1556-4673
        EISSN:1556-4711
        DOI:10.1145/3476246
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 16 July 2021
        Accepted: 01 April 2021
        Revised: 01 December 2020
        Received: 01 September 2020
        Published in JOCCH Volume 14, Issue 4

        Author Tags

        1. Data entry
        2. archival research
        3. data curation
        4. digital humanities
        5. semantic interoperability

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • European Union’s Horizon 2020 research and innovation programme under Marie Sklodowska-Curie Individual Fellowship, Project “ReKnow: Research Documentation, Analysis and Exploration in Empirical and Descriptive Sciences”
        • European Research Council (ERC)

        Article Metrics

        • Downloads (Last 12 months): 35
        • Downloads (Last 6 weeks): 2
        Reflects downloads up to 20 Feb 2025

        Cited By

        • (2024) FastCat Catalogues: Interactive Entity-Based Exploratory Analysis of Archival Documents. Proceedings of the 2023 ACM/IEEE Joint Conference on Digital Libraries, 190-194. DOI: 10.1109/JCDL57899.2023.00035. Online publication date: 26-Jun-2024
        • (2024) Unifying Faceted Search and Analytics over RDF Knowledge Graphs. Knowledge and Information Systems 66:7, 3921-3958. DOI: 10.1007/s10115-024-02076-9. Online publication date: 24-Mar-2024
        • (2024) Curating the Chinese ancient book catalogs. Journal of the Association for Information Science and Technology 75:12, 1331-1349. DOI: 10.1002/asi.24894. Online publication date: 21-Nov-2024
        • (2023) A Brief Survey of Methods for Analytics over RDF Knowledge Graphs. Analytics 2:1, 55-74. DOI: 10.3390/analytics2010004. Online publication date: 17-Jan-2023
        • (2023) The SeaLiT Ontology – An Extension of CIDOC-CRM for the Modeling and Integration of Maritime History Information. Journal on Computing and Cultural Heritage 16:3, 1-21. DOI: 10.1145/3586080. Online publication date: 9-Aug-2023
        • (2023) Data Enrichment Toolchain: A Data Linking and Enrichment Platform for Heterogeneous Data. IEEE Access 11, 103079-103091. DOI: 10.1109/ACCESS.2023.3317705. Online publication date: 2023
        • (2023) A workflow model for holistic data management and semantic interoperability in quantitative archival research. Digital Scholarship in the Humanities 38:3, 1049-1066. DOI: 10.1093/llc/fqad018. Online publication date: 6-Apr-2023
        • (2022) CIDOC-CRM and Machine Learning: A Survey and Future Research. Heritage 5:3, 1612-1636. DOI: 10.3390/heritage5030084. Online publication date: 7-Jul-2022
        • (2022) Collaborative Data Use between Private and Public Stakeholders: A Regional Case Study. Data 7:2, 20. DOI: 10.3390/data7020020. Online publication date: 28-Jan-2022
        • (2022) How Your Cultural Dataset is Connected to the Rest Linked Open Data? Trandisciplinary Multispectral Modelling and Cooperation for the Preservation of Cultural Heritage, 136-148. DOI: 10.1007/978-3-031-20253-7_12. Online publication date: 24-Nov-2022
