research-article

FAST CAT: Collaborative Data Entry and Curation for Semantic Interoperability in Digital Humanities

Published: 16 July 2021
    Abstract

    Descriptive and empirical sciences, such as History, collect, observe and describe phenomena in order to explain them and draw interpretative conclusions about influences, driving forces and impacts under given circumstances. Spreadsheet software and relational database management systems are still the dominant tools for quantitative analysis and overall data management in these sciences, allowing researchers to directly analyse the gathered data and perform scholarly interpretation. However, this practice has several limitations: the collected data depend heavily on the initial research hypothesis and are therefore usually of little use for other research; the details from which the registered relations are inferred are not represented; and it is difficult to revisit the original data sources for verification, corrections or improvements. To cope with these problems, in this article we present FAST CAT, a collaborative system for assistive data entry and curation in Digital Humanities and similar forms of empirical research. We describe the related challenges and the overall methodology we follow for supporting semantic interoperability, and we discuss the use of FAST CAT in the context of a European (ERC) project of Maritime History, called SeaLiT, which examines the economic, social and demographic impacts of the introduction of steamboats in the Mediterranean area between the 1850s and the 1920s.

          Published In

          Journal on Computing and Cultural Heritage   Volume 14, Issue 4
          December 2021
          328 pages
          ISSN:1556-4673
          EISSN:1556-4711
          DOI:10.1145/3476246
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 16 July 2021
          Accepted: 01 April 2021
          Revised: 01 December 2020
          Received: 01 September 2020
          Published in JOCCH Volume 14, Issue 4

          Author Tags

          1. Data entry
          2. archival research
          3. data curation
          4. digital humanities
          5. semantic interoperability

          Qualifiers

          • Research-article
          • Research
          • Refereed

          Funding Sources

          • European Union’s Horizon 2020 research and innovation programme under Marie Sklodowska-Curie Individual Fellowship, Project “ReKnow: Research Documentation, Analysis and Exploration in Empirical and Descriptive Sciences”
          • European Research Council (ERC)

          Cited By

          • (2024) Unifying Faceted Search and Analytics over RDF Knowledge Graphs. Knowledge and Information Systems 66:7 (3921-3958). DOI: 10.1007/s10115-024-02076-9. Online publication date: 24-Mar-2024
          • (2024) Curating the Chinese ancient book catalogs: Leveraging the dual roles of humanities scholars as experts and users in collaborative practice. Journal of the Association for Information Science and Technology. DOI: 10.1002/asi.24894. Online publication date: 14-Apr-2024
          • (2023) A Brief Survey of Methods for Analytics over RDF Knowledge Graphs. Analytics 2:1 (55-74). DOI: 10.3390/analytics2010004. Online publication date: 17-Jan-2023
          • (2023) The SeaLiT Ontology – An Extension of CIDOC-CRM for the Modeling and Integration of Maritime History Information. Journal on Computing and Cultural Heritage 16:3 (1-21). DOI: 10.1145/3586080. Online publication date: 9-Aug-2023
          • (2023) FastCat Catalogues: Interactive Entity-Based Exploratory Analysis of Archival Documents. 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (190-194). DOI: 10.1109/JCDL57899.2023.00035. Online publication date: Jun-2023
          • (2023) Data Enrichment Toolchain: A Data Linking and Enrichment Platform for Heterogeneous Data. IEEE Access 11 (103079-103091). DOI: 10.1109/ACCESS.2023.3317705. Online publication date: 2023
          • (2023) A workflow model for holistic data management and semantic interoperability in quantitative archival research. Digital Scholarship in the Humanities 38:3 (1049-1066). DOI: 10.1093/llc/fqad018. Online publication date: 6-Apr-2023
          • (2022) CIDOC-CRM and Machine Learning: A Survey and Future Research. Heritage 5:3 (1612-1636). DOI: 10.3390/heritage5030084. Online publication date: 7-Jul-2022
          • (2022) Collaborative Data Use between Private and Public Stakeholders—A Regional Case Study. Data 7:2 (20). DOI: 10.3390/data7020020. Online publication date: 28-Jan-2022
          • (2022) How Your Cultural Dataset is Connected to the Rest Linked Open Data? Trandisciplinary Multispectral Modelling and Cooperation for the Preservation of Cultural Heritage (136-148). DOI: 10.1007/978-3-031-20253-7_12. Online publication date: 24-Nov-2022
