DOI: 10.1145/3410566.3410609
research-article

Towards a universal approach for semantic interpretation of spreadsheets data

Published: 25 August 2020

Abstract

Spreadsheets are a popular way to represent and structure data and knowledge; consequently, the semantic interpretation of spreadsheet data has become an active area of scientific research. In this paper, we propose a new approach for the semantic interpretation of data extracted from spreadsheets with arbitrary layouts and styles. The analyzed spreadsheets are in MS Excel format. Our approach includes two stages: (1) analyzing source spreadsheets and transforming them into a relational canonical form; (2) annotating the canonical spreadsheets with entities from a knowledge graph. In the first stage, we use a rule-based approach implemented as a domain-specific language called Cells Rule Language (CRL), together with an original form of canonical table. In the second stage, we use an aggregated method for measuring the similarity between candidate entities and cell values. It applies five metrics sequentially and combines the ranks obtained by each metric. The algorithms of the two stages are implemented in dedicated software, TabbyXL and TabbyLD, respectively. DBpedia is used as the knowledge graph. Experimental evaluations on the T2Dv2 and Troy200 corpora demonstrate the applicability of our approach and software to the semantic interpretation of spreadsheet data. A key feature of the approach is its universality, which stems from two sources: a language for describing spreadsheet transformation rules and an original canonical form. This feature enables processing large volumes of heterogeneous spreadsheets from various domains. This work is part of the Tabby research project on software for recognizing, extracting, transforming, and interpreting data from spreadsheet tables with arbitrary layouts and styles.
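A toy example can illustrate the first stage, transforming a table into a relational canonical form. This is a minimal sketch only. It is neither CRL nor TabbyXL's canonical table. It assumes the simplest layout, with column headings in the first row and row headings in the first column. It then flattens that layout into records with three hypothetical fields: `data`, `row_heading`, and `column_heading`. Rule languages such as CRL exist to handle the arbitrary layouts that this hard-coded logic cannot.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CanonicalRow:
    # Hypothetical canonical record: one data value plus the headings that
    # label it. Field names are an assumption, not TabbyXL's actual schema.
    data: str
    row_heading: str
    column_heading: str


def unpivot(grid: List[List[str]]) -> List[CanonicalRow]:
    """Flatten a simple cross-tab into canonical records.

    Assumes grid[0][1:] holds column headings, grid[r][0] holds row
    headings, and "" marks an empty cell.
    """
    column_headings = grid[0][1:]
    records: List[CanonicalRow] = []
    for row in grid[1:]:
        row_heading, values = row[0], row[1:]
        for column_heading, value in zip(column_headings, values):
            if value == "":
                continue  # empty cells carry no data, so they yield no record
            records.append(CanonicalRow(value, row_heading, column_heading))
    return records


if __name__ == "__main__":
    # Made-up illustrative values, not data from the paper's corpora.
    grid = [
        ["", "2018", "2019"],
        ["Alpha", "10", "12"],
        ["Beta", "7", ""],
    ]
    for record in unpivot(grid):
        print(record)
```

In this example, the 2×2 data block with one empty cell yields three records. Each value is paired with the headings that locate it. This pairing is what makes the canonical table easy to annotate cell by cell in the second stage.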

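The second stage combines the ranks produced by several similarity metrics. The abstract does not name the five metrics, so this sketch assumes two stand-ins:

- character-level similarity via the standard library's `difflib.SequenceMatcher`;
- token-level Jaccard overlap.

The sketch converts each metric's scores into ranks (1 = best) and sums the ranks per candidate. This is a Borda-style rule that may differ from TabbyLD's actual aggregation. The candidate labels are hypothetical; in the described system, candidates would come from DBpedia.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

# A metric maps (cell value, candidate label) to a similarity score.
Metric = Callable[[str, str], float]


def string_ratio(cell: str, label: str) -> float:
    # Stand-in metric: character-level similarity in [0, 1].
    return SequenceMatcher(None, cell.lower(), label.lower()).ratio()


def token_jaccard(cell: str, label: str) -> float:
    # Stand-in metric: overlap of whitespace-separated tokens.
    a, b = set(cell.lower().split()), set(label.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    # Rank 1 = highest score; ties are broken by label for determinism.
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return {cand: pos + 1 for pos, cand in enumerate(ordered)}


def aggregate(cell: str, candidates: List[str], metrics: List[Metric]) -> List[str]:
    # Apply each metric in turn and accumulate the rank it assigns.
    total = {c: 0 for c in candidates}
    for metric in metrics:
        per_metric = ranks({c: metric(cell, c) for c in candidates})
        for cand, rank in per_metric.items():
            total[cand] += rank
    # A lower summed rank means a better overall candidate.
    return sorted(candidates, key=lambda c: (total[c], c))


if __name__ == "__main__":
    print(aggregate("Moscow", ["Moskva (river)", "Moscow, Idaho", "Moscow"],
                    [string_ratio, token_jaccard]))
    # → ['Moscow', 'Moscow, Idaho', 'Moskva (river)']
```

Summing ranks rather than raw scores avoids having to normalize metrics whose score scales differ. Adding a metric only requires appending another function with the same signature.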
Cited By

  • (2023) Knowledge Graph Engineering Based on Semantic Annotation of Tables. Computation 11, 9 (175). DOI: 10.3390/computation11090175. Online publication date: 5 Sep 2023.
  • (2022) Extraction of Facts from Web-Tables based on Semantic Interpretation Tabular Data. 2022 Ivannikov Memorial Workshop (IVMEM), 7-17. DOI: 10.1109/IVMEM57067.2022.9983959. Online publication date: 23 Sep 2022.
  • (2022) Knowledge Graph Augmentation Based on Tabular Data: A Case Study for Industrial Safety Inspection. Proceedings of the Sixth International Scientific Conference "Intelligent Information Technologies for Industry" (IITI'22), 314-324. DOI: 10.1007/978-3-031-19620-1_30. Online publication date: 31 Oct 2022.

Information

Published In

ACM Other conferences
IDEAS '20: Proceedings of the 24th Symposium on International Database Engineering & Applications
August 2020
252 pages
ISBN: 9781450375030
DOI: 10.1145/3410566
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DBpedia
  2. knowledge graph
  3. linked data
  4. named entity linking
  5. named entity recognition
  6. semantic interpretation
  7. spreadsheet data

Conference

IDEAS 2020

Acceptance Rates

IDEAS '20 paper acceptance rate: 27 of 57 submissions (47%)
Overall acceptance rate: 74 of 210 submissions (35%)

Article Metrics

  • Downloads (last 12 months): 13
  • Downloads (last 6 weeks): 2
Reflects downloads up to 30 Aug 2024.