DOI: 10.1145/3583780.3615046
research-article

SAND: Semantic Annotation of Numeric Data in Web Tables

Published: 21 October 2023

Abstract

A large portion of quantitative information about entities is published in Web tables. These tables often lack a proper schema and annotations, which makes them difficult to query and analyze. In this paper, we introduce SAND, a novel approach for annotating numeric columns of Web tables by linking them to properties in a knowledge graph. Our approach relies only on the semantic information readily available in knowledge graphs, and not on contextual information, which can be missing, or on labelled data, which may be difficult to obtain. We show that our approach can reliably detect both semantic types (e.g., height) and unit labels (e.g., Centimeter) when the semantic type is present in the knowledge graph. Our evaluation on real-world Web tables shows that our method outperforms some state-of-the-art approaches to semantic labeling and unit detection by a large margin in terms of accuracy.
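To make the task concrete, the following is a minimal sketch of what linking a numeric column to a knowledge-graph property and unit can look like. It is not SAND's algorithm, which the abstract does not describe. It is a toy that compares medians on a log scale, and the property names, sample values, unit table, and the function `annotate_column` are all hypothetical.

```python
from math import log10
from statistics import median

# Hypothetical KG excerpt: property -> (base unit, sample values in that unit).
KG_PROPERTIES = {
    "height": ("Centimeter", [165.0, 172.0, 180.0, 191.0, 158.0]),
    "weight": ("Kilogram", [55.0, 68.0, 74.0, 90.0, 61.0]),
}

# Hypothetical unit table: property -> {unit label: factor converting to the base unit}.
UNIT_FACTORS = {
    "height": {"Centimeter": 1.0, "Meter": 100.0, "Inch": 2.54},
    "weight": {"Kilogram": 1.0, "Pound": 0.45359237},
}


def annotate_column(values):
    """Return the (property, unit) pair whose KG values lie closest to the column.

    Assumes all values are positive, because distances are taken on a log scale.
    """
    best = None
    for prop, (_base_unit, kg_values) in KG_PROPERTIES.items():
        kg_median = median(kg_values)
        for unit, factor in UNIT_FACTORS[prop].items():
            # Interpret the column as being expressed in `unit`, then convert to the base unit.
            converted_median = median(v * factor for v in values)
            # Order-of-magnitude gap between the column and the KG values.
            distance = abs(log10(converted_median) - log10(kg_median))
            if best is None or distance < best[0]:
                best = (distance, prop, unit)
    return best[1], best[2]


print(annotate_column([1.65, 1.80, 1.72, 1.91]))  # ('height', 'Meter')
```

A median-only comparison is easy to fool: a column of heights in inches has a median close to that of weights in kilograms, so a practical method would need richer evidence from the knowledge graph than this sketch uses.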


Cited By

  • (2024) Protecting privacy in the age of big data: exploring data linking methods for quasi-identifier selection. International Journal of Information Security 24:1. DOI: 10.1007/s10207-024-00944-7. Online publication date: 3-Dec-2024

Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023
5508 pages
ISBN: 9798400701245
DOI: 10.1145/3583780

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. column annotation
  2. numeric data
  3. semantic annotation

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
