DOI: 10.1145/3514221.3517906
Research Article

Annotating Columns with Pre-trained Language Models

Published: 11 June 2022
Abstract

Inferring meta-information about tables, such as column headers or relationships between columns, is an active research topic in data management because many tables lack some of this information. In this paper, we study the problem of annotating table columns (i.e., predicting column types and the relationships between columns) using only information from the table itself. We develop a multi-task learning framework called Doduo, based on pre-trained language models, which takes the entire table as input and predicts column types and inter-column relations with a single model. Experimental results show that Doduo establishes new state-of-the-art performance on two benchmarks for the column type prediction and column relation prediction tasks, with improvements of up to 4.0% and 11.9%, respectively. We also report that Doduo outperforms the previous state of the art with a minimal number of tokens, only 8 tokens per column. We release a toolbox (https://github.com/megagonlabs/doduo) and confirm the effectiveness of Doduo on a real-world data science problem through a case study.
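To make the serialization and multi-task setup concrete, below is a minimal, illustrative sketch of what such a model can look like. It follows the idea described above (the whole table is serialized into one sequence, and a single pre-trained language model feeds two prediction heads), but it is not the released Doduo implementation: it assumes the HuggingFace transformers and PyTorch libraries, and the label-set sizes, the per-column token budget, and all function names are placeholders.

    import torch
    import torch.nn as nn
    from transformers import BertModel, BertTokenizerFast

    NUM_TYPES, NUM_RELATIONS = 78, 121   # placeholder label-set sizes, not the benchmark values
    MAX_TOKENS_PER_COL = 8               # per-column token budget mentioned in the abstract

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    encoder = BertModel.from_pretrained("bert-base-uncased")
    type_head = nn.Linear(encoder.config.hidden_size, NUM_TYPES)          # column-type classifier
    rel_head = nn.Linear(2 * encoder.config.hidden_size, NUM_RELATIONS)   # column-pair relation classifier

    def serialize_table(columns):
        # Concatenate every column of the table into a single token sequence,
        # marking the start of each column with a [CLS] token.
        ids = []
        for values in columns:
            col_ids = tokenizer(" ".join(map(str, values)),
                                add_special_tokens=False)["input_ids"][:MAX_TOKENS_PER_COL]
            ids += [tokenizer.cls_token_id] + col_ids
        return torch.tensor([ids])

    def annotate(columns):
        ids = serialize_table(columns)
        with torch.no_grad():
            hidden = encoder(ids).last_hidden_state[0]              # (seq_len, hidden_size)
        cls_pos = (ids[0] == tokenizer.cls_token_id).nonzero(as_tuple=True)[0]
        col_embs = hidden[cls_pos]                                  # one contextualized embedding per column
        type_logits = type_head(col_embs)                           # task 1: a type for every column
        # task 2: a relation between the first (subject) column and each remaining column
        pairs = torch.cat([col_embs[:1].expand(len(columns) - 1, -1), col_embs[1:]], dim=-1)
        rel_logits = rel_head(pairs)
        return type_logits, rel_logits

    # Example: a tiny two-column table, given column by column.
    type_logits, rel_logits = annotate([["Japan", "France", "Brazil"], ["Tokyo", "Paris", "Brasilia"]])

For the authors' actual implementation, training procedure, and benchmark label sets, see the released toolbox at https://github.com/megagonlabs/doduo.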




      Published In

      SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
      June 2022
      2597 pages
ISBN: 9781450392495
DOI: 10.1145/3514221
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 11 June 2022


      Author Tags

      1. language models
      2. multi-task learning
      3. table understanding

      Qualifiers

      • Research-article

      Conference

      SIGMOD/PODS '22

      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%


Article Metrics

• Downloads (Last 12 months): 391
• Downloads (Last 6 weeks): 32

Cited By
• (2024) Chorus: Foundation Models for Unified Data Discovery and Exploration. Proceedings of the VLDB Endowment 17(8), 2104-2114. https://doi.org/10.14778/3659437.3659461. Online publication date: 31-May-2024.
• (2024) Evaluating Ambiguous Questions in Semantic Parsing. 2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW), 338-342. https://doi.org/10.1109/ICDEW61823.2024.00050. Online publication date: 13-May-2024.
• (2024) A survey on semantic data management as intersection of ontology-based data access, semantic modeling and data lakes. Journal of Web Semantics 81, 100819. https://doi.org/10.1016/j.websem.2024.100819. Online publication date: Jul-2024.
• (2023) Column-Type Prediction for Web Tables Powered by Knowledge Base and Text. Mathematics 11(3), 560. https://doi.org/10.3390/math11030560. Online publication date: 20-Jan-2023.
• (2023) Knowledge Graph Engineering Based on Semantic Annotation of Tables. Computation 11(9), 175. https://doi.org/10.3390/computation11090175. Online publication date: 5-Sep-2023.
• (2023) DeepJoin: Joinable Table Discovery with Pre-Trained Language Models. Proceedings of the VLDB Endowment 16(10), 2458-2470. https://doi.org/10.14778/3603581.3603587. Online publication date: 1-Jun-2023.
• (2023) Semantics-Aware Dataset Discovery from Data Lakes with Contextualized Column-Based Representation Learning. Proceedings of the VLDB Endowment 16(7), 1726-1739. https://doi.org/10.14778/3587136.3587146. Online publication date: 1-Mar-2023.
• (2023) RECA: Related Tables Enhanced Column Semantic Type Annotation Framework. Proceedings of the VLDB Endowment 16(6), 1319-1331. https://doi.org/10.14778/3583140.3583149. Online publication date: 1-Feb-2023.
• (2023) Steered Training Data Generation for Learned Semantic Type Detection. Proceedings of the ACM on Management of Data 1(2), 1-25. https://doi.org/10.1145/3589786. Online publication date: 20-Jun-2023.
• (2023) SANTOS: Relationship-based Semantic Table Union Search. Proceedings of the ACM on Management of Data 1(1), 1-25. https://doi.org/10.1145/3588689. Online publication date: 30-May-2023.
