DOI: 10.1145/3514221.3517906
Research Article

Annotating Columns with Pre-trained Language Models

Published: 11 June 2022
Abstract

Inferring meta-information about tables, such as column headers or relationships between columns, is an active research topic in data management because many tables lack some of this information. In this paper, we study the problem of annotating table columns (i.e., predicting column types and the relationships between columns) using only information from the table itself. We develop a multi-task learning framework called Doduo, based on pre-trained language models, which takes the entire table as input and predicts column types and inter-column relations with a single model. Experimental results show that Doduo establishes new state-of-the-art performance on two benchmarks for the column type prediction and column relation prediction tasks, with improvements of up to 4.0% and 11.9%, respectively. We also report that Doduo outperforms the previous state of the art with a minimal number of tokens, only 8 tokens per column. We release a toolbox (https://github.com/megagonlabs/doduo) and confirm the effectiveness of Doduo on a real-world data science problem through a case study.
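To make the serialization and multi-task setup concrete, below is a minimal, illustrative sketch of what such a model can look like. It follows the idea described above (the whole table is serialized into one sequence, and a single pre-trained language model feeds two prediction heads), but it is not the released Doduo implementation: it assumes the HuggingFace transformers and PyTorch libraries, and the label-set sizes, the per-column token budget, and all function names are placeholders.

    import torch
    import torch.nn as nn
    from transformers import BertModel, BertTokenizerFast

    NUM_TYPES, NUM_RELATIONS = 78, 121   # placeholder label-set sizes, not the benchmark values
    MAX_TOKENS_PER_COL = 8               # per-column token budget mentioned in the abstract

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    encoder = BertModel.from_pretrained("bert-base-uncased")
    type_head = nn.Linear(encoder.config.hidden_size, NUM_TYPES)          # column-type classifier
    rel_head = nn.Linear(2 * encoder.config.hidden_size, NUM_RELATIONS)   # column-pair relation classifier

    def serialize_table(columns):
        # Concatenate every column of the table into a single token sequence,
        # marking the start of each column with a [CLS] token.
        ids = []
        for values in columns:
            col_ids = tokenizer(" ".join(map(str, values)),
                                add_special_tokens=False)["input_ids"][:MAX_TOKENS_PER_COL]
            ids += [tokenizer.cls_token_id] + col_ids
        return torch.tensor([ids])

    def annotate(columns):
        ids = serialize_table(columns)
        with torch.no_grad():
            hidden = encoder(ids).last_hidden_state[0]              # (seq_len, hidden_size)
        cls_pos = (ids[0] == tokenizer.cls_token_id).nonzero(as_tuple=True)[0]
        col_embs = hidden[cls_pos]                                  # one contextualized embedding per column
        type_logits = type_head(col_embs)                           # task 1: a type for every column
        # task 2: a relation between the first (subject) column and each remaining column
        pairs = torch.cat([col_embs[:1].expand(len(columns) - 1, -1), col_embs[1:]], dim=-1)
        rel_logits = rel_head(pairs)
        return type_logits, rel_logits

    # Example: a tiny two-column table, given column by column.
    type_logits, rel_logits = annotate([["Japan", "France", "Brazil"], ["Tokyo", "Paris", "Brasilia"]])

For the authors' actual implementation, training procedure, and benchmark label sets, see the released toolbox at https://github.com/megagonlabs/doduo.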




      Published In

      SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
      June 2022
      2597 pages
ISBN: 9781450392495
DOI: 10.1145/3514221
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 11 June 2022


      Author Tags

      1. language models
      2. multi-task learning
      3. table understanding

      Qualifiers

      • Research-article

      Conference

      SIGMOD/PODS '22

      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%


Article Metrics

• Downloads (Last 12 months): 391
• Downloads (Last 6 weeks): 32

Cited By
• (2024) Chorus: Foundation Models for Unified Data Discovery and Exploration. Proceedings of the VLDB Endowment 17(8), 2104-2114. https://doi.org/10.14778/3659437.3659461. Online publication date: 31-May-2024.
• (2024) Evaluating Ambiguous Questions in Semantic Parsing. 2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW), 338-342. https://doi.org/10.1109/ICDEW61823.2024.00050. Online publication date: 13-May-2024.
• (2024) A survey on semantic data management as intersection of ontology-based data access, semantic modeling and data lakes. Journal of Web Semantics 81, 100819. https://doi.org/10.1016/j.websem.2024.100819. Online publication date: Jul-2024.
• (2023) Column-Type Prediction for Web Tables Powered by Knowledge Base and Text. Mathematics 11(3), 560. https://doi.org/10.3390/math11030560. Online publication date: 20-Jan-2023.
• (2023) Knowledge Graph Engineering Based on Semantic Annotation of Tables. Computation 11(9), 175. https://doi.org/10.3390/computation11090175. Online publication date: 5-Sep-2023.
• (2023) DeepJoin: Joinable Table Discovery with Pre-Trained Language Models. Proceedings of the VLDB Endowment 16(10), 2458-2470. https://doi.org/10.14778/3603581.3603587. Online publication date: 1-Jun-2023.
• (2023) Semantics-Aware Dataset Discovery from Data Lakes with Contextualized Column-Based Representation Learning. Proceedings of the VLDB Endowment 16(7), 1726-1739. https://doi.org/10.14778/3587136.3587146. Online publication date: 1-Mar-2023.
• (2023) RECA: Related Tables Enhanced Column Semantic Type Annotation Framework. Proceedings of the VLDB Endowment 16(6), 1319-1331. https://doi.org/10.14778/3583140.3583149. Online publication date: 1-Feb-2023.
• (2023) Steered Training Data Generation for Learned Semantic Type Detection. Proceedings of the ACM on Management of Data 1(2), 1-25. https://doi.org/10.1145/3589786. Online publication date: 20-Jun-2023.
• (2023) SANTOS: Relationship-based Semantic Table Union Search. Proceedings of the ACM on Management of Data 1(1), 1-25. https://doi.org/10.1145/3588689. Online publication date: 30-May-2023.
