DOI: 10.1145/3076246.3076251
short-paper

Using Word Embedding to Enable Semantic Queries in Relational Databases

Published: 14 May 2017

Abstract

We investigate opportunities for exploiting Artificial Intelligence (AI) techniques to enhance the capabilities of relational databases. In particular, we explore applications of Natural Language Processing (NLP) techniques to endow relational databases with capabilities that have been very hard to realize in practice. We apply an unsupervised, neural-network-based NLP idea, Distributed Representation via Word Embedding, to extract latent information from a relational table. The word embedding model is based on a meaningful textual view of a relational database and captures inter-/intra-attribute relationships between database tokens. For each database token, the model includes a vector that encodes these contextual semantic relationships. These vectors enable processing of a new class of SQL-based business intelligence queries, called cognitive intelligence (CI) queries, that use the generated vectors to analyze contextual semantic relationships between database tokens. The cognitive capabilities enable complex queries such as semantic matching, reasoning queries such as analogies, predictive queries using entities not present in a database, and queries that use knowledge from external sources.
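The idea of a textual view with per-token vectors can be illustrated with a minimal sketch. This is not the authors' implementation: the toy table, the column names, the `column:value` token format, and the helper names (`textify`, `cooccurrence_vectors`, `most_similar`) are all hypothetical, and simple row co-occurrence counts stand in for a trained word embedding model. The sketch only shows how a semantic-matching query could rank database tokens by cosine similarity of their vectors.

```python
import math

# Hypothetical toy table; column names and values are illustrative only.
rows = [
    {"customer": "alice", "item": "apples", "category": "fruit"},
    {"customer": "alice", "item": "bananas", "category": "fruit"},
    {"customer": "bob", "item": "apples", "category": "fruit"},
    {"customer": "bob", "item": "bananas", "category": "fruit"},
    {"customer": "carol", "item": "hammer", "category": "tools"},
    {"customer": "carol", "item": "nails", "category": "tools"},
]


def textify(row):
    """Turn one row into a 'sentence' of column-qualified tokens."""
    return [f"{col}:{val}" for col, val in row.items()]


def cooccurrence_vectors(sentences):
    """Assumed stand-in for embedding training: count how often
    each pair of distinct tokens appears in the same row."""
    vocab = sorted({tok for s in sentences for tok in s})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = {tok: [0.0] * len(vocab) for tok in vocab}
    for s in sentences:
        for a in s:
            for b in s:
                if a != b:
                    vectors[a][index[b]] += 1.0
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


sentences = [textify(r) for r in rows]
vectors = cooccurrence_vectors(sentences)


def most_similar(token, prefix):
    """Semantic match: the other token with the given column prefix
    whose vector is closest to the vector of `token`."""
    candidates = [t for t in vectors if t.startswith(prefix) and t != token]
    return max(candidates, key=lambda t: cosine(vectors[token], vectors[t]))


print(most_similar("customer:alice", "customer:"))
```

In this toy data, the customers who bought the same items in the same category end up with identical context vectors, so they match each other even though no column states that they are related. A real system would replace the counting step with a trained embedding model and expose the similarity function to SQL.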

Published In

DEEM'17: Proceedings of the 1st Workshop on Data Management for End-to-End Machine Learning
May 2017
36 pages
ISBN:9781450350266
DOI:10.1145/3076246
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

SIGMOD/PODS'17

Acceptance Rates

Overall Acceptance Rate 44 of 67 submissions, 66%

Article Metrics

  • Downloads (Last 12 months): 74
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 25 Jan 2025

Cited By

  • (2024) Table Embedding Models Based on Contrastive Learning for Improved Cardinality Estimation. Web and Big Data, 494-511. DOI: 10.1007/978-981-97-7238-4_31. Online publication date: 28-Aug-2024
  • (2023) Regularized Pairwise Relationship based Analytics for Structured Data. Proceedings of the ACM on Management of Data 1:1, 1-27. DOI: 10.1145/3588936. Online publication date: 30-May-2023
  • (2023) Selecting Walk Schemes for Database Embedding. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 1677-1686. DOI: 10.1145/3583780.3615052. Online publication date: 21-Oct-2023
  • (2023) Auto-WLM: Machine Learning Enhanced Workload Management in Amazon Redshift. Companion of the 2023 International Conference on Management of Data, 225-237. DOI: 10.1145/3555041.3589677. Online publication date: 4-Jun-2023
  • (2023) Stable Tuple Embeddings for Dynamic Databases. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 1286-1299. DOI: 10.1109/ICDE55515.2023.00103. Online publication date: Apr-2023
  • (2023) Relational data embeddings for feature enrichment with background information. Machine Learning 112:2, 687-720. DOI: 10.1007/s10994-022-06277-7. Online publication date: 11-Jan-2023
  • (2022) Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation. Proceedings of the 2022 International Conference on Management of Data, 1504-1517. DOI: 10.1145/3514221.3517891. Online publication date: 10-Jun-2022
  • (2022) AutoSrh: An Embedding Dimensionality Search Framework for Tabular Data Prediction. IEEE Transactions on Knowledge and Data Engineering, 1-14. DOI: 10.1109/TKDE.2022.3186387. Online publication date: 2022
  • (2022) Neural Network Accelerated Tuple Search For Relational Data. 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI), 81-82. DOI: 10.1109/IRI54793.2022.00029. Online publication date: Aug-2022
  • (2022) SQL ChatBot – using Context Free Grammar. 2022 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), 1-7. DOI: 10.1109/IEMTRONICS55184.2022.9795814. Online publication date: 1-Jun-2022
