Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes

Saad, Feras; Casarsa, Leonardo; Mansinghka, Vikash

Computer Science > Artificial Intelligence

arXiv:1704.01087 (cs)

[Submitted on 4 Apr 2017]

Abstract: Databases are widespread, yet extracting relevant data from them can be difficult. Without substantial domain knowledge, multivariate search queries often return sparse or uninformative results. This paper introduces an approach for searching structured data based on probabilistic programming and nonparametric Bayes. Users specify queries in a probabilistic language that combines standard SQL database search operators with an information-theoretic ranking function called predictive relevance. Predictive relevance can be calculated by a fast sparse matrix algorithm based on posterior samples from CrossCat, a nonparametric Bayesian model for high-dimensional, heterogeneously typed data tables. The result is a flexible search technique that applies to a broad class of information retrieval problems, which we integrate into BayesDB, a probabilistic programming platform for probabilistic data analysis. This paper demonstrates applications to databases of US colleges, global macroeconomic indicators of public health, and classic cars. We found that human evaluators often prefer the results from probabilistic search to results from a standard baseline.
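The abstract does not give the formula for predictive relevance, so the following is only a minimal sketch under a stated assumption. It assumes, as a hypothetical reading, that a row's relevance to a set of query rows is the fraction of posterior CrossCat samples, averaged over the query rows, in which the row and a query row share a cluster in the view that contains a chosen context column. The function names (`cluster_indicator`, `predictive_relevance`) and the input format (one array of cluster labels per sample) are illustrative and are not BayesDB's API. The sketch shows why a sparse formulation is cheap: each sample becomes a sparse one-hot row-by-cluster matrix `Z`, and one product `Z @ Z[Q].T` gives every row's co-assignment with every query row.

```python
import numpy as np
from scipy.sparse import csr_matrix


def cluster_indicator(assignments):
    """Sparse one-hot matrix Z with Z[i, k] = 1 iff row i is in cluster k."""
    assignments = np.asarray(assignments)
    n = assignments.shape[0]
    # Relabel clusters to 0..K-1 so the matrix has no empty columns.
    _, labels = np.unique(assignments, return_inverse=True)
    k = int(labels.max()) + 1
    data = np.ones(n, dtype=float)
    return csr_matrix((data, (np.arange(n), labels)), shape=(n, k))


def predictive_relevance(samples, query_rows):
    """Hypothetical relevance score of every row to query_rows.

    samples: non-empty list of 1-D arrays; samples[s][i] is the (assumed)
        cluster label of row i, in posterior sample s, within the view
        that contains the context column.
    query_rows: indices of the rows supplied as the query.
    Returns a length-n array of scores in [0, 1].
    """
    query_rows = np.asarray(query_rows)
    scores = None
    for assignments in samples:
        z = cluster_indicator(assignments)
        # (n x K) @ (K x |Q|): entry (i, j) is 1 iff row i and query row j
        # share a cluster in this sample.
        same = z @ z[query_rows].T
        # Fraction of query rows that row i co-clusters with.
        per_row = np.asarray(same.sum(axis=1)).ravel() / len(query_rows)
        scores = per_row if scores is None else scores + per_row
    # Average over posterior samples.
    return scores / len(samples)


# Usage: two hypothetical samples over five rows, querying row 0.
samples = [np.array([0, 0, 1, 1, 2]), np.array([0, 1, 1, 1, 0])]
print(predictive_relevance(samples, [0]))  # [1.  0.5 0.  0.  0.5]
```

Rows would then be ranked by descending score. Because each one-hot matrix has exactly one nonzero per row, the product costs time roughly proportional to the number of co-assigned pairs rather than to the full n by |Q| grid, which is one plausible reason a sparse-matrix approach is fast.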

Subjects:	Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1704.01087 [cs.AI]
	(or arXiv:1704.01087v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.1704.01087

Submission history

From: Feras Saad [view email]
[v1] Tue, 4 Apr 2017 16:18:07 UTC (6,777 KB)