Supporting user-defined functions on uncertain data

Published: 01 April 2013

Abstract

Uncertain data management has become crucial in many sensing and scientific applications. As user-defined functions (UDFs) become widely used in these applications, an important task is to capture result uncertainty for queries that evaluate UDFs on uncertain data. In this work, we provide a general framework for supporting UDFs on uncertain data. Specifically, we propose a learning approach based on Gaussian processes (GPs) to compute approximate output distributions of a UDF when evaluated on uncertain input, with guaranteed error bounds. We also devise an online algorithm to compute such output distributions, which employs a suite of optimizations to improve accuracy and performance. Our evaluation using both real-world and synthetic functions shows that our proposed GP approach can outperform the state-of-the-art sampling approach with up to two orders of magnitude improvement for a variety of UDFs.
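To make the idea in the abstract concrete, the following is a minimal sketch of using a Gaussian process as a cheap surrogate for a UDF and pushing an uncertain input through it. It is not the paper's algorithm: it has no error bounds, no online tuning, and none of the optimizations the abstract mentions. The function `udf`, the kernel settings, the training grid, and the input distribution N(2.0, 0.3) are all assumptions chosen for illustration.

```python
import math
import random

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel for scalar inputs (assumed hyperparameters)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Return lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, jitter=1e-6):
    """Fit a zero-mean GP and return its posterior mean function."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = chol_solve(cholesky(K), ys)
    return lambda x: sum(a * rbf(x, xi) for a, xi in zip(alpha, xs))

def output_samples(f, mu, sigma, n, seed):
    """Sorted samples of f(X) for X ~ N(mu, sigma^2): an empirical output distribution."""
    rng = random.Random(seed)
    return sorted(f(rng.gauss(mu, sigma)) for _ in range(n))

def udf(x):
    # Hypothetical stand-in for an expensive black-box UDF.
    return math.sin(x) + 0.5 * x

# Train the surrogate with only 9 UDF calls on a grid over [0, 4].
train_x = [0.5 * i for i in range(9)]
train_y = [udf(x) for x in train_x]
gp_mean = fit_gp(train_x, train_y)

# Propagate the uncertain input through the surrogate and, for comparison,
# through the UDF itself. The same seed gives both runs the same input samples.
gp_out = output_samples(gp_mean, 2.0, 0.3, 2000, seed=0)
mc_out = output_samples(udf, 2.0, 0.3, 2000, seed=0)

# Largest gap between matching empirical quantiles of the two distributions.
max_gap = max(abs(a - b) for a, b in zip(gp_out, mc_out))
print("max quantile gap:", max_gap)
```

In this sketch the surrogate needs 9 UDF evaluations, while the direct sampling baseline needs 2000. That gap in call count is the kind of saving a learned model can offer when the UDF is expensive, provided the approximation error is controlled, which this sketch does not attempt to do.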

Cited By

  • (2014) The analytical bootstrap. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pages 277-288. DOI: 10.1145/2588555.2588579. Online publication date: 18-Jun-2014.

Published In

Proceedings of the VLDB Endowment, Volume 6, Issue 6
April 2013
144 pages

Publisher

VLDB Endowment

Publication History

Published: 01 April 2013
Published in PVLDB Volume 6, Issue 6
