research-article

HomeRun: scalable sparse-spectrum reconstruction of aggregated historical data

Published: 01 July 2018

Abstract

Recovering a time sequence of events from multiple aggregated and possibly overlapping reports is a major challenge in historical data fusion. The goal is to reconstruct a higher-resolution event sequence from a mixture of lower-resolution samples as accurately as possible. For example, we may aim to disaggregate overlapping monthly counts of people infected with measles into weekly counts. In this paper, we propose a novel data disaggregation method, called HomeRun, that exploits an alternative representation of the sequence and finds the spectrum of the target sequence. More specifically, we formulate the problem as a so-called basis pursuit problem, using the Discrete Cosine Transform (DCT) as a sparsifying dictionary, and impose non-negativity and smoothness constraints. HomeRun utilizes the energy compaction feature of the DCT by finding the sparsest spectral representation of the target sequence that contains the largest (most important) coefficients. We leverage the Alternating Direction Method of Multipliers to solve the resulting optimization problem with scalable and memory-efficient steps. Experiments using real epidemiological data show that our method considerably outperforms state-of-the-art techniques, especially when the DCT of the sequence has a high degree of energy compaction.
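The setup described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes each report is the sum of the hidden sequence over a window, so the reports satisfy y = A x, and that x = D^T s for an orthonormal DCT-II matrix D with a sparse spectrum s. As a stand-in for the paper's ADMM solver, it runs plain iterative soft-thresholding (ISTA) on an l1-regularized least-squares relaxation and omits the non-negativity and smoothness constraints. The window layout, the toy sequence, and all function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix D, so s = D x and x = D^T s."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                     for t in range(n)])
    return rows

def aggregation_matrix(windows, n):
    """One row per report; row i sums x[start : start + length]."""
    return [[1.0 if start <= t < start + length else 0.0 for t in range(n)]
            for start, length in windows]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (shrinks each entry toward zero)."""
    return [math.copysign(max(abs(c) - tau, 0.0), c) for c in v]

def objective(m, y, s, lam):
    r = [p - q for p, q in zip(matvec(m, s), y)]
    return 0.5 * sum(e * e for e in r) + lam * sum(abs(c) for c in s)

def ista(m, y, lam, step, iters):
    """Minimise 0.5*||M s - y||^2 + lam*||s||_1 by proximal gradient steps."""
    mt = transpose(m)
    s = [0.0] * len(m[0])
    for _ in range(iters):
        r = [p - q for p, q in zip(matvec(m, s), y)]
        grad = matvec(mt, r)
        s = soft_threshold([c - step * g for c, g in zip(s, grad)], step * lam)
    return s

# Hypothetical toy data: 12 "weekly" counts that are smooth, hence DCT-sparse.
n = 12
x_true = [10.0 + 4.0 * math.cos(math.pi * (2 * t + 1) / (2 * n)) for t in range(n)]

# Five overlapping 4-week reports: fewer equations (5) than unknowns (12).
windows = [(0, 4), (2, 4), (4, 4), (6, 4), (8, 4)]
A = aggregation_matrix(windows, n)
y = matvec(A, x_true)

D = dct_matrix(n)
s_true = matvec(D, x_true)      # energy compaction: only two non-zero coefficients
M = matmul(A, transpose(D))     # forward model in the spectral domain: y = M s

# Step size 1/L with L >= ||A||_2^2, bounded by (max row sum) * (max column sum).
lip = max(sum(row) for row in A) * max(sum(col) for col in transpose(A))
lam = 0.01
s_hat = ista(M, y, lam, 1.0 / lip, 500)
x_hat = matvec(transpose(D), s_hat)   # estimated high-resolution sequence

print("non-zero true DCT coefficients:", sum(1 for c in s_true if abs(c) > 1e-9))
print("objective at s = 0:  ", objective(M, y, [0.0] * n, lam))
print("objective after ISTA:", objective(M, y, s_hat, lam))
```

Because the step size is at most 1/L, each ISTA iteration cannot increase the objective, which is the only convergence property this sketch relies on. How close `x_hat` comes to `x_true` depends on the report layout and on `lam`, so no recovery accuracy is claimed here.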

Cited By

  • (2020) TurboLift: fast accuracy lifting for historical data recovery. The VLDB Journal, 29(5):1129-1148. DOI: 10.1007/s00778-020-00609-6. Online publication date: 9-Mar-2020
  • (2020) Tendi: Tensor Disaggregation from Multiple Coarse Views. Advances in Knowledge Discovery and Data Mining, pages 867-880. DOI: 10.1007/978-3-030-47436-2_65. Online publication date: 11-May-2020
  • (2019) IncompFuse: a logical framework for historical information fusion with inaccurate data sources. Journal of Intelligent Information Systems, 54(3):463-481. DOI: 10.1007/s10844-019-00569-6. Online publication date: 20-Jun-2019


Published In

Proceedings of the VLDB Endowment  Volume 11, Issue 11
July 2018
507 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 July 2018
Published in PVLDB Volume 11, Issue 11

Qualifiers

  • Research-article
