Maximum error-bounded Piecewise Linear Representation for online stream approximation

Authors: Xiangliang Zhang, Ke Deng

The VLDB Journal — The International Journal on Very Large Data Bases, Volume 23, Issue 6

Pages 915 - 937

https://doi.org/10.1007/s00778-014-0355-0

Published: 01 December 2014

Abstract

Given a time series data stream, generating an error-bounded Piecewise Linear Representation (error-bounded PLR) means constructing a sequence of consecutive line segments that approximate the stream such that the approximation error does not exceed a prescribed error bound. In this work, we consider the error bound in the $$L_\infty$$ norm as the approximation criterion, which constrains the approximation error at each data point, and we aim to design algorithms that generate the minimal number of segments. In the literature, the optimal approximation algorithms are designed in a transformed space rather than in the time-value space, while optimal solutions that work directly in the original time domain (i.e., time-value space) are still lacking. In this article, we propose two linear-time algorithms, named OptimalPLR and GreedyPLR, that construct error-bounded PLR for data streams in the time domain. OptimalPLR is an optimal algorithm that generates the minimal number of line segments for the stream approximation, and GreedyPLR is an alternative for settings that demand high efficiency or run in resource-constrained environments. To evaluate OptimalPLR, we theoretically analyze and compare it with the state-of-the-art optimal solution in the transformed space, which also achieves linear complexity. We prove the theoretical equivalence between the time-value space and this transformed space, and we find that OptimalPLR has higher processing efficiency in practice. Extensive empirical results support the effectiveness and efficiency of the proposed algorithms.
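To make the problem statement concrete, the following is a minimal Python sketch of an online, error-bounded segmentation under the $$L_\infty$$ criterion. It is not OptimalPLR or GreedyPLR as defined in the article; it is a simpler slope-cone heuristic that assumes every segment must pass through its first data point, so it does not in general produce the minimal number of segments. The names `greedy_segments` and `_close`, and the `(t_start, t_end, slope, intercept)` segment layout, are hypothetical choices made for illustration.

```python
from typing import Iterable, List, Tuple

# (t_start, t_end, slope, intercept): the line is y = slope * t + intercept.
Segment = Tuple[float, float, float, float]

NEG_INF = float("-inf")
POS_INF = float("inf")


def _close(t0: float, y0: float, t_end: float, lo: float, hi: float) -> Segment:
    """Emit a segment through the anchor (t0, y0) using the mid slope of [lo, hi]."""
    # A segment holding only its anchor has an unbounded slope range; use slope 0.
    slope = 0.0 if lo == NEG_INF else (lo + hi) / 2.0
    return (t0, t_end, slope, y0 - slope * t0)


def greedy_segments(points: Iterable[Tuple[float, float]], eps: float) -> List[Segment]:
    """Single-pass segmentation with max error eps (assumption: anchored segments).

    Timestamps must be strictly increasing. Each point is processed in O(1),
    so the whole pass is linear in the stream length.
    """
    segments: List[Segment] = []
    it = iter(points)
    try:
        t0, y0 = next(it)
    except StopIteration:
        return segments

    lo, hi = NEG_INF, POS_INF  # feasible slopes for lines through the anchor
    t_last = t0
    for t, y in it:
        if t <= t_last:
            raise ValueError("timestamps must be strictly increasing")
        dt = t - t0
        # Slopes that keep the anchored line within eps of the new point.
        new_lo = max(lo, (y - eps - y0) / dt)
        new_hi = min(hi, (y + eps - y0) / dt)
        if new_lo <= new_hi:
            lo, hi = new_lo, new_hi
        else:
            # No single slope fits all points: close the segment, re-anchor here.
            segments.append(_close(t0, y0, t_last, lo, hi))
            t0, y0 = t, y
            lo, hi = NEG_INF, POS_INF
        t_last = t

    segments.append(_close(t0, y0, t_last, lo, hi))
    return segments


# Usage: a rising run followed by a falling run, with error bound 0.5.
stream = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 0.0), (4, -2.0)]
result = greedy_segments(stream, 0.5)
```

Anchoring each segment at its first point keeps the state to two numbers (the slope bounds), which suits resource-constrained settings, at the cost of possibly using more segments than an optimal method would.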

Published In

The VLDB Journal — The International Journal on Very Large Data Bases Volume 23, Issue 6

December 2014

140 pages

ISSN:1066-8888

Copyright © 2014 Springer-Verlag Berlin Heidelberg.

Publisher

Springer-Verlag

Berlin, Heidelberg
