Maximum error-bounded Piecewise Linear Representation for online stream approximation

Published: 01 December 2014

Abstract

Given a time series data stream, generating an error-bounded Piecewise Linear Representation (error-bounded PLR) means constructing a sequence of consecutive line segments that approximates the stream such that the approximation error does not exceed a prescribed error bound. In this work, we use an error bound in the $L_\infty$ norm as the approximation criterion, which constrains the approximation error at every individual data point, and we aim to design algorithms that generate the minimal number of segments. In the literature, optimal approximation algorithms have been designed in a transformed space rather than in the time-value space, while optimal solutions in the original time domain (i.e., the time-value space) are still lacking. In this article, we propose two linear-time algorithms, named OptimalPLR and GreedyPLR, that construct an error-bounded PLR for a data stream in the time domain. OptimalPLR is an optimal algorithm that generates the minimal number of line segments for the stream approximation, and GreedyPLR is an alternative for settings that require high efficiency or have constrained resources. To evaluate OptimalPLR, we theoretically analyze it and compare it with the state-of-the-art optimal solution in transformed space, which also achieves linear complexity. We prove the theoretical equivalence between the time-value space and that transformed space, and we find that OptimalPLR processes data more efficiently in practice. Extensive empirical evaluation demonstrates the effectiveness and efficiency of the proposed algorithms.
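To make the $L_\infty$ criterion concrete: a segment with line $f(t) = v_0 + s\,(t - t_0)$ covering points $(t_k, v_k)$ is acceptable only if $|f(t_k) - v_k| \le \varepsilon$ for every covered $k$. The abstract does not give the details of OptimalPLR or GreedyPLR, so the following is a minimal, hypothetical sketch of a simpler greedy scheme, not the authors' algorithms. It assumes strictly increasing timestamps and forces each segment through its first point; every new point then narrows an interval of feasible slopes, and a new segment starts when that interval becomes empty. The names `Segment`, `greedy_anchored_plr`, and `max_abs_error` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (timestamp, value)


@dataclass
class Segment:
    t_start: float  # timestamp of the first point the segment covers
    t_end: float    # timestamp of the last point the segment covers
    v_start: float  # the line passes exactly through (t_start, v_start)
    slope: float

    def value_at(self, t: float) -> float:
        return self.v_start + self.slope * (t - self.t_start)


def _pick_slope(lo: float, hi: float) -> float:
    # A single-point segment never narrowed its interval, so any slope fits; use 0.
    if lo == float("-inf"):
        return 0.0
    return (lo + hi) / 2.0  # midpoint of the feasible slope interval


def greedy_anchored_plr(points: Sequence[Point], eps: float) -> List[Segment]:
    """Hypothetical greedy error-bounded PLR: each segment is anchored at its first
    point and extended while some slope keeps every covered point within eps."""
    if eps < 0:
        raise ValueError("eps must be non-negative")
    segments: List[Segment] = []
    if not points:
        return segments

    t0, v0 = points[0]
    t_last = t0
    lo, hi = float("-inf"), float("inf")  # feasible slopes for the open segment

    for t, v in points[1:]:
        if t <= t_last:
            raise ValueError("timestamps must be strictly increasing")
        dt = t - t0
        # |v0 + s*dt - v| <= eps  <=>  (v - eps - v0)/dt <= s <= (v + eps - v0)/dt
        new_lo = max(lo, (v - eps - v0) / dt)
        new_hi = min(hi, (v + eps - v0) / dt)
        if new_lo <= new_hi:
            lo, hi, t_last = new_lo, new_hi, t
            continue
        # No single slope fits: close the open segment and start a new one here.
        segments.append(Segment(t0, t_last, v0, _pick_slope(lo, hi)))
        t0, v0, t_last = t, v, t
        lo, hi = float("-inf"), float("inf")

    segments.append(Segment(t0, t_last, v0, _pick_slope(lo, hi)))
    return segments


def max_abs_error(points: Sequence[Point], segments: Sequence[Segment]) -> float:
    """Largest |approximation - value| over all points, i.e. the L-infinity error."""
    worst = 0.0
    for t, v in points:
        seg = next(s for s in segments if s.t_start <= t <= s.t_end)
        worst = max(worst, abs(seg.value_at(t) - v))
    return worst


if __name__ == "__main__":
    stream = [(0, 0.0), (1, 1.0), (2, 2.1), (3, 2.9), (4, 1.0), (5, -1.0), (6, -3.2)]
    segs = greedy_anchored_plr(stream, eps=0.2)
    print(len(segs), max_abs_error(stream, segs))
```

On this small stream with `eps=0.2`, the sketch produces two segments (covering t = 0..3 and t = 4..6), and the reported error stays within 0.2. Each point updates two bounds in constant time, so the pass is linear, but forcing every segment through its first data point is a simplification: an algorithm that lets the line pass anywhere inside each point's ±ε band can need fewer segments, which is the kind of optimality the abstract attributes to OptimalPLR.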

Cited By

  • (2024) PriPL-Tree: Accurate Range Query for Arbitrary Distribution under Local Differential Privacy. Proceedings of the VLDB Endowment 17(11), pp. 3031-3044. DOI: 10.14778/3681954.3681981. Online publication date: 1-Jul-2024
  • (2024) Revisiting Learned Index with Byte-addressable Persistent Storage. Proceedings of the 53rd International Conference on Parallel Processing, pp. 929-938. DOI: 10.1145/3673038.3673113. Online publication date: 12-Aug-2024
  • (2024) Kanva: A Lock-free Learned Search Data Structure. Proceedings of the 53rd International Conference on Parallel Processing, pp. 252-261. DOI: 10.1145/3673038.3673082. Online publication date: 12-Aug-2024

Published In

The VLDB Journal — The International Journal on Very Large Data Bases  Volume 23, Issue 6
December 2014
140 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Error bound
  2. Piecewise Linear Representation
  3. Stream approximation

Article Metrics

  • Downloads (last 12 months): 73
  • Downloads (last 6 weeks): 13
Reflects downloads up to 04 Oct 2024
