DOI: 10.1145/3322798.3329252
research-article
Open access

Similarity-based Compression with Multidimensional Pattern Matching

Published: 17 June 2019

Abstract

Sensors typically record measurements with more precision than the accuracy of the sensing technique supports. As a result, experimental and observational data often contain noise that appears random and cannot be compressed easily. This noise increases both the storage requirement and the computation time of analyses. In this work, we describe a line of research on data reduction techniques that preserve key features while reducing the storage requirement. Our core observation is that the noise in such data can be characterized by a small number of patterns identified through statistical similarity. In earlier tests, this approach reduced the storage requirement by more than 100-fold for one-dimensional sequences. Here, we explore a set of similarity measures for multidimensional sequences. In tests with standard quality measures such as Peak Signal-to-Noise Ratio (PSNR), the new compression methods reduce the storage requirement by more than 100-fold while keeping the errors, as measured by PSNR, relatively low. We therefore believe this is an effective strategy for constructing data reduction techniques.
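The idea in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation and does not reproduce the multidimensional similarity measures studied in the paper. It assumes a two-sample Kolmogorov-Smirnov statistic as the similarity test, a fixed block size, and a PSNR whose peak is the value range of the original data; the names `ks_statistic`, `encode`, `decode`, and `psnr`, and the parameters `block_size` and `threshold`, are hypothetical. Each block is either stored as a new pattern or replaced by the index of an earlier, statistically similar pattern.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(sa) and j < len(sb):
        x = min(sa[i], sb[j])
        # Advance both samples past every value <= x, then compare the CDFs.
        while i < len(sa) and sa[i] <= x:
            i += 1
        while j < len(sb) and sb[j] <= x:
            j += 1
        d = max(d, abs(i / len(sa) - j / len(sb)))
    return d


def encode(data, block_size, threshold):
    """Split data into blocks; store a block only if no earlier pattern is similar."""
    dictionary = []  # representative blocks kept verbatim
    indices = []     # one dictionary index per input block
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        for k, pattern in enumerate(dictionary):
            if ks_statistic(block, pattern) <= threshold:
                indices.append(k)  # similar enough: reuse the stored pattern
                break
        else:
            dictionary.append(block)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary, indices, n):
    """Rebuild a length-n sequence by replaying the stored patterns."""
    out = []
    for k in indices:
        out.extend(dictionary[k])
    return out[:n]


def psnr(original, reconstructed):
    """PSNR in dB, taking the peak as the value range of the original (an assumption)."""
    mse = sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf
    peak = max(original) - min(original)
    return 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(4096)]  # synthetic noise
    dictionary, indices = encode(data, block_size=64, threshold=0.25)
    restored = decode(dictionary, indices, len(data))
    print("blocks:", len(indices), "stored patterns:", len(dictionary))
    print("PSNR (dB):", psnr(data, restored))
```

In this sketch the reconstruction matches the original only in distribution, not sample by sample, so a point-wise measure such as PSNR can penalize it even when the statistical character of the noise is preserved. Raising `threshold` trades fidelity for a smaller dictionary.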

    Published In

    SNTA '19: Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics
    June 2019
    58 pages
    ISBN:9781450367615
    DOI:10.1145/3322798
    • General Chairs: Jinoh Kim, Alex Sim
    © 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. compression
    2. idealem
    3. performance pattern
    4. test statistics

    Qualifiers

    • Research-article

    Funding Sources

    • Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy

    Conference

    HPDC '19

    Acceptance Rates

    SNTA '19 Paper Acceptance Rate 22 of 106 submissions, 21%;
    Overall Acceptance Rate 22 of 106 submissions, 21%
