DOI: 10.1145/3085504.3085583
research-article

Improving Statistical Similarity Based Data Reduction for Non-Stationary Data

Published: 27 June 2017

Abstract

We propose a new class of lossy compression based on a locally exchangeable measure that captures the distribution of repeating data blocks while preserving unique patterns. The technique has been demonstrated to reduce data volume by more than 100-fold on power grid monitoring data, where a large number of data blocks can be characterized as following stationary probability distributions. To capture data with more diverse patterns, we propose two techniques to transform non-stationary time series into locally stationary blocks. We also propose a strategy to work with values in bounded ranges, such as phase angles of alternating current. These new ideas are incorporated into a software package named IDEALEM. In experiments, IDEALEM reduces non-stationary data volume by up to 100-fold. Compared with state-of-the-art lossy compression methods such as SZ, IDEALEM can produce more compact output overall.
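
The general idea of block-wise statistical similarity can be illustrated with a small sketch. It is not the IDEALEM implementation: the abstract does not name the similarity test, the two transforms, or the bounded-range strategy, so everything below is an assumption made for illustration. The sketch uses a two-sample Kolmogorov-Smirnov statistic with a fixed cutoff as the similarity check, first differences as a stand-in for a transform toward locally stationary blocks, and wrapping of angle differences into [-180, 180) for bounded values. The names `ks_statistic`, `wrap_degrees`, `to_increments`, `encode`, and `decode`, and the parameters `block_size` and `threshold`, are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def wrap_degrees(angle):
    """Map an angle (or angle difference) in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def to_increments(series, bounded_degrees=False):
    """First differences (assumed transform). With bounded_degrees, differences are
    wrapped so a jump across the +/-180 boundary is not treated as a large step."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return [wrap_degrees(d) for d in diffs] if bounded_degrees else diffs


def encode(values, block_size=32, threshold=0.3):
    """Emit ('raw', block) for a new pattern, ('ref', i) for a block similar to stored block i."""
    stored, tokens = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        match = next(
            (i for i, ref in enumerate(stored) if ks_statistic(block, ref) < threshold),
            None,
        )
        if match is None:
            stored.append(block)
            tokens.append(("raw", block))
        else:
            tokens.append(("ref", match))
    return tokens


def decode(tokens):
    """Lossy reconstruction: each reference is replaced by the stored block it points to."""
    stored, out = [], []
    for kind, payload in tokens:
        if kind == "raw":
            stored.append(payload)
            out.extend(payload)
        else:
            out.extend(stored[payload])
    return out


# A trending integer series: raw blocks never overlap in value, but the increments repeat.
series = [10 * i + (i % 4) for i in range(129)]
raw_tokens = encode(series[:128])
inc_tokens = encode(to_increments(series))
print(sum(kind == "ref" for kind, _ in raw_tokens))  # 0: every raw block is a new pattern
print(sum(kind == "ref" for kind, _ in inc_tokens))  # 3: later blocks reuse the first one
```

In this toy series the raw values trend upward, so no block resembles an earlier one, while the differenced series yields one stored block and three references. Rebuilding the original values from decoded increments would additionally require the first sample and a cumulative sum, which the sketch leaves out.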


Cited By

  • (2020) Fulfilling the Promises of Lossy Compression for Scientific Applications. Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 99-116. DOI: 10.1007/978-3-030-63393-6_7. Online publication date: 18-Dec-2020.
  • (2019) Similarity-based Compression with Multidimensional Pattern Matching. Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 19-24. DOI: 10.1145/3322798.3329252. Online publication date: 17-Jun-2019.
  • (2018) Dynamic Online Performance Optimization in Streaming Data Compression. 2018 IEEE International Conference on Big Data (Big Data), 534-541. DOI: 10.1109/BigData.2018.8621867. Online publication date: Dec-2018.

Published In

SSDBM '17: Proceedings of the 29th International Conference on Scientific and Statistical Database Management
June 2017
373 pages
ISBN: 9781450352826
DOI: 10.1145/3085504
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • Northwestern University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Floating-point data
  2. locally exchangeable measure
  3. lossy compression
  4. online algorithm
  5. time series data

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SSDBM '17

Acceptance Rates

Overall Acceptance Rate 56 of 146 submissions, 38%
