DOI: 10.1145/3307772.3328285

Waveform Signal Entropy and Compression Study of Whole-Building Energy Datasets

Published: 15 June 2019

Abstract

Electrical energy consumption has been an ongoing research area since the advent of smart homes and the Internet of Things. Consumption characteristics and usage profiles are directly influenced by building occupants and their interaction with electrical appliances. Data analysis together with machine learning models can extract valuable information for the benefit of the occupants themselves (conserving energy and increasing comfort levels), power plants (maintenance), and grid operators (stability). Public energy datasets provide a scientific foundation to develop and benchmark these algorithms and techniques. With datasets exceeding tens of terabytes, we present a novel study of five whole-building energy datasets with high sampling rates, their signal entropy, and how a well-calibrated measurement can have a significant effect on the overall storage requirements. We show that some datasets do not fully utilize the available measurement precision, leaving potential accuracy and space savings untapped. We benchmark a comprehensive list of 365 file formats, transparent data transformations, and lossless compression algorithms. The primary goal is to reduce the overall dataset size while maintaining an easy-to-use file format and access API. We show that with careful selection of file format and encoding scheme, we can reduce the size of some datasets by up to 73%.
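The link the abstract draws between measurement calibration, signal entropy, and storage size can be illustrated with a minimal Python sketch. This is not the paper's method or data. The synthetic 50 Hz waveform, the 6400 Hz sampling rate, the noise level, the `gain` parameter, and all function names are assumptions chosen for illustration, and zlib stands in for the lossless compressors the study benchmarks.

```python
import math
import random
import struct
import zlib
from collections import Counter

FULL_SCALE = 32767  # largest positive code of a signed 16-bit converter


def shannon_entropy_bits(samples):
    """Empirical Shannon entropy of the sample values, in bits per sample."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def synth_waveform(n, gain, rate_hz=6400, mains_hz=50, seed=0):
    """Hypothetical mains-like sine with noise, quantized to 16-bit codes.

    gain=1.0 uses almost the full converter range; a small gain mimics a
    measurement that occupies only a fraction of the available codes.
    """
    rng = random.Random(seed)
    codes = []
    for i in range(n):
        analog = 0.98 * math.sin(2 * math.pi * mains_hz * i / rate_hz)
        analog += rng.gauss(0, 0.01)  # assumed analog noise floor
        code = round(gain * FULL_SCALE * analog)
        codes.append(max(-32768, min(32767, code)))  # clip to int16 range
    return codes


def zlib_size(samples):
    """Bytes needed after packing as little-endian int16 and zlib-compressing."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    return len(zlib.compress(raw, 9))


if __name__ == "__main__":
    for gain in (1.0, 0.01):
        wave = synth_waveform(20000, gain)
        print(f"gain={gain}: entropy={shannon_entropy_bits(wave):.2f} bits/sample, "
              f"raw={2 * len(wave)} bytes, zlib={zlib_size(wave)} bytes")
```

In this toy setup both waveforms are stored as 16 bits per sample, but the low-gain one touches only a few hundred distinct codes, so its empirical entropy is well below 16 bits. That gap is unused precision, and a lossless compressor can recover part of it as space savings.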


Published In

e-Energy '19: Proceedings of the Tenth ACM International Conference on Future Energy Systems
June 2019
589 pages
ISBN: 9781450366717
DOI: 10.1145/3307772
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Energy dataset
  2. electricity aggregate
  3. file format
  4. high sampling rate
  5. non-intrusive load monitoring
  6. waveform compression

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

e-Energy '19
Acceptance Rates

Overall Acceptance Rate 160 of 446 submissions, 36%

Cited By

  • (2023) Non-Intrusive Load Monitoring (NILM) using Deep Neural Networks: A Review. 2023 IEEE International Conference on Environment and Electrical Engineering and 2023 IEEE Industrial and Commercial Power Systems Europe (EEEIC / I&CPS Europe), 1-6. DOI: 10.1109/EEEIC/ICPSEurope57605.2023.10194770. Online publication date: 6-Jun-2023
  • (2022) A data model and file format to represent and store high frequency energy monitoring and disaggregation datasets. Scientific Reports 12:1. DOI: 10.1038/s41598-022-14517-y. Online publication date: 18-Jun-2022
  • (2021) Frequency Selective Auto-Encoder for Smart Meter Data Compression. Sensors 21:4 (1521). DOI: 10.3390/s21041521. Online publication date: 22-Feb-2021
  • (2020) Powerstrip. Proceedings of the Eleventh ACM International Conference on Future Energy Systems, 242-252. DOI: 10.1145/3396851.3397716. Online publication date: 12-Jun-2020
  • (2020) A practical approach to storage and retrieval of high-frequency physiological signals. Physiological Measurement 41:3 (035008). DOI: 10.1088/1361-6579/ab7cb5. Online publication date: 20-Apr-2020
  • (2020) A Versatile High Frequency Electricity Monitoring Framework for Our Future Connected Home. Sustainable Energy for Smart Cities, 221-231. DOI: 10.1007/978-3-030-45694-8_17. Online publication date: 9-Apr-2020
  • (2019) dsCleaner: A Python Library to Clean, Preprocess and Convert Non-Intrusive Load Monitoring Datasets. Data 4:3 (123). DOI: 10.3390/data4030123. Online publication date: 12-Aug-2019
