DOI:10.1145/3372454.3372466

Adaptive Normalization in Streaming Data

Published: 21 January 2020

Abstract

In today's digital era, data are everywhere, from the Internet of Things to health care and financial applications. This produces potentially unbounded, ever-growing Big Data streams that need to be utilized effectively. Data normalization is an important preprocessing technique for data analytics. It helps prevent mismodeling and reduces the complexity inherent in the data, especially when data are integrated from multiple sources and contexts. Normalizing Big Data streams is challenging because of evolving inconsistencies, time and memory constraints, and the unavailability of the whole dataset beforehand. This paper proposes a distributed approach to adaptive normalization of Big Data streams. Using fixed-size sliding windows, it provides a simple mechanism for adapting the normalization statistics to the changing data in each window. Implemented on Apache Storm, a distributed real-time stream-processing framework, our approach exploits distributed data processing for efficient normalization. Unlike existing adaptive approaches, which normalize data for a specific use (e.g., classification), ours is not tied to a particular downstream task. Moreover, our adaptive mechanism provides flexible control, via user-specified thresholds, over the tradeoff between normalization time and precision. The paper illustrates the proposed approach alongside a few other techniques, with experiments on both synthesized and real-world data. On a stream of 160,000 instances, the data normalized by our approach improve over the baseline by 89%, with a root-mean-square error of 0.0041 relative to the actual data.
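As a rough illustration of the windowed, threshold-controlled idea described in the abstract, here is a minimal single-process Python sketch. It assumes min-max normalization and a relative-drift threshold. The abstract does not say which statistics the paper adapts, how its thresholds are defined, or how the work is divided across Apache Storm spouts and bolts, so the class `AdaptiveWindowNormalizer`, its parameters `window_size` and `threshold`, and the helpers `hindsight_minmax` and `rmse` are all hypothetical. The "actual data" is approximated here, again by assumption, as min-max normalization computed over the whole stream in hindsight.

```python
from collections import deque
import math


class AdaptiveWindowNormalizer:
    """Hypothetical min-max normalizer over a fixed-size sliding window.

    The scaling statistics (lo, hi) are refreshed only when the current
    window's min or max has drifted from them by more than `threshold`,
    measured relative to the current range. Larger thresholds mean fewer
    refreshes (less work) but less precise normalized values.
    """

    def __init__(self, window_size: int, threshold: float):
        if window_size < 1:
            raise ValueError("window_size must be >= 1")
        self.window = deque(maxlen=window_size)  # oldest item drops out automatically
        self.threshold = threshold
        self.lo = None      # min currently used for scaling
        self.hi = None      # max currently used for scaling
        self.updates = 0    # number of statistic refreshes so far

    def _maybe_refresh(self) -> None:
        w_lo, w_hi = min(self.window), max(self.window)
        if self.lo is None:
            # First item: initialize the statistics from the window.
            self.lo, self.hi = w_lo, w_hi
            self.updates += 1
            return
        span = (self.hi - self.lo) or 1.0  # avoid dividing by a zero range
        drift = max(abs(w_lo - self.lo), abs(w_hi - self.hi)) / span
        if drift > self.threshold:
            self.lo, self.hi = w_lo, w_hi
            self.updates += 1

    def normalize(self, x: float) -> float:
        """Add x to the window and return it scaled by the current statistics."""
        self.window.append(x)
        self._maybe_refresh()
        span = self.hi - self.lo
        return 0.0 if span == 0 else (x - self.lo) / span


def hindsight_minmax(values):
    """Batch min-max over the whole stream (only possible offline)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(
        sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)
    )


if __name__ == "__main__":
    stream = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]
    norm = AdaptiveWindowNormalizer(window_size=3, threshold=0.1)
    adaptive = [norm.normalize(x) for x in stream]
    print(adaptive, norm.updates)
    print(rmse(adaptive, hindsight_minmax(stream)))
```

For the sample stream, `threshold=0.1` performs 4 refreshes and maps 52.0 to about 1.051, because the slightly stale range [11, 50] is kept; with `threshold=0.0` there are 5 refreshes and 52.0 maps to exactly 1.0. This is the time-versus-precision tradeoff in miniature. Recomputing `min`/`max` over the window costs O(`window_size`) per item; a monotonic deque would make it amortized O(1), but that is left out to keep the sketch short.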



Information

Published In

ACM Other conferences
ICBDR '19: Proceedings of the 3rd International Conference on Big Data Research
November 2019
192 pages
ISBN:9781450372015
DOI:10.1145/3372454
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Shandong University
  • The University of Versailles Saint-Quentin, Versailles, France

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Big Data Stream
  2. Normalization
  3. Preprocessing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICBDR 2019

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 37
  • Downloads (last 6 weeks): 1
Reflects downloads up to 13 Jan 2025

Cited By
  • (2024) Adaptive Learning for Soil Classification in Laser-Induced Breakdown Spectroscopy Streaming. IEEE Transactions on Artificial Intelligence, 5(7), 3714-3727. DOI: 10.1109/TAI.2024.3375260. Online publication date: Jul-2024.
  • (2023) Distance Functions and Normalization Under Stream Scenarios. 2023 International Joint Conference on Neural Networks (IJCNN), 1-8. DOI: 10.1109/IJCNN54540.2023.10191283. Online publication date: 18-Jun-2023.
  • (2023) ChatAN: Interval Adaptive Normalization based on ChatGPT Knowledge Augmentation. 2023 8th International Conference on Intelligent Computing and Signal Processing (ICSP), 1082-1087. DOI: 10.1109/ICSP58490.2023.10248838. Online publication date: 21-Apr-2023.
  • (2023) Quantile Online Learning for Semiconductor Failure Analysis. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1-5. DOI: 10.1109/ICASSP49357.2023.10097116. Online publication date: 4-Jun-2023.
  • (2023) Evaluating the impact of drift detection mechanisms on stock market forecasting. Knowledge and Information Systems, 66(1), 723-763. DOI: 10.1007/s10115-023-02025-y. Online publication date: 12-Dec-2023.
  • (2023) SymED: Adaptive and Online Symbolic Representation of Data on the Edge. Euro-Par 2023: Parallel Processing, 411-425. DOI: 10.1007/978-3-031-39698-4_28. Online publication date: 24-Aug-2023.
  • (2022) Load Quality Analysis and Forecasting for Power Data Set on Cloud Platform. Cloud Computing, 3-16. DOI: 10.1007/978-3-030-99191-3_1. Online publication date: 23-Mar-2022.
  • (2020) A Metadata and Z Score-based Load-Shedding Technique in IoT-based Data Collection Systems. International Journal of Mathematical, Engineering and Management Sciences, 6(1), 363-382. DOI: 10.33889/IJMEMS.2021.6.1.023. Online publication date: 29-Oct-2020.
  • (2020) Study on Prediction Model of Cement Precalciner Outlet Temperature. 2020 Chinese Automation Congress (CAC), 3623-3626. DOI: 10.1109/CAC51589.2020.9326782. Online publication date: 6-Nov-2020.
