Version 1
: Received: 11 October 2017 / Approved: 12 October 2017 / Online: 12 October 2017 (04:55:57 CEST)
Version 2
: Received: 16 October 2017 / Approved: 17 October 2017 / Online: 17 October 2017 (03:47:41 CEST)
How to cite:
Wang, Q.; Zhao, X.; Huang, J.; Feng, Y.; Su, J.; Luo, Z. Addressing Complexities of Machine Learning in Big Data: Principles, Trends and Challenges from Systematical Perspectives. Preprints 2017, 2017100076. https://doi.org/10.20944/preprints201710.0076.v2
APA Style
Wang, Q., Zhao, X., Huang, J., Feng, Y., Su, J., & Luo, Z. (2017). Addressing Complexities of Machine Learning in Big Data: Principles, Trends and Challenges from Systematical Perspectives. Preprints. https://doi.org/10.20944/preprints201710.0076.v2
Chicago/Turabian Style
Wang, Q., X. Zhao, J. Huang, Y. Feng, Jiahao Su, and Zhihao Luo. 2017. "Addressing Complexities of Machine Learning in Big Data: Principles, Trends and Challenges from Systematical Perspectives." Preprints. https://doi.org/10.20944/preprints201710.0076.v2
Abstract
The concept of ‘big data’ has been widely discussed, and its value has been demonstrated across a variety of domains. As a means of quickly mining potential value from an ever-increasing volume of information, machine learning is playing an increasingly important role and faces more challenges than ever. Because few studies exist regarding how to modify machine learning techniques to accommodate big data environments, we provide a comprehensive overview of the evolution of big data, the foundations of machine learning, and the bottlenecks and trends of machine learning in the big data era. More specifically, based on learning principles, we discuss regularization as a means of enhancing generalization. The challenges of data quality in big data are reduced to the curse of dimensionality, class imbalance, concept drift and label noise, and the underlying causes of these challenges and the mainstream methodologies for addressing them are introduced. Learning model development has been driven by domain specifics, dataset complexities, and the presence or absence of human involvement. In this paper, we propose a robust learning paradigm that aggregates the aforementioned factors. We believe that, over the next few decades, these perspectives will lead to novel ideas and encourage more studies aimed at incorporating knowledge and establishing data-driven learning systems that involve both data quality considerations and human interactions.
Keywords
big data; machine learning; regularization; data quality; robust learning framework
Subject
Computer Science and Mathematics, Information Systems
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.