Big Data Preprocessing
Enabling Smart Data
Book
Chapter
The term Smart Data refers to the challenge of transforming raw data into quality data that can be appropriately exploited to obtain valuable insights. Big Data is focused on volume, velocity, variety, veracit...
Chapter
The negative impact on learning associated with imbalanced proportion of classes has exploded lately with the exponential growth of “cheap” data. Many real-world problems present scarce number of instances in ...
Chapter
We live in a world where data is generated from a myriad of sources, and it is really cheap to collect and storage such data. However, the real benefit is not related to the data itself, but with the algorithm...
Chapter
Throughout this book we have presented a complete vision about Big Data preprocessing and how it enables Smart Data. Data is only as valuable as the knowledge and insights we can extract from it. Referring to ...
Chapter
Data reduction in data mining selects/generates the most representative instances in the input data in order to reduce the original complex instance space and better define the decision boundaries between clas...
Chapter
Data discretization task transforms continuous numerical data into discrete and bounded values, more understandable for humans and more manageable for a wide range of machine learning methods. With the advent ...
Chapter
The advent of Big Data has created the necessity of new computing tools for processing huge amounts of data. Apache Hadoop was the first open-source framework that implemented the MapReduce paradigm. Apache Sp...
Chapter
The fast evolving Big Data environment has provoked that a myriad of tools, paradigms, and techniques surge to tackle different use cases in industry and science. However, because of the myriad of existing too...
Chapter
In the new era of Big Data, exponential increase in volume is usually accompanied by an explosion in the number of features. Dimensionality reduction arises as a possible solution to enable large-scale learnin...
Chapter
In any knowledge discovery process the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by massive growth in the scale of data observed in recent y...