Big Data Preprocessing
Enabling Smart Data
Article
Differences in data size per class, also known as imbalanced data distribution, have become a common problem affecting data quality. Big Data scenarios pose a new challenge to traditional imbalanced classifica...
Book
Chapter
The term Smart Data refers to the challenge of transforming raw data into quality data that can be appropriately exploited to obtain valuable insights. Big Data is focused on volume, velocity, variety, veracit...
Chapter
The negative impact on learning associated with imbalanced proportion of classes has exploded lately with the exponential growth of “cheap” data. Many real-world problems present scarce number of instances in ...
Chapter
We live in a world where data is generated from a myriad of sources, and it is really cheap to collect and storage such data. However, the real benefit is not related to the data itself, but with the algorithm...
Chapter
Throughout this book we have presented a complete vision about Big Data preprocessing and how it enables Smart Data. Data is only as valuable as the knowledge and insights we can extract from it. Referring to ...
Chapter
Data reduction in data mining selects/generates the most representative instances in the input data in order to reduce the original complex instance space and better define the decision boundaries between clas...
Chapter
Data discretization task transforms continuous numerical data into discrete and bounded values, more understandable for humans and more manageable for a wide range of machine learning methods. With the advent ...
Chapter
The advent of Big Data has created the necessity of new computing tools for processing huge amounts of data. Apache Hadoop was the first open-source framework that implemented the MapReduce paradigm. Apache Sp...
Chapter
The fast evolving Big Data environment has provoked that a myriad of tools, paradigms, and techniques surge to tackle different use cases in industry and science. However, because of the myriad of existing too...
Chapter
In the new era of Big Data, exponential increase in volume is usually accompanied by an explosion in the number of features. Dimensionality reduction arises as a possible solution to enable large-scale learnin...
Chapter
In any knowledge discovery process the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by massive growth in the scale of data observed in recent y...
Article
Data preprocessing techniques are devoted to correcting or alleviating errors in data. Discretization and feature selection are two of the most extended data preprocessing techniques. Although we can find many...
Chapter and Conference Paper
Massive data growth in recent years has made data reduction techniques to gain a special popularity because of their ability to reduce this enormous amount of data, also called Big Data. Random Projection Rand...
Article
The large amounts of data have created a need for new frameworks for processing. The MapReduce model is a framework for processing and generating large-scale datasets with parallel and distributed algorithms. ...