Search Results - Springer

Article

Open Access

Smart Data Driven Decision Trees Ensemble Methodology for Imbalanced Big Data

Differences in data size per class, also known as imbalanced data distribution, have become a common problem affecting data quality. Big Data scenarios pose a new challenge to traditional imbalanced classifica...

Diego García-Gil, Salvador García, Ning Xiong, Francisco Herrera in Cognitive Computation (2024)

Book

Big Data Preprocessing

Enabling Smart Data

Julián Luengo, Diego García-Gil… (2020)

Chapter

Smart Data

The term Smart Data refers to the challenge of transforming raw data into quality data that can be appropriately exploited to obtain valuable insights. Big Data is focused on volume, velocity, variety, veracit...

Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego… in Big Data Preprocessing (2020)

Chapter

Imbalanced Data Preprocessing for Big Data

The negative impact on learning associated with imbalanced proportion of classes has exploded lately with the exponential growth of “cheap” data. Many real-world problems present scarce number of instances in ...

Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego… in Big Data Preprocessing (2020)

Chapter

Introduction

We live in a world where data is generated from a myriad of sources, and it is really cheap to collect and store such data. However, the real benefit is not related to the data itself, but with the algorithm...

Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego… in Big Data Preprocessing (2020)

Chapter

Final Thoughts: From Big Data to Smart Data

Throughout this book we have presented a complete vision about Big Data preprocessing and how it enables Smart Data. Data is only as valuable as the knowledge and insights we can extract from it. Referring to ...

Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego… in Big Data Preprocessing (2020)

Chapter

Data Reduction for Big Data

Data reduction in data mining selects/generates the most representative instances in the input data in order to reduce the original complex instance space and better define the decision boundaries between clas...

Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego… in Big Data Preprocessing (2020)

Chapter

Big Data Discretization

Data discretization task transforms continuous numerical data into discrete and bounded values, more understandable for humans and more manageable for a wide range of machine learning methods. With the advent ...

Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego… in Big Data Preprocessing (2020)

Chapter

Big Data Software

The advent of Big Data has created the necessity of new computing tools for processing huge amounts of data. Apache Hadoop was the first open-source framework that implemented the MapReduce paradigm. Apache Sp...

Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego… in Big Data Preprocessing (2020)

Chapter

Big Data: Technologies and Tools

The fast evolving Big Data environment has provoked that a myriad of tools, paradigms, and techniques surge to tackle different use cases in industry and science. However, because of the myriad of existing too...

Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego… in Big Data Preprocessing (2020)

Chapter

Dimensionality Reduction for Big Data

In the new era of Big Data, exponential increase in volume is usually accompanied by an explosion in the number of features. Dimensionality reduction arises as a possible solution to enable large-scale learnin...

Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego… in Big Data Preprocessing (2020)

Chapter

Imperfect Big Data

In any knowledge discovery process the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by massive growth in the scale of data observed in recent y...

Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego… in Big Data Preprocessing (2020)

Article

Open Access

DPASF: a flink library for streaming data preprocessing

Data preprocessing techniques are devoted to correcting or alleviating errors in data. Discretization and feature selection are two of the most extended data preprocessing techniques. Although we can find many...

Alejandro Alcalde-Barros, Diego García-Gil, Salvador García… in Big Data Analytics (2019)

Chapter and Conference Paper

On the Use of Random Discretization and Dimensionality Reduction in Ensembles for Big Data

Massive data growth in recent years has made data reduction techniques to gain a special popularity because of their ability to reduce this enormous amount of data, also called Big Data. Random Projection Rand...

Diego García-Gil, Sergio Ramírez-Gallego… in Hybrid Artificial Intelligent Systems (2018)

Article

Open Access

A comparison on scalability for batch big data processing on Apache Spark and Apache Flink

The large amounts of data have created a need for new frameworks for processing. The MapReduce model is a framework for processing and generating large-scale datasets with parallel and distributed algorithms. ...

Diego García-Gil, Sergio Ramírez-Gallego, Salvador García… in Big Data Analytics (2017)


15 Result(s)
