Big Data and Artificial Intelligence in Digital Finance
To process their ever-increasing volumes of data, financial and insurance organizations are developing and deploying data pipelines. However, state-of-the-art data management platforms have limitations in handling numerous, complex pipelines that blend different kinds of data stores. This chapter introduces a novel Big Data database, namely the LeanXcale database, which enables the development and management of complex pipelines in a scalable fashion. Specifically, the presented database reduces data access time independently of data size and allows for efficient process parallelization. This combination of capabilities helps to reduce data pipeline complexity and the total cost of ownership of pipeline management. Moreover, it unveils new ways of generating value through use cases that were previously not possible.
2018 IEEE International Congress on Big Data (BigData Congress), 2018
The new data-driven industrial revolution highlights the need for big data technologies to unlock the potential in various application domains. In this context, emerging innovative solutions exploit several underlying infrastructure and cluster management systems. However, these systems have not been designed and implemented in a "big data context"; rather, they emphasize and address the computational needs of the applications and services to be deployed. In this paper we present the architecture of a complete stack (namely BigDataStack), based on a frontrunner infrastructure management system that drives decisions according to data aspects, thus being fully scalable, runtime adaptable, and highly performant to address the needs of big data operations and data-intensive applications. Furthermore, the stack goes beyond purely infrastructure elements by introducing techniques for dimensioning big data applications, modelling and analysis of processes, and provisioning data-as-a-service by exploiting a seamless analytics framework.
In the last decade new scalable data stores have emerged in order to process and store the increasing amount of data that is produced every day. These data stores are inherently distributed to adapt to the increasing load and generated data. HBase is one such data store, modelled after Google BigTable, that stores large tables (hundreds of millions of rows) in which data is sorted by key. A region is the unit of distribution in HBase and is a contiguous range of keys in the key space. HBase lacks a mechanism to distribute the load across region servers in an automated manner. In this paper, we present a load balancer that is able to split tables into an appropriate number of regions of appropriate sizes and distribute them across servers in order to attain a balanced load across all servers. The experimental evaluation shows that performance is improved with the proposed load balancer.
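The two steps the abstract describes, splitting a table's key space into regions of appropriate sizes and then spreading those regions across servers, can be sketched as follows. This is a minimal illustration of the general idea, not the paper's algorithm: the function names, the equal-size splitting criterion, and the greedy assignment heuristic are all assumptions for the sake of the example.

```python
# Illustrative sketch: split a sorted key space into regions of roughly
# equal total size, then assign regions to servers so per-server load
# stays balanced. Hypothetical names; not the actual HBase balancer.

def split_into_regions(row_sizes, num_regions):
    """Split a key-sorted list of (row_key, size) pairs into contiguous
    regions whose total sizes are roughly equal."""
    total = sum(size for _, size in row_sizes)
    target = total / num_regions
    regions, current_keys, acc = [], [], 0
    for key, size in row_sizes:
        current_keys.append(key)
        acc += size
        # Close a region once it reaches the target size, keeping the
        # last region open to absorb any remainder.
        if acc >= target and len(regions) < num_regions - 1:
            regions.append((current_keys[0], current_keys[-1], acc))
            current_keys, acc = [], 0
    if current_keys:
        regions.append((current_keys[0], current_keys[-1], acc))
    return regions

def assign_regions(regions, servers):
    """Greedy assignment: place the heaviest remaining region on the
    currently least-loaded server."""
    load = {s: 0 for s in servers}
    placement = {}
    for start, end, size in sorted(regions, key=lambda r: -r[2]):
        server = min(load, key=load.get)
        placement[(start, end)] = server
        load[server] += size
    return placement, load
```

For twelve equally sized rows split into four regions over two servers, this yields four regions of three rows each and a perfectly even load of six per server; with skewed row sizes the greedy pass keeps the imbalance small rather than eliminating it.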
Studies in health technology and informatics, 2017
Today's rich digital information environment is characterized by a multitude of data sources providing information that has not yet reached its full potential in eHealth. The aim of the presented approach, namely CrowdHEALTH, is to introduce a new paradigm of Holistic Health Records (HHRs) that include all health determinants. HHRs are transformed into HHR clusters capturing the clinical, social, and human context of population segments, thus building collective knowledge about different factors. The proposed approach also seamlessly integrates big data technologies across the complete data path, providing Data as a Service (DaaS) to health ecosystem stakeholders as well as to policy makers, towards a "health in all policies" approach. Cross-domain co-creation of policies is made feasible through a rich toolkit provided on top of the DaaS, incorporating mechanisms for causal and risk analysis and for the compilation of predictions.
NoSQL data stores are becoming more and more popular, and graph databases are one such kind of data store. In this paper we present an overview of the implementation of snapshot isolation for Neo4j, a very popular graph database.
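The core of snapshot isolation, which the paper adapts to a graph database, can be sketched on a plain multi-versioned key-value store: each transaction reads from a snapshot fixed at its start and commits only if no concurrent transaction has already committed a newer version of a key it wrote (first-committer-wins). The class and method names below are hypothetical, and this is a single-threaded illustration of the semantics, not the Neo4j implementation.

```python
# Minimal sketch of snapshot isolation over a multi-versioned store.
# Illustrative only: names and structure are assumptions, not Neo4j's.

class MVStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), ascending
        self.clock = 0      # logical commit-timestamp counter

    def begin(self):
        # The transaction's snapshot is the latest commit timestamp
        # at start time; it never sees later commits.
        return {"snapshot": self.clock, "writes": {}}

    def read(self, txn, key):
        # A transaction sees its own uncommitted writes first,
        # otherwise the newest version at or before its snapshot.
        if key in txn["writes"]:
            return txn["writes"][key]
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= txn["snapshot"]:
                return value
        return None

    def write(self, txn, key, value):
        txn["writes"][key] = value  # buffered until commit

    def commit(self, txn):
        # First-committer-wins: abort on a write-write conflict, i.e.
        # any written key with a version newer than our snapshot.
        for key in txn["writes"]:
            history = self.versions.get(key, [])
            if history and history[-1][0] > txn["snapshot"]:
                return False
        self.clock += 1
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return True
```

With two concurrent transactions both writing the same key from the same snapshot, the first to commit succeeds and the second aborts, which is exactly the anomaly-prevention behaviour that distinguishes snapshot isolation from plain read-committed.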
Papers by Ricardo Jimenez-Peris