Cloud Computing is nowadays considered an attractive solution to serve Big Data storage, processing, and analytics needs. Given the high complexity of Big Data workflows and their contingent requirements, a single cloud provider alone might not be able to satisfy these needs. A multitude of cloud providers offering a myriad of cloud services and resources can be selected. However, such selection is not straightforward, since it has to deal with the scaling of Big Data requirements and the dynamic fluctuation of cloud resources. This work proposes a novel cloud service selection approach that evaluates Big Data requirements, matches them in real time to the most suitable cloud services, and then suggests the best matching services for various Big Data processing requests. Our selection scheme is performed in three phases: 1) capture Big Data workflow requirements using a Big Data task profile, map them to a set of QoS attributes, and prioritize the cloud service providers (CSPs) that best fulfil these requirements; 2) rely on the pool of providers selected in phase 1 to choose the suitable cloud services from a single provider that satisfy the Big Data task requirements; and 3) implement multiple-provider selection to better satisfy the requirements of a Big Data workflow composed of multiple tasks. To cope with this multi-criteria selection problem, we extended the Analytic Hierarchy Process (AHP) to provide more accurate rankings. We developed a set of experimental scenarios to evaluate our three-phase selection scheme while verifying key properties such as scalability and selection accuracy, and we compared our selection approach to well-known selection schemes from the literature.
The obtained results demonstrate that our approach performs very well compared to the other approaches and efficiently selects the most suitable cloud services to guarantee the QoS requirements of Big Data tasks and workflows.
Conference: IIT 2018: 13th International Conference on Innovations in Information Technology, Al Ain, United Arab Emirates, 2018
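As a rough illustration of the AHP-based multi-criteria ranking mentioned above, the sketch below derives criteria weights from a pairwise-comparison matrix and ranks providers by weighted score. The QoS criteria, comparison values, and provider scores are illustrative assumptions, not values from the paper, and the column-normalization step is a common approximation of AHP's principal-eigenvector computation rather than the paper's extended method.

```python
# Minimal AHP-style ranking sketch for cloud service providers.
# Criteria, matrix entries, and provider scores are hypothetical.

def ahp_weights(pairwise):
    """Approximate AHP priority weights: normalize each column by its
    sum, then average the rows (a standard eigenvector approximation)."""
    n = len(pairwise)
    col_sums = [sum(pairwise[i][j] for i in range(n)) for j in range(n)]
    normalized = [[pairwise[i][j] / col_sums[j] for j in range(n)]
                  for i in range(n)]
    return [sum(row) / n for row in normalized]

def rank_providers(scores, weights):
    """Rank providers by the weighted sum of per-criterion scores."""
    totals = {p: sum(s * w for s, w in zip(vals, weights))
              for p, vals in scores.items()}
    return sorted(totals, key=totals.get, reverse=True)

# Pairwise comparison of three QoS criteria: cost, latency, availability.
pairwise = [
    [1,   3,   5],    # cost vs (cost, latency, availability)
    [1/3, 1,   2],
    [1/5, 1/2, 1],
]
weights = ahp_weights(pairwise)

# Normalized per-criterion scores for three hypothetical providers.
scores = {"CSP-A": [0.6, 0.8, 0.9],
          "CSP-B": [0.9, 0.5, 0.7],
          "CSP-C": [0.4, 0.9, 0.8]}
print(rank_providers(scores, weights))
```

In a full AHP treatment one would also check the consistency ratio of the pairwise matrix before trusting the weights; the sketch omits that step for brevity.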
Big Data has gained enormous momentum over the past few years because of the tremendous volume of data generated and processed across diverse application domains. Nowadays, an estimated 80% of all generated data is unstructured. Evaluating the quality of Big Data is essential to guarantee data quality dimensions such as completeness and accuracy, yet current initiatives for unstructured data quality evaluation are still under investigation. In this paper, we propose a quality evaluation model to handle the quality of Unstructured Big Data (UBD). The model first captures and discovers the key properties and characteristics of unstructured Big Data, then provides comprehensive mechanisms to sample and profile the UBD dataset and to extract features and characteristics from heterogeneous data types in different formats. A data quality repository manages the relationships between data quality dimensions, quality metrics, feature extraction methods, mining methodologies, data types, and data domains. An analysis of the samples provides a data profile of the UBD. This profile is extended to a quality profile that contains the quality mapping with the features selected for quality assessment. We developed a UBD quality assessment model that handles all the processes from UBD profiling and exploration to the quality report. The model provides an initial blueprint for quality estimation of unstructured Big Data. It also states a set of quality characteristics and indicators that can be used to outline an initial data quality schema for UBD.
With the advances in communication technologies and the high volume of data generated, collected, and stored, it becomes crucial to manage the quality of this data deluge in an efficient and cost-effective way. Storage, processing, privacy, and analytics are the main challenging aspects of Big Data that require quality evaluation and monitoring. Quality has been recognized by the Big Data community as an essential facet of its maturity. It is a crucial practice that should be implemented at the earliest stages of the Big Data lifecycle and progressively applied across the other key processes: the earlier quality is incorporated, the greater the benefit that can be derived from insights. In this paper, we first identify the key challenges that necessitate quality evaluation. We then survey, classify, and discuss the most recent work on Big Data management. Consequently, we propose an across-the-board quality management framework describing the key quality evaluation practices to be conducted throughout the different Big Data stages. The framework can be used to leverage quality management and to provide a roadmap for data scientists to better understand quality practices, highlighting the importance of managing quality. We finally conclude the paper and point to some future research directions on Big Data quality.
While the potential benefits of Big Data adoption are significant, and some initial successes have already been realized, many research and technical challenges remain that must be addressed to fully realize this potential. Big Data processing, storage, and analytics are, of course, the most easily recognized challenges. However, there are additional challenges related, for instance, to Big Data collection, integration, and quality enforcement. This paper proposes a hybrid approach to Big Data quality evaluation across the Big Data value chain. It consists of first assessing the quality of the Big Data itself, which involves processes such as cleansing, filtering, and approximation, and then assessing the quality of the processes handling this Big Data, for example the processing and analytics processes. We conducted a set of experiments on a large dataset to evaluate the quality of the data before and after pre-processing, as well as the quality of the pre-processing and processing themselves. Quality metrics were measured to assess three Big Data quality dimensions: accuracy, completeness, and consistency. The results show that combining data-driven and process-driven quality evaluation leads to improved quality enforcement across the Big Data value chain. Accordingly, we recorded high prediction accuracy and low processing time when evaluating six well-known classification algorithms as part of the processing and analytics phase of the Big Data value chain.
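The three data-driven quality dimensions named above (accuracy, completeness, consistency) can be sketched as simple metrics over a record set. The field names, the consistency rule, and the trusted reference are illustrative assumptions for the sketch, not the paper's actual metric definitions.

```python
# Illustrative metrics for three data quality dimensions over a
# list-of-dicts dataset. Field names and rules are hypothetical.

def completeness(records, fields):
    """Fraction of non-missing values across the given fields."""
    total = len(records) * len(fields)
    present = sum(1 for r in records for f in fields
                  if r.get(f) not in (None, ""))
    return present / total if total else 1.0

def consistency(records, rule):
    """Fraction of records satisfying a domain rule."""
    return sum(1 for r in records if rule(r)) / len(records) if records else 1.0

def accuracy(records, reference):
    """Fraction of records whose label matches a trusted reference."""
    ok = sum(1 for r in records if reference.get(r["id"]) == r.get("label"))
    return ok / len(records) if records else 1.0

records = [
    {"id": 1, "age": 34,   "label": "A"},
    {"id": 2, "age": None, "label": "B"},   # missing value
    {"id": 3, "age": 200,  "label": "A"},   # out-of-range age
]
reference = {1: "A", 2: "B", 3: "B"}

print(completeness(records, ["age", "label"]))
print(consistency(records, lambda r: r["age"] is not None and 0 < r["age"] < 120))
print(accuracy(records, reference))
```

Scoring the same dimensions before and after a cleansing step is one way to quantify the data-driven half of the hybrid evaluation the abstract describes.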
Due to the increasing growth of Web Services, Quality of Service (QoS) is becoming a key issue in the web services community. Providers and clients need to use QoS-aware architectures to ensure end-to-end QoS. The QoS delivered to clients is highly affected by the performance of the web service itself, by the hosting platform (e.g., the application server), and by the underlying network (e.g., the Internet). Thus, even if web services together with their hosting platform provide acceptable QoS, they also require sufficient available network resources to deliver end-to-end QoS. In this paper, we propose a solution to the problem of end-to-end QoS support for web services. Our approach relies on a dedicated web service, called the Network Resources Manager (NRM), that takes care of QoS support in the network connecting the client host and the matching web service location. The NRM either relies on the network's QoS capabilities (e.g., Integrated Services, Differentiated Services, Multiprotocol Label Switching), if any, or uses a measurement-based scheme to estimate the quality that can be delivered between the two locations. One of the key differentiators of our solution is that it does not require any changes to the infrastructure currently used by users and web service providers.
Big Data distribution has benefited from Cloud resources to accommodate applications' QoS requirements. In this paper, we propose a Big Data distribution scheme that matches the available Cloud resources to guarantee an application's QoS, given the continuously dynamic and varying resources of the Cloud infrastructure. We developed Two-Level QoS Policies (TLPS) for selecting clusters and nodes while satisfying the client application's QoS. We also propose an adaptive data distribution algorithm to cope with changing QoS requirements. Experiments were conducted to evaluate both the effectiveness and the communication overhead of our proposed distribution scheme, with convincing results. Further experiments evaluated our TLPS algorithm against other single-QoS-based data distribution algorithms, and the results show that the TLPS algorithm adapts to the customer's QoS requirements.
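A two-level selection of the kind described above can be sketched as a cluster-level filter on aggregate QoS followed by a node-level pick. The cluster and node attributes, thresholds, and tie-breaking rule below are illustrative assumptions for the sketch, not the paper's actual TLPS policies.

```python
# Illustrative two-level QoS selection: level 1 filters clusters on
# aggregate QoS; level 2 picks the best node among survivors.
# All attributes and thresholds are hypothetical.

clusters = {
    "cluster-1": {"avg_latency_ms": 40, "nodes": [
        {"name": "n1", "free_cpu": 0.2, "free_storage_gb": 500},
        {"name": "n2", "free_cpu": 0.7, "free_storage_gb": 300},
    ]},
    "cluster-2": {"avg_latency_ms": 120, "nodes": [
        {"name": "n3", "free_cpu": 0.9, "free_storage_gb": 800},
    ]},
}

def select(clusters, max_latency_ms, min_storage_gb):
    # Level 1: keep clusters whose aggregate QoS meets the request.
    eligible = {c: v for c, v in clusters.items()
                if v["avg_latency_ms"] <= max_latency_ms}
    # Level 2: among their nodes with enough storage, pick most free CPU.
    candidates = [(c, n) for c, v in eligible.items() for n in v["nodes"]
                  if n["free_storage_gb"] >= min_storage_gb]
    if not candidates:
        return None   # no placement satisfies the requested QoS
    cluster, node = max(candidates, key=lambda cn: cn[1]["free_cpu"])
    return cluster, node["name"]

print(select(clusters, max_latency_ms=100, min_storage_gb=250))
```

An adaptive variant would re-run this selection whenever the monitored cluster or node metrics drift outside the requested QoS bounds.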
Data is the most valuable asset companies are proud of. When its quality degrades, the consequences are unpredictable and can lead to completely wrong insights. In the Big Data context, evaluating data quality is challenging and must be done prior to any Big Data analytics in order to provide some confidence in data quality. Given the huge data size and its fast generation, mechanisms and strategies are required to evaluate and assess data quality in a fast and efficient way. However, checking the quality of Big Data is a very costly process if it is applied to the entire dataset. In this paper, we propose an efficient data quality evaluation scheme that applies sampling strategies to Big Data sets. Sampling reduces the data to representative population samples for fast quality evaluation. The evaluation targets data quality dimensions such as completeness and consistency. Experiments were conducted on a sleep disorder dataset using Big Data bootstrap sampling techniques. The results show that the mean quality score of the samples is representative of the original data and illustrate the importance of sampling in reducing computing costs when evaluating Big Data quality. We applied the generated quality results as quality proposals on the original data to increase its quality.
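Bootstrap-based quality estimation of this kind can be sketched as follows: draw resamples with replacement, score each on a quality dimension (completeness here), and use the mean sample score as an estimate for the full dataset. The synthetic dataset, sample sizes, and missing-value rate are assumptions for the sketch, not the paper's experimental setup.

```python
# Sketch of bootstrap sampling for data quality estimation: the mean
# completeness over resamples approximates the full dataset's score.
import random

def completeness(records):
    """Fraction of non-missing values in a list of records."""
    flat = [v for r in records for v in r]
    return sum(v is not None for v in flat) / len(flat)

def bootstrap_quality(records, n_samples=200, sample_size=None, seed=42):
    """Mean completeness over bootstrap resamples (with replacement)."""
    rng = random.Random(seed)
    sample_size = sample_size or len(records)
    scores = [completeness([rng.choice(records) for _ in range(sample_size)])
              for _ in range(n_samples)]
    return sum(scores) / len(scores)

# Synthetic data: 1000 records of 3 fields, ~10% of values missing.
rng = random.Random(0)
data = [tuple(None if rng.random() < 0.1 else 1 for _ in range(3))
        for _ in range(1000)]

full = completeness(data)
estimate = bootstrap_quality(data, n_samples=100, sample_size=100)
print(full, estimate)   # the two scores should be close
```

The cost argument is visible here: each resample scores 100 records instead of 1000, yet the mean sample score stays close to the full-data score.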
The need for Quality of Service (QoS) support in the network is obvious. Several QoS-aware network management systems have been developed to make multimedia applications, such as Video on Demand, available over the network with acceptable quality. In response to a user request with a desired QoS, most of these systems return an acceptance or a simple rejection, depending on whether resources are available to reserve for the request. This implies that a second attempt by the user cannot take advantage of information obtained through the first request. To overcome this limitation, a scheme called NAFUR was developed; it computes the QoS that can be supported at the time the service request is made, and at certain carefully chosen later times. NAFUR thus produces a list of alternative proposals, with delayed starting times or degraded QoS, to be presented to the user. In this paper, we define, implement, and evaluate two algorithms that can be used to compute the list of proposals. The first algorithm is based on the original NAFUR proposal computation (PC) algorithm, while the second is based on the K-Nearest-Neighbors technique.
Innovations in Information Technology (IIT), Jan 1, 2011
Abstract—With the advances in sensing technology and the proliferation of mobile devices and applications, provisioning of context information has been a particularly common research topic. Many research works have proposed, designed, and implemented frameworks and ...
With the abundance of raw data generated from various sources, Big Data has become a preeminent approach to acquiring, processing, and analyzing large amounts of heterogeneous data to derive valuable evidence. The size, speed, and formats in which data is generated and processed affect the overall quality of the resulting information. Therefore, Quality of Big Data (QBD) has become an important concern for ensuring that data quality is maintained across all Big Data processing phases. This paper addresses QBD at the pre-processing phase, which includes sub-processes such as cleansing, integration, filtering, and normalization. We propose a QBD model incorporating processes to support data quality profile selection and adaptation. In addition, the model tracks and registers, in a data provenance repository, the effect of every data transformation that happens in the pre-processing phase. We evaluate the data quality selection module using a large EEG dataset. The obtained results illustrate the importance of addressing QBD at an early phase of the Big Data processing lifecycle, since it significantly saves on costs and enables accurate data analysis.
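The provenance idea above — registering the quality effect of each pre-processing transformation — can be sketched as a small repository that scores the data before and after every step. The record fields, the completeness metric, and the cleansing rule are illustrative assumptions, not the paper's actual model.

```python
# Sketch of a data provenance repository that records the quality
# effect of each pre-processing transformation. Names are hypothetical.

def completeness(records):
    """Fraction of non-missing values across all record fields."""
    flat = [v for r in records for v in r.values()]
    return sum(v is not None for v in flat) / len(flat) if flat else 1.0

class ProvenanceRepository:
    def __init__(self):
        self.entries = []

    def apply(self, name, transform, records):
        """Run a transformation and register its quality effect."""
        before = completeness(records)
        result = transform(records)
        self.entries.append({"step": name,
                             "quality_before": before,
                             "quality_after": completeness(result)})
        return result

repo = ProvenanceRepository()
data = [{"signal": 0.4,  "channel": "C3"},
        {"signal": None, "channel": "C4"},
        {"signal": 0.9,  "channel": None}]

# Cleansing step: drop records with any missing value.
cleaned = repo.apply("cleansing",
                     lambda rs: [r for r in rs if None not in r.values()],
                     data)
print(repo.entries)
```

Chaining further steps (integration, filtering, normalization) through the same `apply` call would yield a per-step quality trail of the whole pre-processing phase.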
Abstract Due to the advances in mobile technology, it is becoming feasible to host Web Services on a mobile device, making it a potential data collector and provider. Hosting Web Services on mobile devices gains in importance when it comes to delivering real-time contextual data, such as the current location or a real-time heart rate. In addition to the characteristics of the available network, the usability of the Mobile Host depends on the computational resources of the device itself. Currently, some emerging lightweight ...
With the advances in sensing technology and the proliferation of mobile devices and applications, provisioning of context information has been a particularly common research topic. Many research works have proposed, designed, and implemented frameworks and middleware infrastructures for context management. High-level context information is typically acquired from context services that aggregate raw context information sensed by sensors and mobile devices. Given the massive amount of context data processed and stored by context services and the widespread penetration of cloud computing technology in the industry, context providers can now leverage their services by deploying them on the cloud. In this paper, we describe a novel framework for context management that relies on cloud-based context services. One of the benefits of this approach is that context providers can scale up and down depending on the current demand for context information.
Papers by Ikbal Taleb