Almost all testbed experiments deal with different kinds of metrics which are collected from and/or about various kinds of resources. Despite the importance of collecting experiment metrics in the experiment life cycle, this phase is often done via ad hoc, manual, and artisanal actions, such as manually combining multiple scripts or manipulating missing values. A few tools (e.g. Vendetta, OML) can be used for monitoring experiments. However, they are restricted to communicating metrics towards a central server and do not cover user-facing features such as plotting and archiving experiment results. In this talk, we will first discuss the requirements of experiment monitoring. Having a well-defined set of requirements eliminates the potential ambiguity around what should be targeted by any Experiment Monitoring Framework (EMF). The defined requirements are neither testbed-dependent nor technology-dependent, so any testbed community can build its own EMF by implementing these requirements using different software systems. We will then describe our own proposition, MonEx (short for Monitoring Experiments), an EMF that satisfies all the defined requirements. MonEx is built on top of several off-the-shelf infrastructure monitoring tools and supports various monitoring approaches, such as pull- and push-based as well as agent-based and agent-less monitoring. MonEx covers all the required steps of monitoring experiments, from collecting metrics to archiving experiment data and producing figures. We will then demonstrate MonEx's usability through a set of experiments performed on the Grid'5000 testbed and monitored by MonEx. Each of those experiments has different requirements, and as a group they show how MonEx meets all the defined requirements. We show how MonEx integrates nicely into the experimental workflow and how it simplifies the monitoring task, reducing users' effort during experimentation and pushing towards repeatable experiment analysis and metric comparison.
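As a rough illustration of the push-based approach mentioned above, the following Python sketch shows an experiment script reporting a user-defined metric while it runs. MonEx's real API is not described in this abstract, so the collector URL, payload format, and push_metric helper below are hypothetical.

```python
# Hypothetical sketch of push-based experiment monitoring: the experiment
# pushes timestamped metric samples to a collection endpoint over HTTP.
import json
import time
import urllib.request

MONEX_ENDPOINT = "http://monex.example.org/metrics"   # assumed collector URL

def push_metric(experiment_id, name, value):
    """Send one timestamped metric sample to the (assumed) collection endpoint."""
    sample = {"experiment": experiment_id, "metric": name,
              "value": value, "timestamp": time.time()}
    req = urllib.request.Request(MONEX_ENDPOINT,
                                 data=json.dumps(sample).encode(),
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

# Example: report an application-level metric once per second while the
# experiment runs (measure_progress() stands for any user-defined probe).
# for _ in range(60):
#     push_metric("exp-42", "transferred_bytes", measure_progress())
#     time.sleep(1)
```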
In the era of Big Data and Clouds, distributed databases such as NoSQL databases are taking their place among the most used storage systems. Benchmarking can be used to evaluate NoSQL databases. However, most benchmarks, such as YCSB, focus on high-level metrics like throughput when evaluating Cloud systems, including NoSQL databases. As a result, low-level metrics, which indicate how efficiently databases perform their operations and interact with the operating system, cannot be evaluated. For example, internal behaviors such as how the data is accessed on disk remain black boxes, since tools to analyze them are lacking. We focus on MongoDB and study its I/O system, as MongoDB has a good reputation and ranks at the top of document-based NoSQL databases; its flexible data model and its integrated tools make it a favorable choice for many kinds of applications. We designed generic tracing tools to study the performance of MongoDB's I/O system and its behavior inside the Linux I/O stack. In this talk, we will show, through experiments, the efficiency of our method, which can uncover the hidden reasons behind performance issues. Our results show that MongoDB suffers from reduced throughput when performing heavy operations, such as secondary indexing, in a clustered setup. The main cause is noisy, and at worst shapeless, I/O access patterns: MongoDB issues its I/O requests following the order of data records in its index table rather than their placement on the storage device, where the data may be laid out in a different order. We give some insights and an ad hoc solution to overcome this performance issue.
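As an illustration of the kind of access-pattern analysis discussed above (not the authors' actual tooling), the sketch below estimates how sequential a traced workload is; the (offset, size) trace format is an assumption.

```python
# Given a trace of I/O requests as (file offset, request size) pairs in issue
# order, compute the fraction of requests that start exactly where the
# previous one ended, i.e. strictly sequential accesses.
def sequential_ratio(trace):
    sequential, total = 0, 0
    previous_end = None
    for offset, size in trace:
        if previous_end is not None:
            total += 1
            if offset == previous_end:
                sequential += 1
        previous_end = offset + size
    return sequential / total if total else 0.0

# A perfectly sequential trace scores 1.0; a shuffled one scores close to 0.0.
# print(sequential_ratio([(0, 4096), (4096, 4096), (8192, 4096)]))  # -> 1.0
```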
Storage systems are becoming increasingly complex to handle HPC and Big Data requirements. This complexity calls for in-depth evaluations to ensure the absence of issues in all system layers. However, current performance evaluation is mostly performed around high-level metrics for simplicity, making it impossible to catch potential I/O issues in lower layers of the Linux I/O stack. In this paper, we introduce the IOscope tracer for uncovering the I/O patterns of storage systems' workloads. It performs filtering-based profiling over fine-grained criteria inside the Linux kernel. IOscope has near-zero overhead and verified in-kernel behaviour thanks to relying on the extended Berkeley Packet Filter (eBPF) technology. We demonstrate the capabilities of IOscope to discover pattern-related issues through a performance study on MongoDB and Cassandra. Results show that clustered MongoDB suffers from a noisy I/O pattern regardless of the underlying storage device (HDDs or SSDs). Hence, IOscope helps improve the troubleshooting process and contributes to an in-depth understanding of I/O performance.
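IOscope's actual probes and filtering criteria are not given in this abstract; as a rough sketch of filtering-based eBPF profiling in the same spirit, the hypothetical Python/BCC script below counts bytes requested through vfs_read() for one process, discarding all other events inside the kernel so that untraced processes incur almost no cost.

```python
# Hypothetical BCC/eBPF sketch: in-kernel filtering on PID while tracing
# vfs_read(). IOscope's real probes and criteria may differ.
import sys
import time
from bcc import BPF

target_pid = int(sys.argv[1])          # PID of the database process to trace

prog = r"""
#include <uapi/linux/ptrace.h>
#include <linux/fs.h>

BPF_HASH(read_bytes, u32, u64);        // pid -> cumulative requested bytes

int trace_vfs_read(struct pt_regs *ctx, struct file *file,
                   char __user *buf, size_t count) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    if (pid != TARGET_PID)             // filter inside the kernel
        return 0;
    read_bytes.increment(pid, count);
    return 0;
}
""".replace("TARGET_PID", str(target_pid))

b = BPF(text=prog)
b.attach_kprobe(event="vfs_read", fn_name="trace_vfs_read")

print("Tracing vfs_read() for PID %d, Ctrl-C to stop." % target_pid)
try:
    while True:
        time.sleep(5)
        for pid, total in b["read_bytes"].items():
            print("pid %d: %d bytes requested so far" % (pid.value, total.value))
except KeyboardInterrupt:
    pass
```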
This poster introduces how eBPF is used as an experimental method to discover an I/O issue of a clustered MongoDB. We show how such a tracing method addresses the limitations of high-level evaluation, whose focus is restricted to high-level metrics.
Most computer experiments include a phase where metrics are gathered from and about various kinds of resources. This phase is often done via manual, non-reproducible, and error-prone steps. Infrastructure monitoring tools facilitate collecting experiment data to some extent. However, there is no conventional way of doing so, and much work remains (e.g. capturing user experiments) to leverage the monitoring activity for monitoring experiments. To overcome those challenges, we define the requirements of experiment monitoring, clarifying the scope of Experiment Monitoring Frameworks (EMFs) and focusing mainly on the reusability of experiments' data and the portability of experiments' metrics. We then propose MonEx, an EMF that satisfies those requirements. MonEx is built on top of infrastructure monitoring solutions and supports various monitoring approaches. It fully integrates into the experiment workflow by encompassing all steps from data acquisition to producing publishable figures. Hence, MonEx represents a first step towards unifying methods of collecting experiments' data.
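To illustrate the last step in that workflow, producing publishable figures from archived metrics, here is a small hypothetical sketch; the CSV layout (timestamp and value columns) and file names are assumptions, not MonEx's actual export format.

```python
# Hypothetical sketch: turn an archived metric trace (CSV) into a figure
# suitable for a paper. Assumes 'timestamp' and 'value' columns.
import csv
import matplotlib
matplotlib.use("Agg")                  # render off-screen for batch pipelines
import matplotlib.pyplot as plt

def plot_metric(csv_path, out_path, metric_label):
    timestamps, values = [], []
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            timestamps.append(float(row["timestamp"]))
            values.append(float(row["value"]))
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.plot(timestamps, values)
    ax.set_xlabel("time (s)")
    ax.set_ylabel(metric_label)
    fig.tight_layout()
    fig.savefig(out_path)              # e.g. a PDF ready for inclusion

# plot_metric("exp-42_throughput.csv", "throughput.pdf", "throughput (MB/s)")
```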
The Cloud and Big Data movements triggered a move to more centralized, remote storage resources. However, the use of such remote resources raises a number of challenges at the protocol level. In this paper, we first evaluate the influence of several factors, including network latency, on the performance of the NFS protocol. Then, we explore how statistical methods such as a fractional factorial design of experiments could have helped to drastically reduce the number of required experiments while still providing a similar amount of information about the impact of the factors.
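To make the statistical idea concrete, the sketch below builds a 2^(4-1) fractional factorial design: eight runs instead of sixteen for four two-level factors, with the fourth factor aliased to the three-way interaction. The factor names are hypothetical stand-ins, not the paper's actual factors.

```python
# Hypothetical sketch of a 2^(4-1) half-fraction design with generator D = ABC.
from itertools import product

FACTORS = ["latency", "rsize", "sync", "nfs_version"]   # illustrative names

runs = []
for a, b, c in product((-1, 1), repeat=3):   # full 2^3 design on the first three factors
    d = a * b * c                            # fourth factor aliased with ABC
    runs.append(dict(zip(FACTORS, (a, b, c, d))))

for i, run in enumerate(runs, 1):            # 8 runs instead of 2**4 = 16
    print(i, run)

def main_effect(factor, responses):
    """Once the 8 responses (e.g. measured NFS throughput) are available,
    estimate a main effect: mean at the high level minus mean at the low level."""
    high = [y for run, y in zip(runs, responses) if run[factor] == 1]
    low = [y for run, y in zip(runs, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)
```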
CertifyIt is a model-based testing tool used in offline test generation, where tests are generated from a test model without any connection to the system under test (SUT). CertifyIt generates functional tests from test models written using the Unified Modeling Language and the Object Constraint Language (UML/OCL). MBeeTle is another test generation tool, which relies on random test generation algorithms. CertifyIt is the tool used in this project work, and its generation time depends strongly on the model: if the model is very large, i.e. it contains thousands of test targets, CertifyIt takes several hours or even several days to generate the tests. This is a recognized problem in this field, and this project work is an attempt to approach it and propose appropriate solutions. In brief, the solutions are the following. First, MBeeTle is used prior to CertifyIt to generate test cases that reach a maximum number of test targets in a very short time, in order to determine whether its generation strategies can reduce the overall test generation time; CertifyIt then generates tests for the remaining targets. Second, we investigate parallelizing CertifyIt's test generation by dispatching sets of test targets using either a multi-threaded or a distributed approach. It is also planned to deploy these algorithms on the Mésocentre of the University of Franche-Comté (UFC). Finally, we distribute the targets using several distribution strategies to determine their efficiency in reducing the test generation time. All these steps are discussed in detail in the upcoming chapters, with appropriate experiments and examples.
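As a rough sketch of the second solution, the example below splits test targets into chunks and runs one CertifyIt invocation per chunk in parallel processes. The run_certifyit wrapper, its command line, and the chunking policy are hypothetical, since the tool's real interface is not given here.

```python
# Hypothetical sketch: parallel test generation by dispatching chunks of test
# targets to worker processes, each invoking the generator on its chunk.
from concurrent.futures import ProcessPoolExecutor
import subprocess

def run_certifyit(target_chunk):
    """Placeholder wrapper around a CertifyIt invocation; the real CLI/API differs."""
    return subprocess.run(["certifyit", "--targets", ",".join(target_chunk)],
                          capture_output=True, text=True)

def chunked(items, n):
    """Split the target list into chunks of roughly len(items)/n targets each."""
    size = max(1, len(items) // n)
    return [items[i:i + size] for i in range(0, len(items), size)]

def parallel_generation(targets, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_certifyit, chunked(targets, workers)))

# results = parallel_generation(["target_%d" % i for i in range(1, 2001)], workers=8)
```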