1 Introduction

Service-based systems heavily depend on the interfaces of the services selected to implement specific features. However, service providers generally do not know the impact that changes made during the evolution of their Web services will have on subscribers' applications. Subscribers, in turn, are generally reluctant to use Web services that are risky and unstable [10]. Thus, analyzing and predicting Web service changes is critical but also challenging because of the distributed and dynamic nature of services. As a consequence, recent studies have been proposed to understand the evolution of Web services, especially at the interface level [9, 10, 19].

The few existing works studying the evolution of Web services are limited to the detection of changes between releases [9] or the analysis of the types of change introduced to service interfaces. Romano et al. [10] proposed a tool called WSDLDiff to detect changes between different versions of a Web service interface based on structural and textual similarity measures. Fokaefs et al. [9] suggested another tool, called VTracker, which uses XML differencing techniques to detect changes in WSDL documents. However, both tools are limited to the detection of changes between Web service releases; they do not target the problem of predicting future changes or providing recommendations to service providers or subscribers about the quality of service interfaces based on the collected data.

In this paper, we use the changes collected from previous Web service releases to address the following problems. Most changes in a Web service interface affect the systems of its subscribers. Thus, it is important for subscribers to estimate the risk of using a specific service and to compare its evolution to that of other services offering the same features, in order to reduce the effort of adapting their applications to the next releases. Subscribers generally prefer Web services that are stable, with a low risk of containing bugs or introducing major revisions in the future. In addition, the prediction of interface changes may help Web service providers to better manage available resources (e.g., programmers' availability) and to efficiently schedule the maintenance activities required to improve the quality of their services. In fact, the prediction of Web service changes can be used to identify potential quality issues that may occur in future releases; such issues are easier to fix early, before they become more complex.

In this work, we propose a machine learning approach based on Artificial Neural Networks (ANNs) [5] to predict the evolution of Web service interfaces from the history of metrics collected over previous releases. The predicted interface metric values are then used to estimate the risk and the quality of the studied Web services. We evaluated our approach on a set of 6 popular Web services covering more than 90 releases. We report results on the efficiency and effectiveness of our approach in predicting the evolution of Web service interfaces and providing useful recommendations to both service providers and subscribers. The results indicate that the predictions for several Web service metrics, on the different releases of the 6 Web services, were similar to the expected values with a very low deviation rate. Furthermore, most quality issues of the Web service interfaces were accurately predicted for the next releases, with an average precision and recall higher than 82 %. A survey conducted with a set of developers also shows the relevance of the prediction technique for both service providers and subscribers.

The remainder of this paper is organized as follows: Sect. 2 presents the related work; Sect. 3 gives an overview of the proposed predictive modeling technique; Sect. 4 discusses the obtained evaluation results and possible threats to the validity of our experiments. Finally, Sect. 5 concludes and proposes future research directions.

2 Related Work

We summarize, in this section, the existing work that focuses on studying the evolution of Web services.

Fokaefs et al. [9] used the VTracker tool to calculate the minimum edit distance between two trees representing two WSDL files. The outcome of the tool is the percentage of interface changes, such as added, changed, and removed elements, among the XML models of two WSDL interfaces. Romano et al. [10] proposed a similar tool, called WSDLDiff, that identifies fewer types of change than VTracker and may help analyze the evolution of a WSDL interface without manually inspecting the XML changes. Aversano et al. [11] analyzed the relationships between sets of service changes during service evolution based on formal concept analysis. The main focus of their study is to extract relationships among services.

Several studies have been proposed to measure the similarity between different Web services in order to search for relevant ones or to classify them, but not to analyze their evolution. Xing et al. [12] suggested a tool, called UMLDiff, to detect differences between versions of UML diagrams to understand their evolution. Zarras et al. [13] detected evolution patterns and regularities by adapting Lehman's laws of software evolution; their study focused only on Amazon Web Services (AWS).

Based on this overview of existing work in the area of Web service evolution, the problem of predicting the evolution of Web services has not been addressed before. In addition, the use of machine learning in Web services has been limited to the classification of Web services and their messages into ontologies [22]. These existing machine learning-based studies are not concerned with analyzing the releases of a single Web service; rather, they mine different Web services (one release per service) to classify them, in order to support the service composition process for subscribers based on their requirements.

Another category of related work focuses on detecting and specifying antipatterns in SOA and Web services, which is a relatively new area. Rotem-Gal-Oz described the symptoms of a range of SOA antipatterns [15]. Kral et al. [18] listed seven "popular" SOA antipatterns that violate accepted SOA principles. A number of research works have addressed the detection of such antipatterns. Recently, Moha et al. [20] proposed a rule-based approach called SODA for SCA (Service Component Architecture) systems. Later, Palma et al. [19] extended this work to Web service antipatterns in SODA-W, using declarative rule specifications based on a domain-specific language (DSL) to specify and identify the key symptoms that characterize an antipattern using a set of WSDL metrics. Rodriguez et al. [14, 15] and Mateos et al. [16] provided a set of guidelines for service providers to avoid bad practices, based on eight bad practices in the writing of WSDL documents. Recently, Ouni et al. [7] proposed a search-based approach using standard genetic programming (GP) to find regularities, from examples of Web service antipatterns, that can be translated into detection rules.

In the next section, we describe the adaptation of the ANN algorithm to the prediction of the evolution of Web services.

3 Prediction of Web Services Evolution Using Artificial Neural Networks

As described in Fig. 1, our technique takes as input the previous releases of a Web service interface, an exhaustive list of metrics to predict, and a list of detection rules to identify potential future quality issues, called Web service antipatterns, based on the predicted metrics. Our approach generates as output the set of predicted metric values and the possible future quality issues for the next release.

Fig. 1. Prediction approach: overview

Our prediction model is a machine learning model based on an Artificial Neural Network (ANN). In the following, we describe the adaptation of the ANN to our Web service evolution prediction problem.

Artificial Neural Network (ANN): ANN models are mathematical models inspired by the functioning of nervous systems [2–5]; they are composed of a number of interconnected entities, the artificial neurons. ANNs are adaptive systems capable of improving their performance on a problem as a function of previous experience [1]. An ANN builds a map between a set of inputs and the corresponding outputs; such a model can deal with non-linear regression on noisy and incomplete data. In this work, we used a Multi-Layer Perceptron ANN (MLP-ANN) [2]. It is well known that MLP-ANNs are universal approximators, which makes them attractive for modeling black-box functions about whose form little is known. The output of each neuron is expressed as follows:

$$ y = \phi\left( \sum_{i=1}^{n} w_i a_i + b \right) $$

where \( w \) denotes the weight vector, \( a \) is the input vector, \( b \) is the bias, \( \phi \) is the activation function, and \( n \) is the number of neurons in the hidden layer. A hidden neuron influences the network outputs only for inputs near its center, which would otherwise require an exponential number of hidden neurons to cover the input space entirely. For this reason, MLP-ANNs are suggested as suitable for problems with a small number of inputs, such as our Web service evolution prediction problem.
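To make this concrete, the following minimal sketch (ours, not from the paper; Python with NumPy) computes the output of a single neuron with the sigmoid activation adopted later in this section; the input and weight values are invented:

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, the phi used in this paper."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(a, w, b):
    """y = phi(sum_i w_i * a_i + b) for a single neuron."""
    return sigmoid(np.dot(w, a) + b)

# Example: three inputs (e.g., three normalized metric values)
a = np.array([0.2, 0.7, 0.5])   # input vector
w = np.array([0.4, -0.1, 0.9])  # weight vector
b = 0.05                        # bias
print(neuron_output(a, w, b))   # ~0.625
```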

We applied the ANN because it is among the most reliable predictive models, especially in the case of noisy and incomplete data. We chose a multi-layered architecture in which all neurons are fully connected; the connection weights are randomly initialized at the beginning of training. As the activation function, the sigmoid function is applied [5], being adequate for continuous data. The network is composed of three layers: the first layer is composed of p input neurons, each assigned a value \( x_{kt} \); the hidden layer is composed of a set of hidden neurons. The learning algorithm is an iterative algorithm that trains the network. Its performance is controlled by two parameters: the momentum factor, which tries to avoid local minima by stabilizing the weights, and the learning rate, which controls how quickly the weights are adjusted.

Learning process. Before the learning process, the data used in the training set should be normalized. In our case, we chose to apply the min-max technique, since it is among the most accurate techniques according to [8] (a small sketch is given after Table 1). In our adaptation, we used the list of metrics from the literature [7] described in Table 1 as the values to predict for the next Web service releases.

Table 1. Web service interface metrics used.
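As an illustration of the min-max normalization step mentioned above, here is a minimal sketch (our own Python code; the example metric values are invented):

```python
import numpy as np

def min_max_normalize(x, lo=0.0, hi=1.0):
    """Min-max normalization: rescale x linearly into [lo, hi]."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:               # constant metric: map everything to lo
        return np.full_like(x, lo)
    return lo + (x - x_min) * (hi - lo) / (x_max - x_min)

# Example: number of operations observed over five releases
nod = [12, 15, 15, 20, 26]
print(min_max_normalize(nod))  # [0.0, 0.214..., 0.214..., 0.571..., 1.0]
```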

During the learning process, the input to our ANN is represented as follows: let us denote by O the matrix containing the numerical values of the metrics to predict. O is composed of n rows and p columns, where n is the number of metrics to predict and p is the number of steps (releases).

$$ O = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} $$
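The paper does not spell out how O is turned into supervised training examples; one plausible construction, shown below as an assumption of ours, is a sliding window over releases, where the metric values of a few past releases form the input and the metric vector of the following release forms the target:

```python
import numpy as np

def make_training_pairs(O, window=3):
    """Build (input, target) pairs from the n x p metric matrix O.

    Each input concatenates the metric values of `window` consecutive
    releases; the target is the metric vector of the following release.
    The window size is our choice for illustration.
    """
    n, p = O.shape
    X, Y = [], []
    for t in range(p - window):
        X.append(O[:, t:t + window].ravel())  # past `window` releases
        Y.append(O[:, t + window])            # next release's metrics
    return np.array(X), np.array(Y)

# Example: 4 metrics observed over 10 releases (random stand-in data)
O = np.random.rand(4, 10)
X, Y = make_training_pairs(O, window=3)
print(X.shape, Y.shape)  # (7, 12) inputs, (7, 4) targets
```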

Learning technique. There are several learning algorithms, depending on whether the ANN model is linear or non-linear. Our MLP model uses a supervised learning technique called back-propagation (BP) to train the network. The MLP is a modification of the standard linear perceptron and can distinguish data that are not linearly separable. BP is one of the most popular and widely used training procedures and is described in depth in the literature [5]. Our BP neural network was trained with moderate values for the learning rate (α) and momentum (μ). The weights are recalculated every time a training vector is presented to the network. Training terminates when the sum of squared errors falls below a threshold fixed before running the network. Our implementation is based on the Weka framework with its default configuration.
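Since the original implementation relies on Weka's default multilayer perceptron, the following scikit-learn sketch is only a rough equivalent; the hidden-layer size and the learning-rate and momentum values are our assumptions, not values reported in the paper:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

# Stand-in data shaped like the sliding-window pairs built above;
# in this sketch one metric is predicted at a time.
rng = np.random.default_rng(42)
X = rng.random((30, 12))
Y = rng.random(30)

scaler = MinMaxScaler()             # min-max normalization, as in the paper
X_scaled = scaler.fit_transform(X)

model = MLPRegressor(
    hidden_layer_sizes=(10,),       # one hidden layer; size is our guess
    activation="logistic",          # sigmoid, as stated in the paper
    solver="sgd",                   # plain back-propagation
    learning_rate_init=0.3,         # moderate learning rate (alpha)
    momentum=0.2,                   # momentum factor (mu)
    tol=1e-4,                       # stop when improvement falls below this
    max_iter=5000,
    random_state=0,
)
model.fit(X_scaled, Y)
print(model.predict(X_scaled[:3]))  # predicted metric values
```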

4 Experiments

In order to evaluate the ability of our framework to efficiently predict the evolution trends of Web services, we conducted a set of experiments based on six widely used Web services. In this section, we first present our research questions and the experimental setup, then describe and discuss the obtained results. Finally, we discuss some threats related to our experiments.

4.1 Research Questions and Evaluation Metrics

We defined the following three research questions, which address the applicability, the performance, and the usefulness of our Web service prediction approach:

  • RQ1: To what extent can our approach correctly predict the evolution of Web services?

  • RQ2: To what extent can our approach predict Web service quality issues?

  • RQ3: Can our prediction results be useful for developers?

To answer RQ1, we calculated the deviation between the expected metric values and the ones predicted by our ANN on the different Web service releases. To this end, we considered the list of metrics described in the previous section. The error rate is defined as follows:

$$ e\_rate(M_i, S) = \left| PM_i - EM_i \right| $$

where \( PM_i \) is the metric value predicted by the ANN and \( EM_i \) is the expected value for metric \( M_i \) of service \( S \). We calculated the error rate for one and for several steps (releases) ahead for each of the considered Web services.
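For illustration, the error-rate computation is straightforward (the metric values below are invented):

```python
def e_rate(predicted, expected):
    """Absolute deviation between predicted and expected metric values."""
    return abs(predicted - expected)

# Example: predicted vs. expected number of operations over three releases
predicted = [24.1, 26.8, 30.2]
expected  = [25,   27,   28]
print([e_rate(p, e) for p, e in zip(predicted, expected)])
# approximately [0.9, 0.2, 2.2]
```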

To answer RQ2, we calculated precision and recall scores to compare the predicted Web service antipatterns with the expected ones:

$$ recall = \frac{|{\text{predicted}}\,{\text{antipatterns}} \cap {\text{expected}}\,{\text{antipatterns}}|}{|{\text{expected}}\,{\text{antipatterns}}|} \in [0,1] $$
$$ precision = \frac{|{\text{predicted}}\,{\text{antipatterns}} \cap {\text{expected}}\,{\text{antipatterns}}|}{|{\text{predicted}}\,{\text{antipatterns}}|} \in [0,1] $$
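A small sketch of this set-based computation (our own; the antipattern instances are invented):

```python
def precision_recall(predicted, expected):
    """Set-based precision and recall, mirroring the formulas above."""
    predicted, expected = set(predicted), set(expected)
    tp = len(predicted & expected)                 # correctly predicted
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(expected) if expected else 0.0
    return precision, recall

# Example with antipattern instances labeled "service:antipattern"
predicted = {"svcA:MS", "svcB:CS", "svcC:NS"}
expected  = {"svcA:MS", "svcB:CS", "svcD:DS"}
print(precision_recall(predicted, expected))  # (0.666..., 0.666...)
```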

We considered five types of antipatterns from the literature [20]: Multi-service (MS: a service implementing many operations), Nano-service (NS: a too fine-grained service), Chatty-service (CS: a service including many fine-grained operations), Data-service (DS: a service including only data access operations), and Ambiguous service (AS: a service whose operation names are ambiguous). More details about existing Web service antipatterns can be found in [19, 20]. We used the manually defined rules in [7] to detect the predicted and actual Web service antipatterns.
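For illustration only, a rule-based detector applied to predicted interface metrics could look like the sketch below; the metric names and thresholds are hypothetical stand-ins, not the actual rules from [7]:

```python
# Hypothetical rule sketch: the actual rules come from [7]; the metric
# names and thresholds below are illustrative assumptions only.
def detect_antipatterns(m):
    """Flag antipattern types from a dict of predicted interface metrics."""
    found = []
    if m["num_operations"] > 20:                           # Multi-service
        found.append("MS")
    if m["num_operations"] <= 2 and m["num_params"] <= 2:  # Nano-service
        found.append("NS")
    if m["num_operations"] > 10 and m["avg_params_per_op"] < 2:
        found.append("CS")                                 # Chatty-service
    if m["ratio_crud_ops"] > 0.8:                          # Data-service
        found.append("DS")
    if m["avg_op_name_length"] < 5:                        # Ambiguous service
        found.append("AS")
    return found

predicted_metrics = {"num_operations": 24, "num_params": 60,
                     "avg_params_per_op": 2.5, "ratio_crud_ops": 0.3,
                     "avg_op_name_length": 12}
print(detect_antipatterns(predicted_metrics))  # ['MS']
```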

To answer RQ3, we used a post-study questionnaire to collect the opinions of developers on our prediction results. We also wished to assess how these results may help developers working on service-based applications. To this end, we asked 24 software developers, including 11 developers working in a Web development startup and providing Web services for customers in the automotive industry. The remaining 13 participants are graduate students (8 MSc and 5 PhD students) in Software Engineering at the University of Michigan-Dearborn; 9 of the 13 students work as full-time or part-time programmers in the software industry. All the participants are volunteers and have at least 2 years of experience as developers. The participants were first asked to fill out a pre-study questionnaire containing five questions, which collected background information such as their role within the company, their programming experience, and their familiarity with Web services and service-based applications. In addition, all the participants attended one lecture about Web service antipatterns and passed five tests evaluating their ability to assess the design of Web services using quality metrics.

4.2 Studied Web Services

We selected the following six Web services for our validation because different releases of their WSDL interfaces are publicly available and they belong to different categories. Table 2 provides some descriptive statistics about these six Web services:

Table 2. Web service statistics

  • Amazon EC2: Amazon Elastic Compute Cloud is a Web service that offers resizable compute capacity in the cloud. In this study, we considered a total of 44 releases from 2006 to 2014.

  • Amazon Simple Queue Service (Amazon SQS) offers reliable hosted queues for storing messages exchanged between computers. In our study, we considered a total of 6 releases.

  • FedEx Track Service offers accurate updates on the status of shipments. We used 10 releases of this Web service.

  • FedEx Ship Service: the Ship Service provides functionalities for managing package shipments and their options. A total of 17 releases of this Web service are considered in our experiments.

  • FedEx Rate Service: the Rate Service provides the shipping rate quote for a specific service combination depending on the origin and destination information supplied in the request. We used 18 releases for our prediction algorithm.

  • Amazon Mechanical Turk Requester: a Web service that provides an on-demand, scalable human workforce to complete jobs that humans can do better than computers, such as recognizing objects in photos. We used 15 releases developed between 2005 and 2012.

4.3 Results

Results for RQ1. Figures 2, 3 and 4 summarize the outcome for the first research question. Most of the Web service metrics were predicted accurately on the different Web services, with an average error rate lower than 2.8, as described in Fig. 2. For the FedEx Track and FedEx Rate services, the average error rate is the highest; this could be related to their smaller training sets compared to the other services. For Amazon EC2, the metrics were predicted with a low deviation score of 2.1 due to the large training set available for this service. However, Amazon Simple Queue has one of the lowest deviation scores, 1.8, despite its small training set. This suggests that our prediction results are largely independent of the size of the Web services evaluated and of the training data.

Fig. 2. Average error rate (e_rate) on the different Web services

Fig. 3. Average error rate (e_rate) per metric on the different Web services

Fig. 4. Average error rate (e_rate) of the different metrics on the Web services (except Amazon Simple Queue) per prediction step

Figure 3 shows more detailed results for the average error rate per metric. The results support the claim that our results are independent of the type of metric being predicted. However, the error rate depends on the range of each metric. For example, the number of operations per service is expected to yield the highest error rate, since this metric varies strongly and has a larger range than the other metrics.

Figure 4 shows the ability of our algorithm to predict metric values not only for the next release but for up to the next 5 releases. The results obtained on the different Web services (except Amazon Simple Queue, not considered due to its limited number of releases) show that the error rate for the 5th upcoming release remains low, with a score below 4.5.

To answer the first research question: our approach is able to predict the evolution of Web service metrics with high accuracy.

Results for RQ2. Figures 5, 6 and 7 summarize our findings. Overall, most of the expected quality issues (Web service antipatterns) for the next release were identified, as described in Fig. 5. Our prediction algorithm was able to detect Web service antipatterns on the different services with an average precision and recall higher than 84 % and 86 %, respectively. For the FedEx Ship service and Amazon Mechanical Turk, the precision is higher than for the other systems, at more than 88 %. This can be explained by the fact that these systems are smaller than the others and contain fewer antipatterns to predict. For the FedEx Rate Service, the precision is also high (around 82 %), i.e., most of the predicted antipatterns are correct. This confirms that our precision results are independent of the size of the Web services evaluated. For Amazon EC2, the precision is one of the lowest (81 %) but still acceptable; Amazon EC2 contains a high number of ambiguous services, which are difficult to detect using metrics.

Fig. 5. Average precision and recall of the predicted antipatterns on the different Web services

Fig. 6. Average precision and recall per antipattern type on the different Web services

Fig. 7. Average precision and recall on the Web services (except Amazon Simple Queue) per prediction step

The same observations hold for the recall. The average recall on the six Web services was higher than 86 %. For the FedEx Track service and Amazon EC2, the recall is higher than for the other systems, at more than 90 %. This can be explained by the fact that these systems have more training data than the others. For the FedEx Ship Service, the recall is also high (around 81 %); thus, the size of the training data did not strongly affect the quality of the prediction results. An interesting observation is that the obtained precision and recall scores are conflicting, since the services with the highest precision scores received the lowest recall. However, both scores are acceptable for all the Web services.

One key strength of our technique is its ability to predict quality issues not only for the next release but for up to the next 5 releases, as described in Fig. 7. The obtained results show that both precision and recall remain high for all the Web services when predicting quality issues for the 5th upcoming release, with an average higher than 73 %. We did not consider Amazon Simple Queue in this evaluation, due to its limited number of available releases.

To summarize, based on the obtained results, our approach predicts Web service quality issues with high accuracy.

Results for RQ3. To answer RQ3, we used a post-study questionnaire to collect the opinions of the participants about their experience in using our prediction tool and results. The questionnaire asked participants to rate their agreement, on a Likert scale from 1 (complete disagreement) to 5 (complete agreement), with the following statements:

  • The predicted metric values are useful to estimate the risk and cost of using a specific Web service and may help developers select the best service based on their preferences.

  • The predicted quality issues may help developers and managers to better schedule maintenance activities and reduce the cost of fixing these issues.

The average agreement of the participants was 4.6 for the first statement and 4.8 for the second, confirming the usefulness of our prediction results for the developers who took part in our experiments.

The remaining questions of the post-study questionnaire were about the benefits and limitations (possible improvements) of our prediction approach. We summarize the developers' feedback in the following. Most of the participants mentioned that our results may help the service provider's developers decide when to refactor their Web service implementations. For example, they can consider performing refactorings when the prediction results show that a quality issue, such as a multi-service antipattern, may become much more severe after a few releases. Thus, the developers liked the functionality of our tool that helps them identify refactoring opportunities as early as possible.

The participants also found our tool helpful for developers of service-based applications. In fact, the majority of the participants mentioned that they consider the stability and quality of services as important criteria when selecting a Web service among several options. The instability of a service may negatively impact their systems in the future and may indicate that the service contains many bugs, explaining the numerous new releases. Furthermore, the subjects liked the antipattern prediction feature, since it is easier for them to evaluate the quality of Web services in the next releases based on the number of antipatterns rather than by analyzing a set of metrics.

The participants also suggested some possible improvements to our prediction approach. Some participants believe that it would be very helpful to extend the tool with a new feature that automatically estimates the risk, cost, and benefits of using different candidate Web services. Another suggested improvement is to use visualization techniques to follow the evolution of the Web services and thus easily estimate their stability.

4.4 Threats to Validity

There are four types of threats that can affect the validity of our experiments. We consider each of these in the following paragraphs.

Conclusion validity is concerned with the statistical relationship between the treatment and the outcome. The parameter tuning of the ANN used in our experiments creates a threat that we need to evaluate in our future work: the parameter values used in our experiments were found by trial-and-error. It would be an interesting perspective to design an adaptive parameter tuning strategy for our approach, so that the parameters are updated during execution to provide the best possible performance.

Internal validity is concerned with the causal relationship between the treatment and the outcome. We used a set of manually defined rules for the detection of possible future quality issues in the next releases [19]. However, the obtained results depend on the rules used, and some of the predicted quality issues may not be antipatterns that the service provider's developers consider important to fix.

Construct validity is concerned with the relationship between theory and what is observed. To evaluate the relevance of our prediction results, we interviewed a group of developers. Regarding the selection threat, the diversity of the participants in terms of experience could affect the results of our study; we addressed it by making sure that all the participants have roughly the same experience in Web development and the same familiarity with Web services. Regarding the fatigue threat, we did not limit the time to fill out the questionnaire; we sent the questionnaires to the participants by email and gave them the time required to complete each task.

External validity refers to the generalizability of our findings. In this study, we performed our experiments on six widely used Web services belonging to different domains and having different sizes. However, we cannot assert that our results generalize to other Web services or to other practitioners. In addition, our study was limited to the use of specific metrics. Future replications of this study are necessary to confirm our findings.

5 Conclusion and Future Work

We proposed, in this paper, an approach to predict the evolution of Web services. Such predictions are important for subscribers, who need to estimate the risk of using a selected service and compare its evolution to that of other services offering the same features. Furthermore, the prediction of future changes may help Web service providers to better manage available resources and efficiently schedule the maintenance activities required to improve quality. We proposed the use of machine learning, based on Artificial Neural Networks, for the prediction of the evolution of Web service interface design. To validate the proposed approach, we collected training data consisting of quality metrics from previous releases of 6 Web services. The validation of our prediction technique shows that the predicted metric values, such as the number of operations, on the different releases of the 6 Web services were similar to the expected ones, with a very low deviation rate. In addition, most of the quality issues of the studied Web service interfaces were accurately predicted for the next releases, with an average precision and recall higher than 82 %. The survey conducted with developers also shows the relevance of the prediction technique for both service providers and subscribers.

Future work involves validating our prediction technique with additional metrics, Web services, and developers, to draw conclusions about the general applicability of our methodology. Furthermore, in this paper we focused only on the prediction of Web service evolution; we plan to extend the approach by defining new risk measures based on the predicted metric values. In addition, we will study the impact of predicted quality issues on the usability and popularity of Web services over time.