Using Data Mining Techniques for Detecting Dependencies in the Outcoming Data of a Web-Based System

Rak, Tomasz; Żyła, Rafał

doi:10.3390/app12126115

by Tomasz Rak * and Rafał Żyła

Department of Computer and Control Engineering, Rzeszow University of Technology, Powstancow Warszawy 12, 35-959 Rzeszow, Poland

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(12), 6115; https://doi.org/10.3390/app12126115

Submission received: 6 May 2022 / Revised: 10 June 2022 / Accepted: 13 June 2022 / Published: 16 June 2022

(This article belongs to the Special Issue Web Infrastructure Enhancement and Performance Evaluation)

Abstract:

The increasing amount of data produced by web systems is becoming one of the most valuable resources for information retrieval and knowledge discovery. The sheer volume of this information makes it an important area for data mining research. To analyze the dependencies in the outcoming data, expressed as query scenarios, we present a new approach for evaluating the behavior of interactive web systems that applies different data mining techniques. We propose tools that take outcoming logs as input, analyze them, and provide information about web client actions. Qualitative and quantitative automatic evaluation of the data can explain the connections between the most significant parameters of the system in particular scenarios. The proposed method can be used to efficiently verify the type of client behavior of a web system or the design of the system. The analysis of results demonstrates the possibility of efficient pattern search.

Keywords:

machine learning; web system; association rules; regression trees; data mining

1. Introduction

A web system should meet its performance goals along with requirements such as functionality, extendibility, security, flexibility, reliability, usability, connectivity, and privacy. However, many web-based systems have exposed significant performance issues only after deployment [1]. Most of these problems stemmed from software and hardware being deployed without a workload analysis of the web system. Performance engineering [2] provides the information needed to build a system that meets performance requirements. The idea is to identify performance flaws even before the system is deployed. The huge amount of information in a web system makes it a sophisticated area for data mining research. Web systems present challenging aspects and tasks for extracting information from an output data stream.

Earlier research related to the performance of the web system [2,3] gave rise to the idea of analyzing logs with the use of data mining. It is an innovative approach to the analysis of customer and system behavior [3] that determines the variability of the system output parameters obtained from logs. Our approach differs from the solutions used in web content/structure mining [4]. We focus on log analysis in terms of user behavior, such as web usage and utilization mining [5]. The main aim of this research is to extract knowledge from the log of a particular web system. In this study, a log analysis of the web system was performed on the requests sent to the system. The examined platform was based on two types of hardware and the same stock exchange software. Additionally, to support the analysis, the data set contained timestamp data. After visiting any API endpoint, the user leaves some information. This information can be collected, stored in logs, and analyzed.

Data mining is a multidisciplinary field involving artificial intelligence, machine learning, databases, statistics, and information retrieval. It is commonly defined as the process of discovering useful patterns or knowledge from different data sources such as texts, images, audio, video, etc. Data mining comprises some classes of tasks performed [4]: association rule—the search for relationships between several variables, clustering—the task of finding and extracting groups and structures in the data, classification—the activity of generalizing known structures to apply to the new data set, and regression—it tries to extract a function that models the data with the least error.

Another solution is web mining. It is the application of data mining techniques to discover patterns from the web system behavior. Web mining can be divided into three major categories [4]: web content mining aims to extract useful information or knowledge from web data contents such as text, images, audio, etc., web structure mining tries to discover useful knowledge from the structure of hyperlinks and tags, and web usage mining refers to the discovery of user’s usage logs, web server logs, application server logs, etc.

Web usage mining, also called log mining, is a process of recording user access data on the Web and collecting the data in the form of logs. Web usage mining is divided into three distinct phases: preprocessing, pattern discovery, and pattern analysis. Using data mining tools in web usage mining can bring many new possibilities for the analysis of web system log files. In this article, we follow one case (the stock exchange web system) and describe our study on how to build data mining models that help to understand user behavior and support thoughtful system design. Therefore, in this investigation, our objective was also to check the applicability of data mining algorithms that can help system architects and performance engineers build modern web systems.

The article focuses on how to extract useful information from web system outcoming log data with Association Rules (AR) mining. Regression Trees (RT) were used to extend the proposed analysis. Next, we discuss the strengths and weaknesses of AR and RT. AR and RT are well-known techniques that are often used in the field of knowledge discovery in data mining. The article presents AR and RT that describe the request behavior patterns within the system log data. Data analysis is based on prepared tools. The Python programming language is used to conduct analyses. In this extensive study, 54 test results (more than 32 million records) are included.

The primary focus of this paper is to propose an approach that helps engineers detect dependencies between web clients. It can also be used as a tool to select the architecture for different numbers of requests and scenarios. The proposed approach can also be used for in-depth behavior analysis of systems and their clients. Unlike Web system modeling, experiments with predefined loads are more desirable in industry, but the task is more challenging. Although most of the previous studies focused on modeling ahead prediction, the approach with testing load has not been thoroughly explored [2]. This paper aims to extensively evaluate various real system structures for multiple parameters of the output dataset.

That is why we propose to use well-known algorithms to obtain knowledge of web system behavior. We employed some different types of algorithms, such as AR and RT, to obtain the results [6]. We collected our data from a real application, which includes a large dataset. Finally, we also tried to extract rules from features with the help of AR and to extract the decision path for choice data points based on RT.

The paper is organized as follows. Section 2 reviews related work. In the literature, AR and RT are used mainly in health systems, financial market systems, the banking industry, trade systems, and Internet systems that produce huge amounts of data. Section 3 introduces the web system and provides an overview of the different hardware architectures used in this work. Section 4 gives basic information about data mining techniques. We use two popular data mining areas, namely AR and RT. Section 5 presents and discusses the results. We built the analysis applications by making use of the available algorithms. Finally, Section 6 discusses the approach and concludes the paper. The presented work also points out several possible future research directions.

2. Related Work

Data mining is the process of extracting useful information from data and using it to make decisions. The data mining process can be divided into three parts: data, analysis, and decision making. The information obtained in the decision-making process represents the main source of decisions. Therefore, the goal of data mining is to extract useful information from data to make more appropriate decisions. In recent years, data mining has been used in many industries to help make appropriate decisions [7]: incident data analysis, intrusion detection, crime trend analysis, prediction of heart disease, bank direct marketing, predicting stock market prices, detection of diabetes, insurance domain, spam filtering, software quality prediction, software bug detection, e-commerce system user profiling, etc. It seems that data mining methods are very important to obtain a pattern in many sectors [8,9,10,11]. The most basic characteristic of data mining is the processing of a large amount of data, which is necessary to discover unknown and hidden information, extract valuable information and use this kind of information to make important decisions [12]. Data mining can identify useful information that traditional analysis methods cannot find. Many authors worked on data mining techniques. Traditional algorithms (e.g., FP-Growth algorithm, Apriori algorithm [13]) have not been able to meet data mining requirements in the aspect of efficiency. The authors of article [14] proposed a new data mining algorithm based on an AR algorithm. In the article [15], the authors analyze AR in data mining technology. In the article [16], the authors propose a new design for a very simple data-driven binary classifier and conduct an empirical study of its performance. The classification system consists of highly interpretable fuzzy parameters. The authors in [17] propose a machine learning approach to identify project artifacts that are significantly at risk of failure.

On the other hand, the analysis of log data is critical in understanding the behavior of clients and the success of any business [18,19]. The authors in [20] examine some other available web analytics methodologies and their accuracy. These studies elevated web analytics as an indispensable tool for e-Commerce. Nguyen [21] employs a web usage mining process to uncover interesting patterns in web server access log files. By incorporating feature construction, he gained a wide knowledge of users’ access patterns. Clickstream data provide information about the sequence of pages or the path viewed by users as they navigate a website [22]. The collected clickstream data provide valuable insight into how the website is used by its customers. The sequence of viewed sites and actions taken are commonly referred to as paths. Bucklin [23] asserts that two major categories of data are used for analysis: user-centric data and site-centric data. User-centric data permit the creation of a user profile. Site-centric data represent the activities and behaviors of visitors to the website. In this paper, we focus on site-centric data. Site-centric data permit focussed data mining.

Two major challenges are involved with web system usage mining. First, we are processing the raw data to provide a picture of how the website is used. Second, we are filtering the results of different data mining sets to present rules and patterns. Furthermore, we identify some categories of related work, namely (i) profiling data (static and dynamic), (ii) the web page data (static, dynamic, semantic), and (iii) log data. This approach consists of algorithms to analyze output data. As we mentioned earlier, web mining may be defined as the revelation and analysis of subsidiary information from the web system and can be divided into two major components: web content/structure mining and web usage/utilization mining. Web content mining can be defined as the automatic search and retrieval of information from a web system. Web usage mining can be described as the analysis of utilizing access patterns, through the mining of log files from a particular web system.

There are existing ways to process the raw data to provide a picture of how the Web site is being used. In our approach, we filter the results of different data mining sets to present rules and patterns. Because analysis of the output data is a challenging task, profiling outgoing data is a research direction that has not been studied much so far. To the best of our knowledge, only a few existing papers were proposed.

The last part of the literature survey will focus on pattern discovery in logfile data. To perform any output log assessment, the input data of the web client must be known. Sharma and Lodhi [24] developed a decision tree algorithm, which is an efficient mining method to extract log files. Authors increase the accuracy of generating non-redundant association rules with less time complexity. The generated training rules are helpful for finding different information related to the log file. Liu et al. [25] propose a novel knowledge discovery algorithm based on double-evolving frequent pattern trees. New frequent patterns are extracted from the incremental data. In [26], the authors introduced a novel algorithm that maintains the functional dependencies of dynamic datasets. They propose an algorithm to discover and maintain functional dependencies in dynamic datasets. In [27], mixed data relation algorithm was used to recognize the pattern from the SQL log data. In papers [28,29] authors presented an incremental algorithm for the discovery of functional dependencies. This new regex-based validation method to efficiently select tuples follows an incremental strategy to explore the search space based on the functional dependencies previously held. The research presented in [30] provided evidence that associative classification produces more accurate results compared to other classification models. They compared the performance of the proposed model with Naïve Bayes, Support Vector Machine, Random Forests, and Decision Tree and the results showed that the proposed model outperformed the others. The article [7] looks at data mining procedures and their various applications such as agriculture, transportation, botany, and zoology. In this research article, the authors have compared different machine learning algorithms. The modeling approach presented in this paper differs from other works on data mining because we take a well-known request scenario [3]. 
The goal of our research is to analyze different algorithms that can understand the behavior of clients. As a result, we focus on the comparison of two types of well-known algorithms and attempt to identify which algorithm gives satisfactory analysis results for our dataset. In our research, we used three parts of web usage mining:

Data Preprocessing: Real-world data and some databases are incomplete, inconsistent, and hard to interpret. Data preprocessing is a mining technique that integrates databases and makes raw data understandable and consistent.
Pattern Discovery: Web usage pattern discovery techniques, such as statistical analysis, are used to discover interesting patterns. Knowledge obtained from the results of statistical analysis may help to improve, e.g., performance. The association rule is one of the basic techniques of data mining and is the one most used in web usage mining.
Pattern Analysis: In this step, all irrelevant rules or patterns discovered in the above phases are separated out, and relevant rules or patterns are extracted.

3. Web System

These days, people have shown an increasing interest in using a trade-based web system. It becomes difficult to service a large group of client requests for most web systems. This is due to insufficient knowledge about the system’s behavior. Existing research works predict future system behavior [31,32], but these methods are not based on data from different types of systems. Several studies have been written and prepared regarding client and system behavior [21,31].

3.1. Interactive Web System

We prepared a benchmark of the stock exchange web system, as an application programming interface based on the IBM DayTrader Benchmark, together with data mining models for behavior analysis. First, we describe the interactive transactional environment used in this study. We treat a completed buy or sell offer as a transaction. We assume that a client creating a buy offer specifies the maximum price at which they want to buy, whereas the price given in a sell offer is the minimum price to be paid for the share. From the user’s point of view, buying and selling shares is only possible between the user and the company (we do not assume a direct exchange of shares between users). This means that the user can only buy or sell shares that are made available in limited numbers by the company. There is, however, an indirect relationship: other users, through the transactions they make, influence the final price and number of shares of a given company. The purchase of stocks occurs when there is a sell offer with a value equal to or lower than the target share price. Buy offers are collected in the following order: first from the most expensive to the oldest. The sale of shares occurs when the price of the selected stock reaches a value equal to or greater than the target price set by the user. The sell offers are sorted as follows: from the cheapest to the oldest. If the demand is equal to the supply, the price of the stock does not change. Changes in the stock exchange take place through buy or sell transactions. We assume that the current share prices are dictated by the clients and their interest in buying or selling. No external interference in the price of shares is necessary. The value of a single share is determined based on the last transaction made. A buy or sell takes place automatically if the price of the selected share reaches the value specified in the buy/sell offer by the client.

We accept a completed purchase or sale offer as a transaction (Figure 1a). We assume that the user, when creating a purchase offer, specifies the maximum price at which they want to buy, while when creating a sale offer we assume that the given price is the minimum price to be paid for the share. From the user’s point of view, the option to buy and sell shares is only possible between the user and the company. This means that the user can only buy or sell shares that are made available in a limited number by the company. There is, however, an indirect relationship, as other users, through the transactions they make, have an influence on the final price and number of shares in a given company. Shares are sold when the price of the selected stock reaches a value equal to or higher than the target price set by the user. Sell offers are sorted as follows: from the cheapest to the oldest. The purchase of shares occurs when there is a sale offer with a value equal to or lower than the target share price. Offers are collected in the following order: first from the most expensive to the oldest. If the demand is equal to the supply, the price of the share does not change. Changes in the stock exchange occur through purchase or sale transactions of shares carried out on it. We assume that current stock prices are dictated by users, their interest in buying, and actual purchases. The value of a single share is determined based on the last transaction made.
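The ordering and matching rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the benchmark's actual implementation: the `Offer` class, its field names, and the helper functions are hypothetical, the phrase "from the most expensive to the oldest" is read here as "sort by price first, then by age", and share quantities are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    offer_id: int
    price: float     # maximum price for a buy offer, minimum price for a sell offer
    created_at: int  # hypothetical creation timestamp; a smaller value means older


def sort_buy_offers(offers):
    """Most expensive first; ties are broken by age (oldest first)."""
    return sorted(offers, key=lambda o: (-o.price, o.created_at))


def sort_sell_offers(offers):
    """Cheapest first; ties are broken by age (oldest first)."""
    return sorted(offers, key=lambda o: (o.price, o.created_at))


def can_trade(buy, sell):
    """A buy offer can be filled by a sell offer at or below its target price."""
    return sell.price <= buy.price


def match_offers(buys, sells):
    """Pair offers greedily in priority order; returns (buy_id, sell_id) pairs."""
    buys, sells = sort_buy_offers(buys), sort_sell_offers(sells)
    matches = []
    i = j = 0
    while i < len(buys) and j < len(sells) and can_trade(buys[i], sells[j]):
        matches.append((buys[i].offer_id, sells[j].offer_id))
        i += 1
        j += 1
    return matches
```

Matching stops at the first pair that cannot trade, because every later buy offer is cheaper and every later sell offer is more expensive than the pair just examined.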

The diagram (Figure 1b) shows the main functionality of the application. After the registration process and logging in, the client has the opportunity to check current offers on the stock exchange, as well as publish their own buy or sell offers of the held shares. The sale and purchase are carried out automatically if the price of the selected share reaches the value specified in the buy/sell offer by the user.

From a business point of view, the IT system implements business processes. The user, by going through the individual application windows (use cases), performs this process. There is an area of artificial intelligence that deals with the detection of such processes on the basis of system log records: process mining. The model of the detected process is a graph. In a classic system designed according to the principles of business process management, nothing needs to be detected because the designer provides this graph when creating the specification. However, in adaptive case management systems the graph is almost complete and the users themselves decide which paths to follow to implement the process [33]. Adaptive case management systems are not explored. The problem with them is that at the design stage, it is impossible to predict what the resource consumption will be, because we do not know the process. Once process mining has detected what users are doing, system behavior can be examined for performance. We reverse this process: starting from the performance (output parameters in log files), we try to determine what the clients are doing (what the scenario of their actions is).

3.2. Container System

To have an adequate system, we run the stock exchange system in Docker containers. Orchestration is an advanced DevOps tool that allows the entire deployment to be managed accurately and automatically. Orchestration also allows for very easy scaling of services. Docker Swarm supports automatic load balancing, so that external traffic and load are shared between all replicas of a given service. All the services of both applications reside on a single overlay subnet.

In our experiments, all input data have been split into three sets (scenarios). For each structure, several parameters have to be defined before the experiments are carried out. As a general setting for all the structures, the number of CPUs (8 and 12) and the RAM size (20 and 30 GB) are selected. We perform our analysis considering all customers’ requests.

3.3. Services

All parts of the system (front-end and back-end) were implemented using web development frameworks, and experiments were carried out in the PIONIER cloud environment (https://cloud.pionier.net.pl/ visited on 12 June 2022). We prepared a benchmark of the stock exchange web system. Then, we prepared the automatic customer, which generates outgoing log data. These data are the basis for our analysis. Our system consists of the following containers:

The main back-end application contains Django and the application logic.
The main front-end application contains Vue.js and is responsible for the front-end of the application.
The main Celery (asynchronous task queue/job queue) is responsible for the distribution of tasks.
Celery Beat (a periodic task) is responsible for creating the schedule and running the tasks.
The main database built based on the official distribution of the PostgreSQL container is responsible for launching and maintaining the main database of the system.
The main RabbitMQ, built on the basis of the official distribution of the RabbitMQ container, is responsible for the main system queuing tasks.

Our automatic customer consists of containers:

Automatic customer back-end application contains Django and client application logic.
Automatic customer front-end application contains Vue.js and is responsible for the front-end of the application.
Automatic customer Celery (asynchronous task queue/job queue) is responsible for the distribution of customers’ tasks.
Database of automatic customer system built based on the official distribution of the PostgreSQL container. It is responsible for launching and maintaining the system responses database.
The client RabbitMQ, built on the basis of the official distribution of the RabbitMQ container, is responsible for customer queueing tasks.

A job is called via a RabbitMQ-based broker. The responsibility for completing the task rests with Celery, which is used to queue and process long tasks asynchronously and independently of the application. After finding the appropriate testing class, new processes are set up, each as a separate customer.

3.4. Test Scenarios

The purpose of this paper is to analyze the operation of a Web application that reproduces the behavior of the stock exchange. Performance tests were conducted in a multi-container environment using an automatic client. An attempt was made to verify the correct operation and to detect frequent dependencies in the data collected during the application’s operation.

Note that the test (execution of the code of each class) takes place in the queuing broker (RabbitMQ). The automatic testing client application can perform the test with a specified number of users (parallel poll processes). Data collection is carried out using classes responsible for individual tests. The individual testing classes implement the logic by the points described in each test (Table 1):

Scenario 1 ($S_1$): Buy to the limit and put up sales offers.
Scenario 2 ($S_2$): Buy and sell.
Scenario 3 ($S_3$): Buy more while there is money.

The scenarios take into account more complex activities ($S_1$ and $S_3$) or less complex activities ($S_2$). The system can carry out only one type of operation (buy/sell) or use many functionalities (buying, viewing offers, waiting for the right price). We checked how the prepared architecture copes with incoming requests. Specifically, we monitored:

time_spent_on_sql_queries: Time in milliseconds spent on executing SQL queries.
time_taken: Time in milliseconds needed for processing the query content.
cpu_current_usage: The percentage of CPU usage while the query is running.
cpu_time_spent_user: Time in seconds that the CPU spends performing client’s tasks.
cpu_time_spent_system: Time in seconds that the CPU spends performing system tasks.
cpu_time_spent_idle: Time in seconds that the CPU spent waiting for tasks.
memory_usage: Memory usage as a percentage.
cpu_usage_aggregated: Aggregated CPU usage for over 30 s, expressed as a percentage.
container_id: The container ID.
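To make the shape of one monitored record concrete, the parameters listed above can be written as a typed Python record. This is a sketch under assumptions: the article does not specify how the log is stored, so the CSV layout with a header row, the `LogRecord` class, and the `parse_log` helper are hypothetical, and the sample row contains made-up values.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LogRecord:
    time_spent_on_sql_queries: float  # milliseconds
    time_taken: float                 # milliseconds
    cpu_current_usage: float          # percent
    cpu_time_spent_user: float        # seconds
    cpu_time_spent_system: float      # seconds
    cpu_time_spent_idle: float        # seconds
    memory_usage: float               # percent
    cpu_usage_aggregated: float       # percent
    container_id: str


def parse_log(text):
    """Parse CSV text with a header row into a list of LogRecord objects."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        container_id = row.pop("container_id")
        # Every remaining column is numeric in this assumed layout.
        numeric = {name: float(value) for name, value in row.items()}
        records.append(LogRecord(container_id=container_id, **numeric))
    return records


# Made-up sample data, for illustration only.
SAMPLE = (
    "time_spent_on_sql_queries,time_taken,cpu_current_usage,"
    "cpu_time_spent_user,cpu_time_spent_system,cpu_time_spent_idle,"
    "memory_usage,cpu_usage_aggregated,container_id\n"
    "12.5,48.0,35.0,1.2,0.4,8.4,41.0,33.5,abc123\n"
)
```

Calling `parse_log(SAMPLE)` yields a single `LogRecord` whose fields can then be binned or encoded for the mining steps.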

The experiment stops when the last request has finished. It is critical to use high-quality data samples to train the data mining model so that it can produce precise and reliable predictions. The data points collected must be representative and sufficiently numerous; otherwise, the conclusions drawn will be incorrect and biased.

4. The Scope of Research Works

In the modern technical world, most researchers prefer to use machine learning or data mining techniques because of their huge advantages. Thus, we tried to incorporate one of these approaches to solve real-world problems. We propose tools to analyze the web client’s behavior. Without applying data mining techniques, the log data analysis process requires a lot of manual effort to complete, and is very time-consuming. Our work was focused on:

Data analysis from various test scenarios.
Analysis of the impact of a different number of requests.
Analysis of the application’s operation for various hardware configurations.

4.1. Association Rules

Association rule learning is known as a rule-based machine learning system. An unsupervised learning method is typically used to establish a relationship among variables. This is a descriptive technique that is often used to analyze large data sets to discover interesting relationships or patterns. AR allows data scientists to identify trends, associations, and co-occurrences between data sets inside large data collections. In a trade system, for example, associations infer knowledge about the buying behavior of consumers for different items. In the health system, it is used to better diagnose patients. Similarly, data mining techniques are useful for web system parameters analysis, etc.

Let A be a set of a items and B a set of b transactions, each transaction being a subset of A. An AR is a rule of the form $X \to Y$, where X and Y are disjoint subsets of A having support and confidence above a minimum threshold [34]. Support gives an idea of how frequent an itemset is in all transactions. Briefly, support is the fraction of the total number of transactions in which the itemset occurs. Confidence defines the likelihood of the occurrence of a consequent given that we already have the antecedent. In short, confidence is the conditional probability of occurrence of a consequent given the antecedent.
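The two measures can be written out as formulas. The following is the standard textbook formulation, restated with the symbols of this section (B is the set of transactions); the function names supp and conf are notation introduced here for illustration.

```latex
\operatorname{supp}(X) = \frac{\left|\{\, t \in B : X \subseteq t \,\}\right|}{|B|},
\qquad
\operatorname{conf}(X \to Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
```

As a made-up numerical example: if X occurs in 40 of 100 transactions and X together with Y occurs in 30 of them, then supp(X ∪ Y) = 0.30 and conf(X → Y) = 0.30 / 0.40 = 0.75.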

Currently, AR is widely applied in various fields, focusing on prediction, recommendation, or other kinds of analysis [35]. The most widely accepted evaluation indexes of association rules are support and confidence. We can find many applications of AR in the recommendation domain. Rafiqul [36] proposed AR to identify the fraudulent behavior of policyholders and help insurance companies improve business strategies. Yang [37] used AR to discover potential users’ interests by using historical behavior data without domain knowledge. Zhang [38] proposed an optimization algorithm to mine frequent item sets of high-dimensional data. Some articles presented several new evaluation methods to measure AR. In [39], a market basket analysis was performed for a large hardware company operating in the retail sector, and related product categories were identified.

The AR mining algorithm can achieve the purpose of mining required information from massive data through certain rules. Software has been prepared to detect frequent dependencies in data using AR.
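How frequent dependencies can be detected is sketched below with a small, dependency-free, Apriori-style search. This is not the authors' software (the article's tools rely on libraries such as MLxtend); the function names, the item labels such as `time_taken=high`, and the thresholds used in the usage note are assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets meeting min_support."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    frequent = {}
    while level:
        size = len(level[0]) + 1
        survivors = []
        for candidate in level:
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[candidate] = s
                survivors.append(candidate)
        # Join surviving k-itemsets into (k+1)-item candidates.
        level = list({a | b for a, b in combinations(survivors, 2)
                      if len(a | b) == size})
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules X -> Y from frequent itemsets, filtered by confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), k)):
                # Every subset of a frequent itemset is itself frequent.
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent,
                                  supp, confidence))
    return rules
```

With transactions built from categorized log records, for example `frozenset({"scenario=S1", "time_taken=high"})`, calling `association_rules(frequent_itemsets(transactions, 0.5), 0.9)` returns tuples of antecedent, consequent, support, and confidence.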

4.2. Regression Trees

The RT is a machine learning method to build prediction models from specific datasets [40]. The data are split recursively into multiple blocks, and a prediction model is fitted to each of the resulting partitions. The RT algorithm is able to produce rules.

RTs are used for prediction-type problems when the response variable is continuous. RTs work to produce accurate predictions based on a set of if-else conditions. The purpose of the analysis carried out by RT is to create a set of if-else conditions that allow for an accurate prediction [41].

The classification using the RT method is divided into two steps. The first step is to obtain the corresponding knowledge and results from the data obtained. The main content is building a tree model. It can be achieved by constructing a tree and defining a constant value on each subregion corresponding to the terminal node of the tree. In this article this step has been completed. The second step is to use the generated tree model to make predictions on unknown data samples. Additionally, the RT analysis method based on AR can effectively uncover the information hidden in the log data.
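The two steps, building a tree with a constant value in each terminal node and then using it for prediction, can be sketched without any library. This is a minimal illustration of the general technique, assuming squared-error splitting and a fixed depth limit; it is not the configuration used in the article, whose tools rely on Scikit-learn, and all function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets):
    """Return (feature_index, threshold) minimizing total SSE, or None."""
    best, best_score = None, sse(targets)
    for f in range(len(rows[0])):
        # Skip the largest value so the right branch is never empty.
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, targets, max_depth=2):
    """Step 1: recursively partition the data; leaves hold a constant value."""
    split = None
    if max_depth > 0 and len(targets) > 1:
        split = best_split(rows, targets)
    if split is None:
        return {"value": mean(targets)}
    f, threshold = split
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [t for _, t in left],
                           max_depth - 1),
        "right": build_tree([r for r, _ in right], [t for _, t in right],
                            max_depth - 1),
    }


def predict(tree, row):
    """Step 2: follow the if-else conditions down to a terminal node."""
    while "value" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["value"]
```

Each internal node corresponds to one if-else condition, so the path from the root to a leaf reads directly as a rule for a chosen data point.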

5. Methodology, Experiments and Results

Our aim is to recognize the nature of the traffic generated by the client in the Internet system based on selected parameters in the system log. For this purpose, a tool with predefined behavior scenarios and a tool for collecting initial data were prepared.

5.1. Proposed Methodology

The operation of applications for various architectures is examined. The data were analyzed using several approaches: visual analysis, association rules, and regression trees. The first step was a visual analysis based on the data presented in the charts. However, it is burdensome and, in most cases, quite difficult due to the lack of automation. In some cases, it may even be sufficient, and it will be possible to draw specific conclusions based on it. It does not always make sense to use complex solutions to analyze a particularly small amount of data. In this article, we used the two other approaches. The standard path of data analysis was followed: data acquisition (Section 5.1.1), data preprocessing (Section 5.1.2), implementation of algorithms (Section 5.1.3), and data analysis (Section 5.2).

5.1.1. Data Acquisition

Data were acquired for two hardware architectures $A1$ and $A2$ (Table 2) with different numbers of containers (1; 5; 10). A total of 27 test cases $P1$–$P27$ for the $A1$ architecture (Figure 2) and 27 test cases $P28$–$P54$ for the $A2$ architecture (with three different test scenarios ($S1$; $S2$; $S3$) and three different numbers of customer requests (100,000; 400,000; 700,000) for a particular test case) were executed.
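The 54 test cases follow from the experimental design: 2 architectures × 3 container counts × 3 scenarios × 3 request counts. The sketch below enumerates them. The ordering inside each architecture (containers, then scenario, then number of requests) is an assumption, inferred from the way the results section groups P1–P9, P10–P18 and P19–P27 by container count and treats P10, P13 and P16 as cases with the same number of requests.

```python
from itertools import product

architectures = ["A1", "A2"]
containers = [1, 5, 10]
scenarios = ["S1", "S2", "S3"]
requests = [100_000, 400_000, 700_000]

# Assumed ordering: architecture, then containers, then scenario, then requests
# (the last factor varies fastest).
test_cases = {
    f"P{i}": combo
    for i, combo in enumerate(
        product(architectures, containers, scenarios, requests), start=1
    )
}
```

Under this assumed ordering, `test_cases["P13"]` is the $A1$ architecture with 5 containers, scenario $S2$ and 100,000 requests.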

A web server receives many kinds of requests based on prepared scenarios.

5.1.2. Data Preprocessing

There are two types of system tracking, i.e., general tracking and customized tracking. In general tracking, information is collected from web page history logfile. In customized tracking the information is gathered in a prepared tool. Logs are commonly used in the system management, as logs are often the only data available that record detailed system runtime activities or behaviors in production. The process of creating such records is called data logging. Logs are generated by a wide variety of programmable technologies, including networking devices, operating systems, software, and more. For instance, web servers use log files to record data about website visitors. In general, advanced analytics methods taking modeling into account can play a significant role in extracting patterns from log data. After creating an outgoing data set from an automatic client application, we pre-process our data. For AR we convert the numerical values into categorical values and for RT we convert the categorical values into numerical values.
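The two conversions can be sketched with the Python standard library alone. This is an illustration of the idea, not the authors' code: the helper names `quantile_bins` and `ordinal_encode` are hypothetical, the sample values are illustrative, and the cut points follow the default method of `statistics.quantiles`, which may differ slightly from the binning of a dedicated library.

```python
from statistics import quantiles


def quantile_bins(values, n_bins):
    """Numerical -> categorical (for AR): map each number to a bin index
    0..n_bins-1 using quantile cut points."""
    cuts = quantiles(values, n=n_bins)  # returns n_bins - 1 cut points
    return [sum(v > c for c in cuts) for v in values]


def ordinal_encode(labels):
    """Categorical -> numerical (for RT): map each distinct label to an
    integer code, assigned in sorted label order."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels], codes


# Illustrative values only.
memory_usage = [13.1, 15.5, 16.7, 17.7, 19.0, 20.2]  # [%]
scenario = ["S1", "S2", "S3", "S1", "S2", "S3"]

memory_bin = quantile_bins(memory_usage, n_bins=3)
scenario_code, mapping = ordinal_encode(scenario)
```

Quantile binning puts roughly the same number of records into every bin, which keeps the support of each discretized item comparable.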

5.1.3. Implementation of Algorithms

After collecting the data set we prepared Python-based tools for data analysis. These tools were used to draw conclusions about the data. The libraries used to develop the tool are as follows: NumPy [42], Pandas [43], Scikit-learn [44], Matplotlib [45], Seaborn [46] and MLxtend [47].

5.2. Results

Web usage mining is the task of applying data mining techniques to discover usage patterns from web system data. AR were detected for each of the architectures separately, because they require special data processing (other operations were problematic due to memory requirements owing to the limitations of the method). RT was based on all records. The data range is the same in both cases. For a more complete understanding of the proposed system, some parameters of the examined system are used as a benchmark to evaluate the work.

5.2.1. AR Analysis

AR are one of the important concepts of machine learning and allow us to learn about the structure of the data. They help to analyze a set of variables in order to find recurring relationships in it. In this article, the rules capture relationships between sets of elements of every distinct request. This information can be used in the future as a basis for selecting system parameters. AR were generated for each of the 54 cases (27 cases for $A1$ and 27 cases for $A2$). The steps for generating the results are given in Algorithm 1.

Algorithm 1 Pseudocode of the proposed AR

input: data table, support, number of bins for discretization

1: for every PARAMETER do

2: if variance of PARAMETER is equal to zero then

3: remove PARAMETER

4: else continue

5: end if

6: end for

7: for every REQUEST do

8: if REQUEST contains NaN then

9: remove REQUEST

10: else

11: discretization (KBinsDiscretizer from the Scikit-learn library to discretize continuous features) of numerical parameters with quantile strategy

12: end if

13: end for

14: for every PARAMETER do

15: one-hot encoding of PARAMETER

16: end for

17: discretization of continuous values in data with quantile strategy

18: one-hot encoding data

19: generate association rules with Apriori method (mlxtend library)

output: association rules
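To make the output of step 19 concrete, the sketch below computes support and confidence for rules with a single antecedent and a single consequent. It is a brute-force count over item pairs in plain Python, not the Apriori implementation from the mlxtend library that the algorithm names; the function `pairwise_rules`, the thresholds and the four example requests are hypothetical.

```python
from collections import Counter
from itertools import permutations


def pairwise_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples for
    rules between single items."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    # Ordered pairs, so that both A => B and B => A are evaluated.
    pair_counts = Counter(
        pair for t in transactions for pair in permutations(sorted(t), 2)
    )
    rules = []
    for (antecedent, consequent), count in pair_counts.items():
        support = count / n  # share of requests containing both items
        confidence = count / item_counts[antecedent]  # P(consequent | antecedent)
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Hypothetical discretized requests: each is a set of "parameter=bin" items.
requests = [
    {"memory_usage=high", "num_sql_queries=9-18"},
    {"memory_usage=high", "num_sql_queries=9-18"},
    {"memory_usage=low", "num_sql_queries=9-18"},
    {"memory_usage=low", "num_sql_queries=0-8"},
]
rules = pairwise_rules(requests, min_support=0.5, min_confidence=0.9)
```

Only the rule from `memory_usage=high` to `num_sql_queries=9-18` passes both thresholds here; the reverse rule has the same support but a confidence of only 2/3.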

Parameters as antecedents: memory_usage, cpu_time_spent_user, cpu_time_spent_system, cpu_time_spent_idle. Parameters as consequents: num_sql_queries, cpu_time_spent_user, cpu_time_spent_system, cpu_time_spent_idle. The steps after obtaining results from the AR algorithm were as follows:

Finding antecedents with the same consequents for all architectures ( $A 1_C 1$ , $A 1_C 5$ , $A 1_C 10$ and $A 2_C 1$ , $A 2_C 5$ , $A 2_C 10$ ).
Searching pairs with the biggest support and confidence.
Joining duplicates.
Tabular summary division according to scenario (Table 3) or number of requests (Table 4).

Possible rules for all data (we chose the first and second one):

memory_usage ⟹ num_sql_queries
cpu_usage_current ⟹ num_sql_queries
cpu_time_spent_user ⟹ cpu_time_spent_idle
cpu_time_spent_user ⟹ cpu_time_spent_system

AR can tell us a lot about the data: their task is to find dependencies, i.e., relationships between the features. The first rule describes the situation where there is a similar number of SQL queries. Repeatability was noticed for a specific number of containers and a specific architecture. The first summary of the AR analysis results includes memory_usage (antecedent) and num_sql_queries (consequent), for the same value of num_sql_queries (9–18). As Table 3 shows for, e.g., $A2_C5$ (5 containers) and $A2_C10$ (10 containers), memory consumption (the memory_usage antecedent) is similar regardless of the scenario and the number of requests (and slightly higher for architecture $A1$). With more containers, memory consumption increases successively: from 15–20 (P1–P9) through 23–24 (P10–P18) to 25–27 (P19–P27) for $A1$, and from 13–16 (P28–P36) through 16–18 (P37–P45) to 17–20 (P46–P54) for $A2$. For the same number of requests (e.g., $P10$, $P13$, $P16$), the memory consumption (Table 4) is even closer (1–2%) across scenarios. AR proved to be very useful in this case: the scenario and the number of requests have been shown to have no significant impact on memory usage.

The second summary of the AR analysis results includes cpu_usage_current (antecedent) and num_sql_queries (consequent), for the same value of num_sql_queries (9–18). Processor load (the cpu_usage_current antecedent) is similar regardless of the scenario, but in this case it depends on the architecture used. For all architectures with more containers, processor load successively decreases (Table 5). For the $A2$ architecture the processor load is smaller than for the $A1$ architecture. Furthermore, the scenario has been shown to have no significant impact on processor load. With this method, it will be difficult to recognize the scenario.

5.2.2. RT Analysis

Once an RT is built, it can be used to predict the request class of the customer or the behavior scenario. The steps for generating the results are given in Algorithm 2. The steps after obtaining the results from the RT algorithm were as follows:

Return to the categorical values.
Ordering according to the number of processors, the number of containers, the scenario and the number of requests.
Division into architecture $A 1_C 1$ , $A 1_C 5$ , $A 1_C 10$ and $A 2_C 1$ , $A 2_C 5$ , $A 2_C 10$ .
Preparation of charts (Figure 3a–d).

Algorithm 2 Pseudocode of the proposed RT

input: data table

1: for every REQUEST do

2: if REQUEST contains NaN then

3: remove REQUEST

4: else

5: adding the parameters (number_of_cpu, RAM, number_of_containers, number_of_queries_global, scenario) of experiments to dataset

6: end if

7: end for

8: ordinal encoding (the sklearn.preprocessing package) of categorical parameters (scenario, container_id)

9: generating tree with parameter cpu_time_spent_user as target (Scikit-learn decision tree classifier)

output: tree structure
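Steps 8 and 9 might look roughly like the following sketch. It is not the authors' code: the rows and target values are made up, only three of the added parameters are used, and, because cpu_time_spent_user is continuous, the sketch assumes Scikit-learn's `DecisionTreeRegressor` rather than the classifier named in the algorithm.

```python
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeRegressor

# Hypothetical rows: (number_of_cpu, number_of_containers, scenario).
rows = [
    (8, 1, "S1"), (8, 5, "S1"), (12, 1, "S1"),
    (8, 1, "S2"), (8, 5, "S2"), (12, 1, "S2"),
]
# Made-up target values, not measurements from the study.
cpu_time_spent_user = [10.0, 11.0, 10.5, 30.0, 31.0, 30.5]

# Step 8: ordinal encoding of the categorical scenario column (S1 -> 0, S2 -> 1).
encoder = OrdinalEncoder()
scenario_codes = encoder.fit_transform([[scenario] for _, _, scenario in rows])

# Numeric feature matrix: cpu count, container count, encoded scenario.
X = [[cpu, cont, code[0]] for (cpu, cont, _), code in zip(rows, scenario_codes)]

# Step 9: fit a tree with cpu_time_spent_user as the target.
tree = DecisionTreeRegressor(max_depth=1, random_state=0)
tree.fit(X, cpu_time_spent_user)

# Predict for an unseen configuration under each scenario.
pred_s1, pred_s2 = tree.predict([[8, 10, 0.0], [8, 10, 1.0]])
```

With this toy data the single split falls on the encoded scenario, so the two predictions differ only because the scenario differs.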

The number of containers for $A2$ does not affect the value of cpu_time_spent_user (Figure 3a). More processors ensure consistent processing regardless of the number of containers. An identical trend can be observed in the case of the implementation of scenarios on different platforms (Figure 3b). The medians for the $S1$ and $S3$ scenarios are similar (Figure 3b), and for the $S2$ scenario it is always much higher. This makes it possible to distinguish between test cases (scenarios). For the $S2$ scenario, the median for different numbers of containers (Figure 3c) remains almost constant, unlike the other two cases where there are much larger discrepancies. The same is true for the chart of different numbers of requests (Figure 3d), although perhaps not quite as clearly as it is for the previous chart (Figure 3c).
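The comparison of medians per scenario can be expressed in a few lines. The numbers below are made up to mimic the qualitative pattern described above (one scenario clearly apart from the other two); they are not values from the experiments, and the helper `gap_to_nearest` is hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (scenario, cpu_time_spent_user) samples; not measured values.
samples = [
    ("S1", 10.0), ("S1", 12.0), ("S1", 11.0),
    ("S2", 30.0), ("S2", 31.0), ("S2", 29.0),
    ("S3", 11.0), ("S3", 13.0), ("S3", 12.0),
]

by_scenario = defaultdict(list)
for scenario, value in samples:
    by_scenario[scenario].append(value)

medians = {scenario: median(values) for scenario, values in by_scenario.items()}


def gap_to_nearest(scenario):
    """Distance from this scenario's median to the closest other median."""
    return min(
        abs(medians[scenario] - m)
        for other, m in medians.items()
        if other != scenario
    )


# The scenario whose median stands furthest from all others is the easiest to recognize.
most_distinct = max(medians, key=gap_to_nearest)
```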

As the presented analysis shows, the results for the $S2$ scenario differ significantly from the other test cases. As the description (Table 1) shows, this scenario contains a decidedly different query sequence. The analysis of the output data confirms this, which allows us to conclude that it can be useful for recognizing the behavior of the client (scenario) of the system. The 2D analysis presented is complemented by the results of the 3D analysis (Figure 4a,b), which confirms the previous conclusions. It has been shown that the scenario has a significant impact on CPU usage. The results of cpu_time_spent_user in scenario $S2$ (Figure 4a) for different platforms ($A1$ and $A2$) and different numbers of client requests (100,000; 400,000; 700,000) are stable for $A1$ and $A2$, respectively. The same situation was observed in the results of cpu_time_spent_user in scenario $S2$ (Figure 4b) for different numbers of containers ($C1$, $C5$ and $C10$) and different numbers of client requests (100,000; 400,000; 700,000). The median of the cpu_time_spent_user output parameter is within the range of outcomes for the remaining test cases for $S2$. The results for the remaining scenarios are more varied, which may be due to the structure of the test scenario itself.

Our experiments show that these algorithms allow us to better understand the system output logs.

6. Conclusions

The knowledge obtained from web system logs can be applied directly to manage activities related to system engineering efficiently. In our study, we focused mainly on detecting relationships between the parameters. The proposed tools verify the system's behavior. After a comprehensive evaluation, we observe that the proposed tool's predictions gave very good results for the web system domain. Mining a logfile requires an effective algorithm. The proposed methods help to find different information related to the log file, with analysis aimed at understanding the results obtained by the algorithms. According to Figures 3 and 4, it can be concluded that the RT method is useful for data mining of output web system log files. In practice, the use of AR did not allow the scenario to be derived from the test data. RT proved to be the more effective tool. Thanks to this approach, it was possible to read the context of the data: the test scenario can be recognized both for different platforms and for different numbers of queries. In the future, this should make it possible to select the system architecture properly for different request classes.

In this work, we investigated the behavior of a few hardware and software structures in the financial domain. The test was carried out on a web stock exchange system. No data set exists for this particular web system, so one had to be prepared in which the client behavior was exemplary. We applied the approaches using the automatic client response logs for different workload scenarios. The tests were repeated multiple times to exclude random results, and the average performance on the test set was recorded for comparison. We collected millions of measurement samples. Samples are depicted using the response logs. Each sample is a member of one of 54 different test case families.

We then proposed two effective analysis tools based on the described techniques. With the prepared automatic tools, we propose a novel analysis framework that improves on traditional manual approaches. We use techniques from these areas to deal with the problem of analyzing specific data in the logfile. Our new approach compares favorably with other methods based on visual observation. Additionally, our tool can handle all the data in one iteration, and the results are generally easier to interpret.

At the same time, we find that all of these approaches have advantages and disadvantages. Our evaluation shows that the proposed method, along with the logs obtained, can provide more effective recommendations on the architecture used compared to the architectures without any classification. The most important achievement of the study is the introduction of a new system for recommendations and behaviors in web usage mining. However, more experiments are needed to arrive at more tangible conclusions. A comparison has not been made with other collaborative filtering systems. Data preparation was a very laborious task, especially due to the complexity of the structure of the web system and the difficulty in understanding the data.

Previously, we focused on performance management; now we have proposed tools to analyze web system logs for useful customer-related information that can help design websites according to user behavior. Furthermore, the analysis results show that increasing resources does not always imply linear productivity gains. The proposed methodology could help provide guidelines for the construction of container-based web production systems. The information extracted by this method can be used to implement machine learning techniques.

7. Future Work

There are some areas for future work that are evident. In future work, specialized feature extraction methods can be studied for anomaly detection with machine learning models.

Author Contributions

Conceptualization, T.R.; methodology, T.R. and R.Ż.; software, R.Ż.; investigation, T.R.; data curation, T.R.; writing—original draft preparation, T.R. and R.Ż.; writing—review and editing, T.R. and R.Ż. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

We would like to thank the infrastructure technical support executive and the students from our University for the data collection phase of this project.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bernardi, S.; Gómez, A.; Merseguer, J.; Perez-Palacin, D.; Requeno, J. DICE simulation: A tool for software performance assessment at the design stage. Autom. Softw. Eng. 2022, 29, 36.
2. Rak, T. Cluster-Based Web System Models for Different Classes of Clients in QPN. In International Conference on Computer Networks; Gaj, P., Sawicki, M., Kwiecien, A., Eds.; Springer: Cham, Switzerland, 2019; pp. 347–365.
3. Rak, T. Modeling Web Client and System Behavior. Information 2020, 11, 337.
4. Prasad, M.; Manjula, B.; Mohd, A. Comparison of Data Mining and Web Mining. IFRSA Int. J. Data Warehous. Min. 2020, 2, 34–39.
5. Mughal, M.J. Data Mining: Web Data Mining Techniques, Tools and Algorithms: An Overview. Int. J. Adv. Comput. Sci. Appl. 2018, 9.
6. Zhao, Z.; Jian, Z.; Gaba, G.S.; Alroobaea, R.; Masud, M.; Rubaiee, S. An improved association rule mining algorithm for large data. J. Intell. Syst. 2021, 30, 750–762.
7. Mandan, N.; Agrawal, K.; Kumar, S. Analyzing Different Domains using Data Mining Techniques. In Proceedings of the 2020 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 22–24 January 2022; pp. 1–6.
8. Ali, R.; Liu, H.; Liu, J. Female Employment Data Analysis Based on Decision Tree Algorithm and Association Rule Analysis Method. Sci. Program. 2022, 2022, 8994349.
9. Sun, G.; Gu, C. Application of Data Mining Technology in Financial Intervention Based on Data Fusion Information Entropy. J. Sens. 2022, 2022, 2192186.
10. Zhou, M.; Chen, C. An Informatization Model of Scientific Computing for Mining Association Rules Used in Teaching Management Evaluation. J. Sens. 2022, 2022, 2943692.
11. Johns, H.; Bernhardt, J.; Churilov, L. Distance-based Classification and Regression Trees for the analysis of complex predictors in health and medical research. Stat. Methods Med Res. 2021, 30, 2085–2104.
12. Yeh, J.Y.; Chen, C.H. A machine learning approach to predict the success of crowdfunding fintech project. J. Enterp. Inf. Manag. 2022; ahead-of-print.
13. Fu, C.; Wang, X.; Zhang, L.; Qiao, L. Mining algorithm for association rules in big data based on Hadoop. AIP Conf. Proc. 2018, 1955, 040035.
14. Zhang, G.; Liu, C.; Men, T. Research on Data Mining Technology based on Association Rules Algorithm. In Proceedings of the 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 24–26 May 2019; pp. 526–530.
15. Xu, Y. Research of association rules algorithm in data mining. Int. J. Database Theory Appl. 2016, 9, 119–130.
16. Kluska, J.; Madera, M. Extremely Simple Classifier Based on Fuzzy Logic and Gene Expression Programming. Inf. Sci. 2021, 571, 560–579.
17. Madera, M.; Tomoń, R. A case study on machine learning model for code review expert system in software engineering. In Proceedings of the 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), Prague, Czech Republic, 3–6 September 2017; pp. 1357–1363.
18. Rak, T. Performance Analysis of Distributed Internet System Models using QPN Simulation. In Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, Warsaw, Poland, 7–10 September 2014; pp. 769–774.
19. Werewka, J.; Rak, T. Performance Analysis of Interactive Internet Systems for a Class of Systems with Dynamically Changing Offers. In Proceedings of the 4th IFIP TC 2 Central and East European Conference on Software Engineering Techniques (CEE-SET 2009), Krakow, Poland, 12–14 October 2009; Szmuc, T., Szpyrka, M., Zendulka, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 109–123.
20. Clifton, B. Advanced Web Metrics with Google Analytics; John Wiley & Sons: Hoboken, NJ, USA, 2012.
21. Nguyen, M.T.; Diep, T.D.; Hoang Vinh, T.; Nakajima, T.; Thoai, N. Analyzing and Visualizing Web Server Access Log File. In Future Data and Security Engineering; Dang, T.K., Küng, J., Wagner, R., Thoai, N., Takizawa, M., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 349–367.
22. Ehikioya, S.A.; Zeng, J. Mining web content usage patterns of electronic commerce transactions for enhanced customer services. Eng. Rep. 2021, 3, e12411.
23. Bucklin, R.E.; Sismeiro, C. Click Here for Internet Insight: Advances in Clickstream Data Analysis in Marketing. J. Interact. Mark. 2009, 23, 35–48.
24. Sharma, S.; Singh, S. Development of Decision Tree Algorithm for Mining Web Data Stream. Int. J. Comput. Appl. 2016, 138, 34–43.
25. Liu, X.; Zheng, L.; Zhang, W.; Zhou, J.; Cao, S.; Yu, S. An Evolutive Frequent Pattern Tree-Based Incremental Knowledge Discovery Algorithm. ACM Trans. Manag. Inf. Syst. 2022, 13, 1–20.
26. Schirmer, P.; Papenbrock, T.; Kruse, S.; Naumann, F.; Hempfing, D.; Mayer, T.; Neuschäfer-Rube, D. DynFD: Functional Dependency Discovery in Dynamic Datasets; EDBT 2019. Available online: https://openproceedings.org/2019/conf/edbt/EDBT19_paper_32.pdf (accessed on 12 June 2022).
27. Munirathinam, N.; Mushtaq, S.; Patil, P.; Bharambe, S. Using data mining techniques for detection of query patterns in SQL logs. Int. J. Pharm. Technol. 2016, 8, 25932–25937.
28. Caruccio, L.; Cirillo, S.; Deufemia, V.; Polese, G. Efficient Discovery of Functional Dependencies from Incremental Databases. In Proceedings of the 23rd International Conference on Information Integration and Web Intelligence, Linz, Austria, 29 November–1 December 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 400–409.
29. Caruccio, L.; Deufemia, V.; Naumann, F.; Polese, G. Discovering Relaxed Functional Dependencies Based on Multi-Attribute Dominance. IEEE Trans. Knowl. Data Eng. 2021, 33, 3212–3228.
30. Ayyagari, M.R. Integrating Association Rules with Decision Trees in Object-Relational Databases. arXiv 2019, arXiv:1904.09654.
31. Rak, T. Formal Techniques for Simulations of Distributed Web System Models. In Cognitive Informatics and Soft Computing; Mallick, P.K., Bhoi, A.K., Marques, G., Hugo, C., de Albuquerque, V., Eds.; Springer: Singapore, 2021; pp. 365–380.
32. Walid, B.; Kloul, L. Formal Models for Safety and Performance Analysis of a Data Center System. Reliab. Eng. Syst. Saf. 2019, 193, 106643.
33. Shahrah, A.; Al-Mashari, M. Adaptive case management: An overview. Knowl. Process Manag. 2021, 28.
34. Merceron, A.; Yacef, K. Interestingness Measures for Association Rules in Educational Data. In Proceedings of the Educational Data Mining, Montreal, QC, Canada, 20–21 June 2008; pp. 57–66.
35. Bao, F.; Mao, L.; Zhu, Y.; Xiao, C.; Xu, C. An Improved Evaluation Methodology for Mining Association Rules. Axioms 2022, 11, 17.
36. Islam, M.R.; Liu, S.; Biddle, R.; Razzak, I.; Wang, X.; Tilocca, P.; Xu, G. Discovering dynamic adverse behavior of policyholders in the life insurance industry. Technol. Forecast. Soc. Chang. 2021, 163, 120486.
37. Wei, S.; Ye, N.; Zhang, Q. Time-Aware Collaborative Filtering for Recommender Systems; Communications in Computer and Information Science; Springer Nature: Cham, Switzerland, 2012; Volume 321, pp. 663–670.
38. Zhang, Y.; Yu, W.; Ma, X.; Ogura, H.; Ye, D. Multi-Objective Optimization for High-Dimensional Maximal Frequent Itemset Mining. Appl. Sci. 2021, 11, 8971.
39. Sagin, A.; Ayvaz, B. Determination of Association Rules with Market Basket Analysis: Application in the Retail Sector. Southeast Eur. J. Soft Comput. 2018, 7.
40. Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. Classification and Regression Trees; Routledge: New York, NY, USA, 2017; pp. 1–358.
41. Krzywinski, M.; Altman, N. Classification and regression trees. Nat. Methods 2017, 14, 757–758.
42. Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020, 585, 357–362.
43. McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010; pp. 56–61.
44. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
45. Hunter, J.D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 2007, 9, 90–95.
46. Waskom, M. seaborn: Statistical data visualization. J. Open Source Softw. 2021, 6, 3021.
47. Raschka, S. MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. J. Open Source Softw. 2018, 3, 638.

Figure 1. Diagrams: (a) General diagram of the web stock exchange, (b) Flow diagram.

Figure 2. Multi-container environment for $A1$.

Figure 3. RT 2D analysis: (a) container-processor, (b) processor-scenario, (c) container-scenario, (d) request-scenario.

Figure 4. RT 3D analysis: (a) request-processor-scenario, (b) request-scenario-container.

Table 1. Requests execution scenarios.

$S 1$	$S 2$	$S 3$
Registration.	Registration.	Registration.
Logging in.	Logging in.	Logging in.
Download the list of available shares.	Download the list of available shares.	Download the list of available shares.
LOOP: buy shares from the list UNTIL: a customer has enough money.	LOOP: buy more shares on the list UNTIL: there is money or shares.	LOOP: buy more stocks from the list (1 each) UNTIL: there is money.
Download the list of owned shares.		Download the list of owned shares.
LOOP: place sales offers for the next shares on the list UNTIL: there will be no offers for each share held.		LOOP: Place offers to sell the next shares on the list UNTIL: there will be no offers for half of the shares held.
Download the list of client offers.		Download the list of client offers.
Download the history of client offers.		Cancel half of the customer bids.
		LOOP: sell the client shares UNTIL: they are not sold all.

Table 2. Multi-container environment.

Architecture	$A 1$			$A 2$
Processors	8			12
RAM [GB]	20			30
Container structure	$C 1$	$C 5$	$C 10$	$C 1$	$C 5$	$C 10$

Table 3. Summary of the memory_usage [%] results by scenarios and number of requests.

Architecture	$A 1_C 1$		$A 1_C 5$		$A 1_C 10$		$A 2_C 1$		$A 2_C 5$		$A 2_C 10$
Range of Values	Min.	Max.	Min.	Max.	Min.	Max.	Min.	Max.	Min.	Max.	Min.	Max.
$S 1_{100, 000}$	18.7	20.2	22.8	23.6	25.9	27.4	13.1	15.5	16.7	17.7	19	20.2
$S 1_{400, 000}$	14.5	17.4	23.4	25	23	24	11.4	13.2	16.3	18.6	18	19.8
$S 1_{700, 000}$	14.7	21.7	22.7	27.4	22.4	23.7	11.4	12.1	15.6	18.1	16.9	20.8
$S 2_{100, 000}$			23	24.5	26.5	27.6	12.8	16.1	17.2	17.6	18.5	19.9
$S 2_{400, 000}$	15.3	21.8	22.6	25.6	25.8	27.9	11.5	13.3	15.5	18.2	17.1	18.6
$S 2_{700, 000}$	14.5	15.1	22.6	24.4	22.4	27.9			17.1	18.3	16.5	17.8
$S 3_{100, 000}$	20.4	22.6	23.3	24.1	25.5	26.8	14.8	16.4	17.5	18.6	18.8	20.5
$S 3_{400, 000}$	15	15.7	23.4	24.9	25.2	27.9	11.7	21.1	16.1	17.7	17.4	20.8
$S 3_{700, 000}$	14.4	15.3	18.5	20.1	23.3	27.9			16.1	19	17.1	20.6

Table 4. Summary of the memory_usage [%] results by number of requests and scenarios.

Architecture	$A 1_C 1$		$A 1_C 5$		$A 1_C 10$		$A 2_C 1$		$A 2_C 5$		$A 2_C 10$
Range of Values	Min.	Max.	Min.	Max.	Min.	Max.	Min.	Max.	Min.	Max.	Min.	Max.
100,000 $_{S 1}$	18.7	20.2	22.8	23.6	25.9	27.4	13.1	15.5	16.7	17.7	19	20.2
100,000 $_{S 2}$			23	24.5	26.5	27.6	12.8	16.1	17.2	17.6	18.5	19.9
100,000 $_{S 3}$	20.4	22.6	23.3	24.1	25.5	26.8	14.8	16.4	17.5	18.6	18.8	20.5
400,000 $_{S 1}$	14.5	17.4	23.4	25	23	24	11.4	13.2	16.3	18.6	18	19.8
400,000 $_{S 2}$	15.3	21.8	22.6	25.6	25.8	27.9	11.5	13.3	15.5	18.2	17.1	18.6
400,000 $_{S 3}$	15	15.7	23.4	24.9	25.2	27.9	11.7	21.1	16.1	17.7	17.4	20.8
700,000 $_{S 1}$	14.7	21.7	22.7	27.4	22.4	23.7	11.4	12.1	15.6	18.1	16.9	20.8
700,000 $_{S 2}$	14.5	15.1	22.6	24.4	22.4	27.9			17.1	18.3	16.5	17.8
700,000 $_{S 3}$	14.4	15.3	18.5	20.1	23.3	27.9			16.1	19	17.1	20.6

Table 5. Summary of the cpu_usage_current [%] results by scenarios and number of requests.

Architecture	$A 1_C 1$		$A 1_C 5$		$A 1_C 10$		$A 2_C 1$		$A 2_C 5$		$A 2_C 10$
Range of Values	Min.	Max.	Min.	Max.	Min.	Max.	Min.	Max.	Min.	Max.	Min.	Max.
$S 1_{100, 000}$					90.4	97.4					95.3	98.1
$S 1_{400, 000}$			78.7	81.4	93.3	95.7	16.7	19.8
$S 1_{700, 000}$	20	22.2							60	62.6
$S 2_{100, 000}$					89.5	93.3					93.3	95.7
$S 2_{400, 000}$					90.7	95	16.7	20	62.9	71.6
$S 2_{700, 000}$	17.9	19.6	76.9	79.1	94.7	98
$S 3_{100, 000}$					88.5	93.5
$S 3_{400, 000}$	29.4	31.7	76.9	79.5	89	96.7	16.7	20	61.1	67.7
$S 3_{700, 000}$									61.5	64.3

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
