Using Data Mining Techniques for Detecting Dependencies in the Outcoming Data of a Web-Based System
Abstract
1. Introduction
2. Related Work
- Data Preprocessing: Real-world data and some databases are often incomplete, inconsistent, and hard to interpret. Data preprocessing integrates databases and makes raw data consistent and understandable.
- Pattern Discovery: Web usage pattern discovery techniques, such as statistical analysis, are used to find interesting patterns. Knowledge obtained from the statistical results may help to improve, e.g., performance. Association rules are one of the basic data mining techniques and are widely used in web usage mining.
- Pattern Analysis: In this step, irrelevant rules or patterns discovered in the preceding phases are filtered out, and the relevant ones are extracted.
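For reference, the two measures most commonly used to rank association rules can be written as below. The notation is chosen here for illustration and is not taken from the source: $T$ is the set of transactions (here, requests), $Z$ is any itemset, and $X \Rightarrow Y$ is a rule with antecedent $X$ and consequent $Y$.

```latex
\mathrm{supp}(Z) = \frac{|\{\, t \in T : Z \subseteq t \,\}|}{|T|},
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}
```

A rule is usually kept only when both values exceed user-chosen thresholds.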
3. Web System
3.1. Interactive Web System
3.2. Container System
3.3. Services
- The main back-end application contains Django and the application logic.
- The main front-end application contains Vue.js and is responsible for the front-end of the application.
- The main Celery worker (an asynchronous task queue/job queue) is responsible for distributing tasks.
- Celery Beat (a periodic task scheduler) is responsible for creating the schedule and triggering the tasks.
- The main database, built on the official PostgreSQL container image, is responsible for launching and maintaining the main database of the system.
- The main RabbitMQ broker, built on the official RabbitMQ container image, is responsible for queuing the main system's tasks.
- The automatic customer back-end application contains Django and the client application logic.
- The automatic customer front-end application contains Vue.js and is responsible for the front-end of the application.
- The automatic customer Celery worker (an asynchronous task queue/job queue) is responsible for distributing the customers' tasks.
- The database of the automatic customer system is built on the official PostgreSQL container image. It is responsible for launching and maintaining the system responses database.
- The client RabbitMQ broker, built on the official RabbitMQ container image, is responsible for queuing the customers' tasks.
3.4. Tests Scenarios
- Scenario 1—Buy to the limit and put up sales offers ().
- Scenario 2—Buy and sell ().
- Scenario 3—Buy more while there is money ().
- —Time in milliseconds spent on executing SQL queries.
- —Time in milliseconds needed for processing the query content.
- —The percentage of CPU usage while the query is running.
- —Time in seconds that the CPU spends performing client’s tasks.
- —Time in seconds that the CPU spends performing system tasks.
- —Time in seconds that the CPU spent waiting for tasks.
- —Memory usage as a percentage.
- —Aggregated CPU usage for over 30 s, expressed as a percentage.
- —The container ID.
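As an illustration, one measured request can be held in a small record type with one field per quantity listed above. This is only a sketch: the class name `RequestMetrics` and every field name are hypothetical, because the parameter symbols did not survive in this copy of the text, and the raw-string input format is likewise an assumption.

```python
import math
from dataclasses import dataclass, fields


@dataclass
class RequestMetrics:
    """One logged request; all field names are hypothetical placeholders."""
    sql_time_ms: float          # time spent executing SQL queries
    processing_time_ms: float   # time needed to process the query content
    cpu_percent: float          # CPU usage while the query is running
    cpu_user_s: float           # CPU time spent on the client's tasks
    cpu_system_s: float         # CPU time spent on system tasks
    cpu_idle_s: float           # CPU time spent waiting for tasks
    memory_percent: float       # memory usage
    cpu_30s_percent: float      # CPU usage aggregated over 30 s
    container_id: str           # container that served the request

    @classmethod
    def from_row(cls, row):
        """Build a record from a dict of raw strings; missing numbers become NaN."""
        kwargs = {}
        for f in fields(cls):
            raw = row.get(f.name, "")
            if f.name == "container_id":
                kwargs[f.name] = raw
            else:
                kwargs[f.name] = float(raw) if raw != "" else math.nan
        return cls(**kwargs)

    def is_complete(self):
        """True when no numeric field is NaN."""
        return not any(
            math.isnan(getattr(self, f.name))
            for f in fields(self)
            if f.name != "container_id"
        )
```

Marking missing values as NaN at load time makes it straightforward to drop incomplete requests later, as both algorithms in Section 5.2 do.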
4. The Scope of Research Works
- Data analysis from various test scenarios.
- Analysis of the impact of a different number of requests.
- Analysis of the application’s operation for various hardware configurations.
4.1. Association Rules
4.2. Regression Trees
5. Methodology, Experiments and Results
5.1. Proposed Methodology
5.1.1. Data Acquisition
5.1.2. Data Preprocessing
5.1.3. Implementation of Algorithms
5.2. Results
5.2.1. AR Analysis
Algorithm 1 Pseudocode of the proposed AR
Input: data table, support, number of bins for discretization
for every PARAMETER do
    if variance of PARAMETER is equal to zero then
        remove PARAMETER
    else
        continue
    end if
end for
for every REQUEST do
    if REQUEST contains NaN then
        remove REQUEST
    else
        discretization ( from library to discretize continuous features) of numerical parameters with quantile strategy
    end if
end for
for every PARAMETER do
    one-hot encoding of PARAMETER
end for
discretization of continuous values in data with quantile strategy
one-hot encoding data
generate association rules with Apriori method ( library)
Output: association rules
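The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the library names are missing from the listing above, so the sketch swaps in the standard library only (`statistics.quantiles` for quantile binning and a hand-written level-wise Apriori). The function names, the `parameter=binK` item format, and the confidence threshold are assumptions introduced here.

```python
import math
from bisect import bisect_right
from itertools import combinations
from statistics import pvariance, quantiles


def preprocess(rows, n_bins):
    """Turn request dicts (parameter -> float) into sets of 'parameter=binK' items."""
    # Drop parameters whose variance (over non-NaN values) is zero.
    kept = []
    for p in rows[0]:
        values = [r[p] for r in rows if not math.isnan(r[p])]
        if len(values) > 1 and pvariance(values) > 0:
            kept.append(p)
    # Drop requests that contain NaN in any remaining parameter.
    rows = [r for r in rows if not any(math.isnan(r[p]) for p in kept)]
    # Quantile discretization: quantiles() returns n_bins - 1 cut points.
    edges = {p: quantiles([r[p] for r in rows], n=n_bins) for p in kept}
    # Each 'parameter=binK' item plays the role of one one-hot column.
    return [
        frozenset(f"{p}=bin{bisect_right(edges[p], r[p])}" for p in kept)
        for r in rows
    ]


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    n = len(transactions)
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]
    frequent, k = {}, 1
    while current:
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates with an infrequent subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), size)):
                confidence = support / frequent[ante]
                if confidence >= min_confidence:
                    found.append((ante, itemset - ante, support, confidence))
    return found
```

Calling `rules(apriori(preprocess(rows, n_bins), min_support), min_confidence)` on a list of request dicts returns the rules as tuples. The number of bins has a strong effect on which rules appear, since coarser bins raise support.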
- Finding antecedents with the same consequents for all architectures (, , and , , ).
- Searching pairs with the biggest support and confidence.
- Joining duplicates.
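One possible reading of these three post-processing steps is sketched below. It is an interpretation, not the authors' procedure: the function name `shared_rules`, the tuple layout, the architecture labels used as dictionary keys, and the choice to rank a rule by its lowest (support, confidence) pair across architectures are all assumptions made for illustration.

```python
from collections import defaultdict


def shared_rules(rules_by_arch):
    """Keep rules found in every architecture, ranked by worst-case support/confidence.

    rules_by_arch maps an architecture label to a list of
    (antecedent, consequent, support, confidence) tuples.
    """
    per_rule = defaultdict(dict)  # (antecedent, consequent) -> {arch: (support, confidence)}
    for arch, arch_rules in rules_by_arch.items():
        for antecedent, consequent, support, confidence in arch_rules:
            key = (frozenset(antecedent), frozenset(consequent))
            score = (support, confidence)
            # Joining duplicates: keep one entry per rule and architecture (the best one).
            if arch not in per_rule[key] or score > per_rule[key][arch]:
                per_rule[key][arch] = score
    # Same antecedent and consequent must appear in all architectures.
    shared = [
        (ante, cons, *min(scores.values()))
        for (ante, cons), scores in per_rule.items()
        if len(scores) == len(rules_by_arch)
    ]
    # Largest support first, confidence as the tie-breaker.
    return sorted(shared, key=lambda r: (r[2], r[3]), reverse=True)
```

Ranking by the weakest architecture is a conservative choice; averaging the values across architectures would be an equally defensible alternative.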
5.2.2. RT Analysis
- Return to the categorical values.
- Ordering according to the number of processors, the number of containers, the scenario and the number of requests.
- Division into architecture , , and , , .
- Preparation of charts (Figure 3a–d).
Algorithm 2 Pseudocode of the proposed RT
Input: data table
for every REQUEST do
    if REQUEST contains NaN then
        remove REQUEST
    else
        add the parameters (, , , , ) of experiments to dataset
    end if
end for
ordinal encoding (the package) of categorical parameters (, )
generate tree with parameter as target ( decision tree classifier)
Output: tree structure
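The last two steps of Algorithm 2 can be illustrated with the sketch below. It does not reproduce the authors' code: the package and classifier names are missing from the listing, so the sketch uses a small hand-written CART-style classifier (Gini impurity, binary threshold splits) and a plain dictionary for ordinal encoding. All function names, the `max_depth` parameter, and the nested-dict tree layout are assumptions.

```python
from collections import Counter


def ordinal_encode(values):
    """Map each distinct category to an integer code, in sorted order."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a binary tree; leaves are class labels, inner nodes are dicts."""
    if len(set(y)) == 1 or depth == max_depth:
        return Counter(y).most_common(1)[0][0]
    best = None
    for feature in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so that both sides of the split are non-empty.
        for threshold in sorted({row[feature] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[feature] <= threshold]
            right = [i for i, row in enumerate(X) if row[feature] > threshold]
            score = (
                len(left) * gini([y[i] for i in left])
                + len(right) * gini([y[i] for i in right])
            ) / len(y)
            if best is None or score < best[0]:
                best = (score, feature, threshold, left, right)
    if best is None:  # no feature has more than one distinct value
        return Counter(y).most_common(1)[0][0]
    _, feature, threshold, left, right = best
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the splits until a leaf label is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree
```

The returned nested dictionary is the tree structure: reading it from the root shows which experiment parameters separate the target classes first.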
6. Conclusions
7. Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Registration. Logging in. Download the list of available shares. LOOP: buy shares from the list UNTIL: a customer has enough money. Download the list of owned shares. LOOP: place sales offers for the next shares on the list UNTIL: there will be no offers for each share held. Download the list of client offers. Download the history of client offers. | Registration. Logging in. Download the list of available shares. LOOP: buy more shares on the list UNTIL: there is money or shares. | Registration. Logging in. Download the list of available shares. LOOP: buy more stocks from the list (1 each) UNTIL: there is money. Download the list of owned shares. LOOP: place offers to sell the next shares on the list UNTIL: there will be no offers for half of the shares held. Download the list of client offers. Cancel half of the customer bids. LOOP: sell the client shares UNTIL: they are not sold all. |
Architecture | ||||||
---|---|---|---|---|---|---|
Processors | 8 | 12 | ||||
RAM [GB] | 20 | 30 | ||||
Container structure |
Architecture | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Range of Values | Min. | Max. | Min. | Max. | Min. | Max. | Min. | Max. | Min. | Max. | Min. | Max. |
18.7 | 20.2 | 22.8 | 23.6 | 25.9 | 27.4 | 13.1 | 15.5 | 16.7 | 17.7 | 19 | 20.2 | |
14.5 | 17.4 | 23.4 | 25 | 23 | 24 | 11.4 | 13.2 | 16.3 | 18.6 | 18 | 19.8 | |
14.7 | 21.7 | 22.7 | 27.4 | 22.4 | 23.7 | 11.4 | 12.1 | 15.6 | 18.1 | 16.9 | 20.8 | |
23 | 24.5 | 26.5 | 27.6 | 12.8 | 16.1 | 17.2 | 17.6 | 18.5 | 19.9 | |||
15.3 | 21.8 | 22.6 | 25.6 | 25.8 | 27.9 | 11.5 | 13.3 | 15.5 | 18.2 | 17.1 | 18.6 | |
14.5 | 15.1 | 22.6 | 24.4 | 22.4 | 27.9 | 17.1 | 18.3 | 16.5 | 17.8 | |||
20.4 | 22.6 | 23.3 | 24.1 | 25.5 | 26.8 | 14.8 | 16.4 | 17.5 | 18.6 | 18.8 | 20.5 | |
15 | 15.7 | 23.4 | 24.9 | 25.2 | 27.9 | 11.7 | 21.1 | 16.1 | 17.7 | 17.4 | 20.8 | |
14.4 | 15.3 | 18.5 | 20.1 | 23.3 | 27.9 | 16.1 | 19 | 17.1 | 20.6 |
Architecture | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Range of Values | Min. | Max. | Min. | Max. | Min. | Max. | Min. | Max. | Min. | Max. | Min. | Max. |
100,000 | 18.7 | 20.2 | 22.8 | 23.6 | 25.9 | 27.4 | 13.1 | 15.5 | 16.7 | 17.7 | 19 | 20.2 |
100,000 | 23 | 24.5 | 26.5 | 27.6 | 12.8 | 16.1 | 17.2 | 17.6 | 18.5 | 19.9 | ||
100,000 | 20.4 | 22.6 | 23.3 | 24.1 | 25.5 | 26.8 | 14.8 | 16.4 | 17.5 | 18.6 | 18.8 | 20.5 |
400,000 | 14.5 | 17.4 | 23.4 | 25 | 23 | 24 | 11.4 | 13.2 | 16.3 | 18.6 | 18 | 19.8 |
400,000 | 15.3 | 21.8 | 22.6 | 25.6 | 25.8 | 27.9 | 11.5 | 13.3 | 15.5 | 18.2 | 17.1 | 18.6 |
400,000 | 15 | 15.7 | 23.4 | 24.9 | 25.2 | 27.9 | 11.7 | 21.1 | 16.1 | 17.7 | 17.4 | 20.8 |
700,000 | 14.7 | 21.7 | 22.7 | 27.4 | 22.4 | 23.7 | 11.4 | 12.1 | 15.6 | 18.1 | 16.9 | 20.8 |
700,000 | 14.5 | 15.1 | 22.6 | 24.4 | 22.4 | 27.9 | 17.1 | 18.3 | 16.5 | 17.8 | ||
700,000 | 14.4 | 15.3 | 18.5 | 20.1 | 23.3 | 27.9 | 16.1 | 19 | 17.1 | 20.6 |
Architecture | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Range of Values | Min. | Max. | Min. | Max. | Min. | Max. | Min. | Max. | Min. | Max. | Min. | Max. |
90.4 | 97.4 | 95.3 | 98.1 | |||||||||
78.7 | 81.4 | 93.3 | 95.7 | 16.7 | 19.8 | |||||||
20 | 22.2 | 60 | 62.6 | |||||||||
89.5 | 93.3 | 93.3 | 95.7 | |||||||||
90.7 | 95 | 16.7 | 20 | 62.9 | 71.6 | |||||||
17.9 | 19.6 | 76.9 | 79.1 | 94.7 | 98 | |||||||
88.5 | 93.5 | |||||||||||
29.4 | 31.7 | 76.9 | 79.5 | 89 | 96.7 | 16.7 | 20 | 61.1 | 67.7 | |||
61.5 | 64.3 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Rak, T.; Żyła, R. Using Data Mining Techniques for Detecting Dependencies in the Outcoming Data of a Web-Based System. Appl. Sci. 2022, 12, 6115. https://doi.org/10.3390/app12126115