Abstract
A distributed firewall is a security application that monitors and controls traffic on an organization’s network. While centralized firewalls defend against attacks originating outside a network, distributed firewalls target attacks originating inside it, for example through wireless access and VPN tunnels. Distributed firewalls use policies, expressed as rules, to identify anomalous packets. However, such static rules may be incomplete, in which case anomalies can instead be detected by monitoring the firewall logs. These logs grow large when network traffic is high, but they contain valuable hidden information about existing anomalies. In this paper, we detect anomalies by extracting patterns from the big-data logs of distributed firewalls using data mining and machine learning. The proposed method is applied to large logs from distributed firewalls in a real security environment, and the results are analyzed.
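As a rough illustration of log-based anomaly detection, the sketch below flags sources whose behavior in a firewall log deviates strongly from the rest. It is not the method proposed in this paper, which the abstract does not specify in detail. The record layout (source IP, destination port, action), the sample data, the feature (number of distinct destination ports per source), and the median-based threshold are all assumptions chosen for brevity.

```python
from collections import defaultdict
from statistics import median

# Hypothetical log records: (source IP, destination port, action).
# The last source probes 20 different ports, imitating a port scan.
LOG = [
    ("10.0.0.5", 443, "allow"),
    ("10.0.0.5", 80, "allow"),
    ("10.0.0.6", 443, "allow"),
    ("10.0.0.7", 22, "allow"),
    ("10.0.0.7", 443, "allow"),
] + [("10.0.0.9", port, "deny") for port in range(20, 40)]


def distinct_ports_per_source(records):
    """Return {source IP: number of distinct destination ports contacted}."""
    ports = defaultdict(set)
    for src, dst_port, _action in records:
        ports[src].add(dst_port)
    return {src: len(seen) for src, seen in ports.items()}


def flag_outliers(features, k=3.0):
    """Flag sources whose feature value exceeds median + k * MAD.

    MAD is the median absolute deviation. It is floored at 1.0 (an
    assumption) so that a zero MAD does not flag every small deviation.
    """
    values = list(features.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    threshold = med + k * max(mad, 1.0)
    return sorted(src for src, v in features.items() if v > threshold)


if __name__ == "__main__":
    print(flag_outliers(distinct_ports_per_source(LOG)))
```

A static rule set would need an explicit rule to catch this scanning source, whereas the log-derived feature exposes it without one. On logs of the size discussed here, the same idea would need a distributed or streaming implementation.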















Availability of data and materials
The log dataset used in the case study of this research is about one terabyte in size. A sample of the log (500 records) is available at: https://github.com/babamir/Firewall-data.
Acknowledgements
We thank the University of Kashan for supporting this research under grant no. 234340.
Funding
This research was fully supported and funded by the University of Kashan.
Author information
Contributions
AA obtained the real case study, produced the results by applying the proposed method to it, and provided the related study. B proposed the method and supervised the research.
Ethics declarations
Ethical Approval
The authors declare that this research raised no ethical issues.
Conflict of interest
The authors declare that they have no competing interests. Andalib is a PhD candidate whose research focuses on anomalies in firewalls and on firewall logs as big data. Babamir is a faculty member at the University of Kashan with interests in distributed systems and cloud computing. This research received support from the University of Kashan, Kashan, Iran, and the ICT Research Institute, Tehran, Iran.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Andalib, A., Babamir, S.M. Anomaly detection of policies in distributed firewalls using data log analysis. J Supercomput 79, 19473–19514 (2023). https://doi.org/10.1007/s11227-023-05417-7
DOI: https://doi.org/10.1007/s11227-023-05417-7