Abstract
Research in network security has traditionally focused on intrusion detection, using machine learning techniques to identify malicious agents, and on methods for protecting ourselves from such attacks. In this paper, we apply the same techniques to profile the attacker in a DDoS attack on a distributed honeypot.
Keywords
- Distributed denial of service attacks
- Honey pot
- Machine learning
- Clustering algorithms
- Attacker profiling
1 Introduction
The username-password combination is one of the primary authentication methods used by most organizations' portals. Attackers obtain such combinations through many methods, including man-in-the-middle attacks [3], DNS spoofing [6], and phishing [16]. All of these are examples of penetration attacks, as they allow an attacker to intercept the connection and make users believe that they are on the legitimate website [1]. In each of these approaches, the user is tricked into giving away their access credentials. Here, we analyze a different type of attack, known as a brute-force attack. In a brute-force attack, the attacker tries to guess the username and password using tools that draw on dictionaries of username-password combinations. This increases the load on the server, which in turn can block legitimate users from logging in, so it is also an example of a denial of service attack. When such an attack is distributed across many sources, it becomes a distributed denial of service (DDoS) attack [5, 7, 13].
In this paper, we use the Kippo honeypot [4, 10], which logs brute-force attacks and helps us understand the attacker's behavior patterns. The attacker attempts to gain access through a Secure Shell (SSH) session. The data we analyze was collected from a honeypot deployed within the Information Security Lab of BITS Pilani, Hyderabad Campus [14, 17].
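Turning Kippo's text log into structured records might look like the minimal sketch below. It assumes log lines of the form `... [HoneyPotTransport,<session>,<ip>] login attempt [<user>/<password>] succeeded` (or `failed`). That is our understanding of Kippo's default text log, but the format can differ between versions. The names `LOGIN_RE`, `LoginAttempt` and `parse_kippo_log` are hypothetical and not part of Kippo.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator

# Assumed Kippo log layout (may differ between versions), e.g.:
# 2019-06-12 10:15:02+0000 [HoneyPotTransport,12,203.0.113.5] login attempt [root/admin] succeeded
LOGIN_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\S*\s+"   # timestamp, UTC offset skipped
    r"\[[^\]]*?,(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]\s+"         # last field in brackets is the IP
    r"login attempt \[(?P<user>[^/\]]*)/(?P<password>.*)\]\s+"
    r"(?P<result>succeeded|failed)\s*$"
)


@dataclass(frozen=True)
class LoginAttempt:
    timestamp: datetime
    ip: str
    username: str
    password: str
    success: bool


def parse_kippo_log(lines: Iterable[str]) -> Iterator[LoginAttempt]:
    """Yield one LoginAttempt per login line; all other log lines are skipped."""
    for line in lines:
        m = LOGIN_RE.match(line.strip())
        if m is None:
            continue
        yield LoginAttempt(
            timestamp=datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
            ip=m["ip"],
            username=m["user"],
            password=m["password"],
            success=m["result"] == "succeeded",
        )
```

Because the IP is taken as the last bracketed field, the pattern also tolerates longer prefixes such as `[SSHService ssh-userauth on HoneyPotTransport,5,<ip>]`.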
SSH sessions are a common target because a significant number of servers are poorly maintained and use weak credentials, which makes them ideal targets for malicious agents [12]. A preliminary analysis of the usernames and passwords attempted against SSH remote login servers, using data from securehoney.net, gave the results shown in Table 1.
The primary aim of our research is to determine how knowledge of login credentials propagates [15] once an attacker has successfully gained access to an SSH server. Figure 1 shows that successful attacks on the honeypot tend to be clustered around certain locations [9].
We also show a zoomed-in view of China, where the majority of the attacks originated (Fig. 2). The attacks appear in geographic pockets, which lends preliminary support to the hypothesis that knowledge of the credentials spreads in the vicinity of the original successful attempt. In the remainder of the paper, we apply a variety of clustering methods to detect patterns that may escape the human eye.
2 Related Work
In his work on applying clustering techniques to the detection of DDoS attacks, Babak Nabiyev used the KDD Cup 99 dataset, which was developed by DARPA. He attempted to differentiate between normal traffic and DDoS traffic using k-means and EM clustering. He grouped six types of DoS attacks into a single class and defined normal traffic flow as the other class. He then used these two classes in the final clustering analysis [8].
Shi Zhong also applied different clustering techniques to intrusion detection, likewise using data from the DARPA intrusion detection project. He also carried out a comparative study of clustering algorithms for intrusion detection and concluded that unsupervised clustering algorithms performed better than supervised learning methods. Among all the clustering algorithms, his proposed self-labeling heuristic performed best, with an overall accuracy of 93.6% [19].
Kseniia Nikolskaia analyzed IP traffic by clustering on IP packet headers. The study considered multiple parameters, such as classification parameters based on packet and transmission properties, the choice of clustering method, and the number of clusters. It concluded that real-time data is too complex for features or clustering algorithms to be changed dynamically. A hybrid neural network approach showed the best results, with about 95% correctness [11].
Jie Wang et al. argue that clustering algorithms may not work well for intrusion detection because the similarity level of data points cannot be controlled. They propose a two-seed expanding algorithm that splits the attacks into different phases. Preprocessing involves constructing network flows and converting continuous-valued features into binary features. Based on these features, the algorithm selects seeds until all flows are divided into clusters. Their experiments show that the two-seed expanding algorithm performs better than k-means and other clustering methods [18].
Geoff Boeing used k-means and DBSCAN clustering to reduce 1,759 latitude-longitude points to 138 points, a 92% compression. This was achieved without losing the key features of the spatial information in the dataset [2].
3 Research Framework
Experimental Setup. We deployed honeypots in the distributed architecture shown in Fig. 3.
The hypervisor runs five virtual machines, each running a minimal installation of Ubuntu 16.04, and each instance in turn runs a different honeypot. Traffic to the virtual machines is controlled by a firewall, and Network Address Translation (NAT) allows them to communicate with the outside world. The server runs within the network of the Information Security Laboratory, BITS Pilani, Hyderabad Campus. It continuously monitors the activity on the public IP addresses (Table 2).
4 Analysis
4.1 Attackers' Origin
The origin of the attacker is the country or city from which the attack is initiated, and the attacker's source IP address helps determine this location. We used Python's urllib2 library to look up the location of each attacking IP address. However, IP-based geolocation is not useful if the attacker uses a VPN or the Tor network, because the observed address then belongs to the VPN server or Tor exit node rather than to the attacker. The results of the analysis are given in Table 3.
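The lookup step could be sketched as follows. The paper does not name the geolocation service it queried with urllib2. The endpoint in `GEO_URL` (ip-api.com's JSON API) and its response fields (`status`, `country`, `city`, `lat`, `lon`) are therefore assumptions to check against that service's documentation. `urllib.request` is the Python 3 successor to urllib2, and only `lookup` needs network access.

```python
import json
import urllib.request
from collections import Counter
from typing import Iterable, Optional

# Assumed service: the paper does not say which geolocation API was queried.
GEO_URL = "http://ip-api.com/json/{ip}"


def parse_location(payload: str) -> Optional[dict]:
    """Turn a JSON geolocation response into {country, city, lat, lon}, or None on failure."""
    data = json.loads(payload)
    if data.get("status") != "success":
        return None
    return {
        "country": data.get("country"),
        "city": data.get("city"),
        "lat": data.get("lat"),
        "lon": data.get("lon"),
    }


def lookup(ip: str, timeout: float = 5.0) -> Optional[dict]:
    """Query the (assumed) geolocation service for one IP address; requires network access."""
    with urllib.request.urlopen(GEO_URL.format(ip=ip), timeout=timeout) as resp:
        return parse_location(resp.read().decode("utf-8"))


def attacks_per_country(locations: Iterable[Optional[dict]]) -> Counter:
    """Count attacks per country, skipping addresses that could not be resolved."""
    return Counter(loc["country"] for loc in locations if loc is not None)
```

The result only reflects the address observed by the honeypot, so the VPN and Tor caveat above applies to every count it produces.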
We observe clusters of activity followed by periods of inactivity (Fig. 4). There are spikes of activity in the second and last weeks of June, in the second week of July, and from the end of October to the beginning of November. By contrast, very few attacks were initiated in August and September, so these months are omitted from the graph.
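Spikes like those in Fig. 4 can be located by bucketing attempt timestamps into ISO calendar weeks. In this sketch, the spike rule (more than `factor` times the mean weekly count) and its default of 2.0 are arbitrary illustrative choices, not the criterion used for the figure.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def attacks_per_week(timestamps: Iterable[datetime]) -> Counter:
    """Count attempts per ISO (year, week) bucket."""
    return Counter(ts.isocalendar()[:2] for ts in timestamps)


def spike_weeks(counts: Counter, factor: float = 2.0) -> List[Tuple[int, int]]:
    """Weeks whose count exceeds `factor` times the mean weekly count (hypothetical rule).

    Weeks without any attempts are absent from `counts`, so the mean is taken
    over active weeks only.
    """
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(week for week, n in counts.items() if n > factor * mean)
```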
4.2 Traffic Analysis
We segmented the 250 MB of collected data into files of 1 MB each. The configuration allowed at most 21 attempts from a given IP address before that address was banned. In total, 870 distinct usernames and 9,027 distinct passwords were attempted.
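The effect of the 21-attempt limit on the recorded data can be modeled as a per-IP counter over the chronologically ordered attempts. This sketch covers only the policy's effect. The text does not say how the ban was enforced (for example, by a firewall rule), and the function names are hypothetical. The second function counts distinct usernames and passwords, the quantities reported above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

MAX_ATTEMPTS_PER_IP = 21  # limit stated in the text

Attempt = Tuple[str, str, str]  # (ip, username, password), in chronological order


def apply_ban_policy(attempts: Iterable[Attempt],
                     limit: int = MAX_ATTEMPTS_PER_IP) -> Tuple[List[Attempt], Set[str]]:
    """Keep at most `limit` attempts per IP; return the accepted attempts and the banned IPs."""
    seen: Dict[str, int] = defaultdict(int)
    accepted: List[Attempt] = []
    banned: Set[str] = set()
    for attempt in attempts:
        ip = attempt[0]
        if ip in banned:
            continue  # attempts from a banned IP never reach the honeypot
        seen[ip] += 1
        accepted.append(attempt)
        if seen[ip] >= limit:
            banned.add(ip)  # the limit has been reached: ban from now on
    return accepted, banned


def distinct_credentials(attempts: Iterable[Attempt]) -> Tuple[int, int]:
    """Number of distinct usernames and distinct passwords attempted."""
    users: Set[str] = set()
    passwords: Set[str] = set()
    for _, user, password in attempts:
        users.add(user)
        passwords.add(password)
    return len(users), len(passwords)
```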
The most frequently attempted username was “root” and the most frequently attempted password was “admin”. Besides the popular combination of “root” and “admin”, attackers tried other common default passwords such as “ubnt” (the factory default on Ubiquiti devices), “1234”, “support” and “password”. They also used common usernames such as “admin”, “user” and “guest”. This analysis suggests that something as simple as choosing a strong username-password combination can reduce the number of successful security breaches. Finally, an overwhelming majority of attacks on the distributed honeypot system appear to come from China (Tables 4, 5 and 6).
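Frequency rankings like these can be computed with `collections.Counter`. The sketch below assumes the attempts are available as (username, password) pairs, for example extracted from the honeypot log. `top_credentials` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (username, password)


def top_credentials(attempts: Iterable[Pair], n: int = 5):
    """Return the n most common usernames, passwords and username/password pairs."""
    attempts: List[Pair] = list(attempts)  # materialize so we can iterate three times
    users = Counter(u for u, _ in attempts)
    passwords = Counter(p for _, p in attempts)
    pairs = Counter(attempts)
    return users.most_common(n), passwords.most_common(n), pairs.most_common(n)
```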
4.3 Machine Learning Analysis
We applied three clustering methods to this data to gain insight into the attacker's profile after access to the system has been obtained. We pooled the data in a manner similar to n-gram models in natural language processing, so the data comes in three forms:
- Single-day data (1 g)
- Two days at a time (2 g)
- Three days at a time (3 g)
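The n-gram-style pooling might be sketched as follows. The sketch assumes that windows cover consecutive calendar days and slide forward one day at a time, as n-grams slide over tokens. It also assumes that each record carries the day and the coordinates of a successful attack. The paper specifies neither point, and both the record layout and the name `day_ngrams` are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple

Record = Tuple[date, float, float]  # (day, latitude, longitude) of one successful attack


def day_ngrams(records: Iterable[Record], n: int) -> Dict[date, List[Tuple[float, float]]]:
    """Pool records over windows of n consecutive calendar days, sliding one day at a time.

    Keys are the first day of each window; windows without any records are skipped.
    """
    by_day: Dict[date, List[Tuple[float, float]]] = defaultdict(list)
    for day, lat, lon in records:
        by_day[day].append((lat, lon))
    if not by_day:
        return {}
    first, last = min(by_day), max(by_day)
    windows: Dict[date, List[Tuple[float, float]]] = {}
    start = first
    while start + timedelta(days=n - 1) <= last:
        # .get() avoids inserting empty days into the defaultdict
        pooled = [pt for k in range(n) for pt in by_day.get(start + timedelta(days=k), [])]
        if pooled:
            windows[start] = pooled
        start += timedelta(days=1)
    return windows
```

Calling it with n = 1, 2 and 3 yields the 1 g, 2 g and 3 g datasets respectively, each of which can then be clustered separately.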
Applying the three clustering algorithms (Figs. 5, 6 and 7), we observe that most of the attacks are concentrated in only certain parts of the world. This suggests that the information gained by an attacker spreads only to the vicinity of the earliest attack, rather than randomly across the world.
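Apart from mean shift, the paper does not name the clustering algorithms it used. As an illustration only, the sketch below runs k-means (Lloyd's algorithm) over the (latitude, longitude) points of one window. We chose k-means because Boeing [2] applied it to latitude-longitude data. The sketch treats degrees as planar coordinates, an approximation that distorts distances at high latitudes and across the antimeridian. It also uses a deterministic farthest-point initialization instead of random seeding, so that results are reproducible.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees, treated as planar


def _dist2(a: Point, b: Point) -> float:
    # Squared Euclidean distance in degree space (an approximation).
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def _init_centers(points: Sequence[Point], k: int) -> List[Point]:
    # Farthest-point initialization: start from the first point, then repeatedly
    # add the point farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    return centers


def _assign(points: Sequence[Point], centers: List[Point]) -> List[int]:
    # Index of the nearest center for every point.
    return [min(range(len(centers)), key=lambda c: _dist2(p, centers[c])) for p in points]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm over (lat, lon) points; returns (centers, label of each point)."""
    centers = _init_centers(points, k)
    for _ in range(iters):
        labels = _assign(points, centers)
        new_centers: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # assignments no longer change the centers
        centers = new_centers
    return centers, _assign(points, centers)
```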
All three techniques give similar results:
- The cluster centers produced by the different techniques are very close to one another.
- The cluster centers obtained are similar across 1 g, 2 g and 3 g.
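The observation above is visual. One way to put a number on "very close" is to match each cluster center to its nearest counterpart in another result and measure the great-circle distance with the haversine formula. The other result could be a different algorithm, or the same algorithm on 1 g versus 3 g data. This metric is our suggestion, not the authors' procedure. The sketch uses a mean Earth radius of 6371 km, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def max_center_shift_km(centers_a: Sequence[Point], centers_b: Sequence[Point]) -> float:
    """Distance from each center in A to its nearest center in B; returns the worst case."""
    return max(min(haversine_km(a, b) for b in centers_b) for a in centers_a)
```

A small worst-case shift, relative to the distances between clusters, would support the claim that the techniques and window sizes agree.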
There are, however, some key differences:
- The mean shift algorithm appears to be more susceptible to outliers, which causes it to detect a greater number of clusters.
- Mean shift behaves better as the number of data points increases, as in the case of 2 g and 3 g.
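The outlier sensitivity noted above is easy to see in a minimal flat-kernel mean shift. This simplified sketch is not necessarily the variant or bandwidth the authors used. Every point climbs to the mean of its neighbors within `bandwidth`, and converged positions closer than half a bandwidth are merged into one mode. An isolated attack has no neighbors, so it stays where it is and becomes a cluster of its own. With more data per window, as in 2 g and 3 g, such points are more likely to fall inside some neighborhood.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees, treated as planar


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def mean_shift(points: Sequence[Point], bandwidth: float,
               iters: int = 300, tol: float = 1e-6) -> Tuple[List[Point], List[int]]:
    """Flat-kernel mean shift; returns (modes, label of each point)."""
    modes: List[Point] = []
    labels: List[int] = []
    for start in points:
        x = start
        for _ in range(iters):
            window = [p for p in points if _dist(p, x) <= bandwidth]
            if not window:
                break  # defensive guard; a mean of in-window points always has a neighbor
            new_x = (sum(p[0] for p in window) / len(window),
                     sum(p[1] for p in window) / len(window))
            moved = _dist(new_x, x)
            x = new_x
            if moved < tol:
                break  # converged to a local density peak
        # Merge with an existing mode if the converged position lies close to it.
        for i, m in enumerate(modes):
            if _dist(m, x) < bandwidth / 2:
                labels.append(i)
                break
        else:
            modes.append(x)
            labels.append(len(modes) - 1)
    return modes, labels
```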
To better understand why the clustering algorithms single out these locations, we examined the 1 g, 2 g and 3 g data for specific geographic locations. We looked for patterns that could explain how the attack propagates.
In the 1 g analysis (Table 7), all the successful attacks appear to have taken place one after another at short intervals. Moreover, once one attacker gains access, others in the vicinity appear to gain access shortly afterwards.
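The "short intervals" observation can be checked by selecting the successful logins within some radius of a cluster center, ordering them by time, and computing the gap between consecutive ones. In this sketch, the record layout, the planar degree-space radius test and the default `radius_deg` of 1.0 are illustrative assumptions, and `arrival_gaps` is a hypothetical helper.

```python
import math
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

Success = Tuple[datetime, str, float, float]  # (time, ip, latitude, longitude)


def near(center: Tuple[float, float], lat: float, lon: float, radius_deg: float) -> bool:
    # Planar test in degree space; adequate for small radii away from the poles.
    return math.hypot(lat - center[0], lon - center[1]) <= radius_deg


def arrival_gaps(successes: Sequence[Success], center: Tuple[float, float],
                 radius_deg: float = 1.0) -> List[Tuple[str, timedelta]]:
    """Successful logins near `center` in time order, each with the gap since the previous one."""
    local = sorted((s for s in successes if near(center, s[2], s[3], radius_deg)),
                   key=lambda s: s[0])
    return [(cur[1], cur[0] - prev[0]) for prev, cur in zip(local, local[1:])]
```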
Table 8 shows the 2 g analysis. The IP addresses that make successful attempts on the first day also make successful attempts on the following day. However, a new IP address from the same location is now also able to gain access to the honeypot. This suggests that either the attacker has acquired a new IP address, or another attacker in the same geolocation has received the credentials from the first.
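The day-over-day comparison described above reduces to set operations per location. The sketch assumes each successful login has already been tagged with a location label, such as a city name or cluster id. `compare_days` is a hypothetical helper. For each location active on the first day, it separates the second day's returning IP addresses from new ones.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[date, str, str]  # (day, ip, location label) for one successful login


def compare_days(rows: Iterable[Row], day1: date, day2: date) -> Dict[str, Dict[str, Set[str]]]:
    """For each location active on day1, split day2's successful IPs into returning and new."""
    ips: Dict[Tuple[date, str], Set[str]] = defaultdict(set)
    for day, ip, loc in rows:
        ips[(day, loc)].add(ip)
    result: Dict[str, Dict[str, Set[str]]] = {}
    for (day, loc), first in list(ips.items()):
        if day != day1:
            continue
        second = ips.get((day2, loc), set())  # .get() leaves the defaultdict unchanged
        result[loc] = {"returning": first & second, "new": second - first}
    return result
```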
In Table 9, the pattern in the 3 g data further strengthens the observations made for 2 g. The same set of IP addresses attacks at regular intervals. In addition, further IP addresses originate from the same or nearby locations. This supports the argument that information about the credentials spreads within a geographic vicinity.
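Whether an IP address attacks "at regular intervals" can be quantified with the coefficient of variation (standard deviation divided by mean) of the gaps between its successive attacks. Values near 0 indicate near-periodic behavior. This metric is our suggestion, not one reported in the paper, and `interval_regularity` is hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def interval_regularity(events: Iterable[Tuple[str, datetime]]) -> Dict[str, Tuple[float, float]]:
    """Per IP: (mean gap in hours, coefficient of variation of the gaps).

    IPs with fewer than three events are skipped, since fewer than two gaps
    say little about regularity.
    """
    times: Dict[str, List[datetime]] = defaultdict(list)
    for ip, ts in events:
        times[ip].append(ts)
    out: Dict[str, Tuple[float, float]] = {}
    for ip, ts_list in times.items():
        ts_list.sort()
        gaps = [(b - a).total_seconds() / 3600 for a, b in zip(ts_list, ts_list[1:])]
        if len(gaps) < 2:
            continue
        mean = statistics.mean(gaps)
        cv = statistics.pstdev(gaps) / mean if mean > 0 else float("inf")
        out[ip] = (mean, cv)
    return out
```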
5 Conclusion
We conclude that attacks appear to be concentrated in certain regions. Furthermore, knowledge of the access credentials does not appear to spread randomly. Rather, successful attacks appear to spread to the near vicinity of the first successful attack.
References
1. Bacudio, A., Yuan, X., Chu, B., Jones, M.: An overview of penetration testing. Int. J. Netw. Secur. Appl. 3, 19–38 (2011). https://doi.org/10.5121/ijnsa.2011.3602
2. Boeing, G.: Clustering to reduce spatial data set size. arXiv preprint arXiv:1803.08101 (2018)
3. Callegati, F., Cerroni, W., Ramilli, M.: Man-in-the-middle attack to the https protocol. IEEE Secur. Priv. 7(1), 78–81 (2009)
4. Doubleday, H., Maglaras, L., Janicke, H.: SSH honeypot: building, deploying and analysis. Int. J. Adv. Comput. Sci. Appl. 7 (2016). https://doi.org/10.14569/IJACSA.2016.070518
5. Feinstein, L., Schnackenberg, D., Balupari, R., Kindred, D.: Statistical approaches to DDOS attack detection and response. In: Proceedings DARPA Information Survivability Conference and Exposition, vol. 1, pp. 303–314. IEEE (2003)
6. Klein, A., Golan, Z.: System and method for detecting and mitigating DNS spoofing trojans, 11 September 2012, US Patent 8,266,295
7. Mirkovic, J., Reiher, P.: A taxonomy of DDOS attack and DDOS defense mechanisms. ACM SIGCOMM Comput. Commun. Rev. 34(2), 39–53 (2004)
8. Nabiyev, B.: Application of clustering methods network traffic for detecting DDOS attacks. Probl. Inf. Technol. 09, 98–107 (2018). https://doi.org/10.25045/jpit.v09.i1.11
9. Najafabadi, M.M., Khoshgoftaar, T.M., Kemp, C., Seliya, N., Zuech, R.: Machine learning for detecting brute force attacks at the network level. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering, pp. 379–385. IEEE (2014)
10. Nawrocki, M., Wählisch, M., Schmidt, T.C., Keil, C., Schönfelder, J.: A survey on honeypot software and data analysis. arXiv preprint arXiv:1608.06249 (2016)
11. Nikolskaia, K.: Network attacks detection based on cluster analysis, September 2017
12. Owens, J., Matthews, J.: A study of passwords and methods used in brute-force SSH attacks (2008)
13. Schuba, C.L., Krsul, I.V., Kuhn, M.G., Spafford, E.H., Sundaram, A., Zamboni, D.: Analysis of a denial of service attack on TCP. In: Proceedings. 1997 IEEE Symposium on Security and Privacy (Cat. No. 97CB36097), pp. 208–223. IEEE (1997)
14. Sochor, T., Zuzcak, M.: Study of internet threats and attack methods using honeypots and honeynets. In: Kwiecień, A., Gaj, P., Stera, P. (eds.) CN 2014. CCIS, vol. 431, pp. 118–127. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07941-7_12
15. Tatlı, E.: Cracking more password hashes with patterns. IEEE Trans. Inf. Forensics Secur. 10, 1 (2015). https://doi.org/10.1109/TIFS.2015.2422259
16. Thomas, K., et al. (eds.): Data Breaches, Phishing, or Malware? Understanding the Risks of Stolen Credentials (2017)
17. Valli, C., Rabadia, P., Woodward, A.: Patterns and patter: an investigation into SSH activity using kippo honeypots (2013)
18. Wang, J., Yang, L., Wu, J., Abawajy, J.H.: Clustering analysis for malicious network traffic. In: 2017 IEEE International Conference on Communications (ICC), pp. 1–6. IEEE (2017)
19. Zhong, S., Khoshgoftaar, T.M., Seliya, N.: Clustering-based network intrusion detection. Int. J. Reliab. Qual. Saf. Eng. 14(02), 169–187 (2007)
© 2020 IFIP International Federation for Information Processing
About this paper
Cite this paper
Gupta, H., Kulkarni, T.G., Kumar, L., Murthy, N.L.B. (2020). A Novel Approach Towards Analysis of Attacker Behavior in DDoS Attacks. In: Boumerdassi, S., Renault, É., Mühlethaler, P. (eds) Machine Learning for Networking. MLN 2019. Lecture Notes in Computer Science, vol 12081. Springer, Cham. https://doi.org/10.1007/978-3-030-45778-5_27
DOI: https://doi.org/10.1007/978-3-030-45778-5_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45777-8
Online ISBN: 978-3-030-45778-5
eBook Packages: Computer Science, Computer Science (R0)