A Novel Approach Towards Analysis of Attacker Behavior in DDoS Attacks

Gupta, Himanshu; Kulkarni, Tanmay Girish; Kumar, Lov; Murthy, Neti Lalita Bhanu

doi:10.1007/978-3-030-45778-5_27

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12081)

Included in the following conference series:

International Conference on Machine Learning for Networking

Abstract

Traditionally, research in network security has largely focused on intrusion detection, on the use of machine learning techniques to identify malicious agents, and on methods for protecting systems against such attacks. In this paper, we use the same techniques to analyze the profile of the attacker in the case of a DDoS attack on a distributed honeypot.

1 Introduction

The username-password combination is one of the primary methods of authentication in most organizations' portals. Many methods, such as the man-in-the-middle attack [3], DNS spoofing [6], and phishing attacks [16], are used to obtain username-password combinations. All of these activities are examples of penetration attacks, as they allow an attacker to intercept the connection and make users believe that they are on the right website [1]. In the aforementioned approaches, the user is fooled into giving up their access credentials. Here, we analyze another type of attack, known as a brute force attack. In this approach, the attacker attempts to guess the username and password with the help of tools that make use of dictionaries of username and password combinations. This approach increases the load on the server, which in turn blocks legitimate users from logging in; this is an example of a denial of service attack. When such an attack is distributed, it is an example of a distributed denial of service attack [5, 7, 13].

In this paper, we make use of the Kippo honeypot [4, 10], which logs brute force attacks and helps us understand the behavior patterns of the hacker. The hacker attempts to gain access through a Secure Shell (SSH) session. We use data obtained from a honeypot deployed within the Information Security Lab of BITS Pilani, Hyderabad Campus [14, 17].

SSH sessions are targeted primarily because a significant number of servers are not well maintained and often use weak credentials, which makes them a perfect target for malicious agents [12]. A preliminary analysis of usernames and passwords tried on SSH remote login servers, taken from securehoney.net, gave the following results (Table 1):

Table 1. Most common SSH usernames and passwords.

The primary motive of our research is to find out how login credential data propagates [15] once a hacker has successfully obtained access to an SSH server. Figure 1 shows how successful attacks on the honeypot tend to be clustered around certain locations [9].

Figure 2 gives a zoomed-in view of China, from which the majority of the attacks originated. As the image shows, the attacks appear in pockets, which lends some preliminary support to the hypothesis that credential data spreads in the vicinity of the original successful attempt. In the remainder of the paper, we make use of a variety of clustering methods to catch patterns that may escape the human eye.

2 Related Work

Babak Nabiyev, in his work on the application of clustering techniques to the detection of DDoS attacks, used the KDD CUP 99 dataset, which is derived from the DARPA intrusion detection evaluation data. He attempted to differentiate between normal traffic and DDoS traffic with the help of K-Means and EM clustering. He grouped six kinds of DoS attack into a single class and treated normal traffic flow as the other class. Consequently, he used these two classes for the final clustering analysis [8].

Shi Zhong also used different clustering techniques for intrusion detection, with the DARPA intrusion detection project as his dataset. He carried out a comparative study of clustering algorithms for intrusion detection, in which he concluded that unsupervised clustering algorithms performed better than supervised learning methods. Of all the clustering algorithms, his proposed self-labeling heuristic performed best, with an overall accuracy of 93.6% [19].

Nikolskaia Kseniia analyzed IP traffic by clustering on IP packet headers. The study considered multiple parameters, such as classification parameters based on packet and transmission properties, the choice of clustering method and the number of clusters. It concluded that real-time data is too complex to allow features or clustering algorithms to be changed dynamically. A hybrid neural network approach showed the best results, with about 95% correctness [11].

Jie Wang argues that clustering algorithms may not work well for intrusion detection because the similarity level of data points cannot be controlled. He proposes a two-seed-expanding algorithm that splits the attacks into different phases. The preprocessing includes creating network flows and converting continuous-valued features to binary features. Based on these features, the algorithm selects seeds until all flows are divided into clusters. The experiments show that the two-seed-expanding algorithm performs better than k-means and other clustering methods [18].

Geoff Boeing used k-means clustering and DBSCAN to cluster 1759 points of latitude and longitude data, reducing them to 138 points, a compression of 92%, without losing the key features of the information that had been spatially represented within the dataset [2].
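
As a quick check of the compression figure quoted above, using only the two point counts given in the text:

```latex
\[
\text{reduction} \;=\; 1 - \frac{138}{1759} \;\approx\; 1 - 0.0785 \;=\; 0.9215 \;\approx\; 92\%
\]
```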

3 Research Framework

Experimental Setup. We have deployed honeypots in a distributed architecture, as shown in Fig. 3.

The hypervisor runs five virtual machines, each of which runs a mini-Ubuntu 16.04. Each instance, in turn, runs a different honeypot. Traffic to the virtual machines is controlled by a firewall, and Network Address Translation (NAT) lets them communicate with the outside world. The server runs within the network of the Information Security Laboratory of BITS Pilani, Hyderabad Campus. The server continuously monitors the activity that occurs on the public IP addresses (Table 2).

Table 2. Specifications of the honeypot used (Kippo).

4 Analysis

4.1 Attacker's Origin

The origin of the attacker refers to the country or city from which the attack is initiated. The source IP address helps determine the location of the attacker. We made use of the urllib2 library to look up the location of each attacker. However, IP addresses are not useful if the attacker uses a VPN or the Tor network. The results of the analysis are given in Table 3:

Table 3. Successful attempts city and country wise.
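
The IP-to-location lookup described in Sect. 4.1 could be sketched roughly as follows. The text names only urllib2 (a Python 2 module; its Python 3 counterpart is urllib.request) and does not say which geolocation service was queried, so the endpoint URL and the JSON field names below are placeholders assumed for illustration, not the service actually used.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint: replace with the geolocation service you actually use.
GEO_URL_TEMPLATE = "https://geo.example.invalid/json/{ip}"


def parse_location(payload):
    """Pull city, country and coordinates out of a JSON reply.

    The field names are assumptions about the reply format.
    """
    data = json.loads(payload)
    return {
        "city": data.get("city"),
        "country": data.get("country"),
        "lat": data.get("lat"),
        "lon": data.get("lon"),
    }


def locate(ip, fetch=None):
    """Look up one IP address; `fetch` can be replaced by a stub for offline use."""
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=10) as resp:
                return resp.read().decode("utf-8")
    return parse_location(fetch(GEO_URL_TEMPLATE.format(ip=ip)))
```

Keeping the network call behind the `fetch` parameter lets the parsing be exercised without contacting any service. As the text notes, the result describes only the source address, which may belong to a VPN or Tor exit node rather than to the attacker.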

Table 4. Most popular passwords and number of attempts

Table 5. Most popular passwords and number of attempts

We observe that there seem to be clusters of activity followed by patches of inactivity, as seen in Fig. 4. There are spikes of activity in the second and last weeks of June, in the second week of July, and at the end of October and the beginning of November. On the other hand, very few attacks were initiated in August and September, and hence those months are not included in the graph.

4.2 Traffic Analysis

We segmented the data into files of 1 MB each, for a total of 250 MB of data. The configuration allowed at most 21 attempts from a particular IP before that IP was banned. In total, 870 usernames and 9027 unique passwords were attempted.
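
The per-IP attempt limit can be illustrated with a small counter. This is only a sketch of the rule as stated in the text (at most 21 attempts per IP), not the honeypot's actual configuration mechanism; the class and method names are made up for the example.

```python
from collections import defaultdict

MAX_ATTEMPTS = 21  # limit stated in the text


class AttemptLimiter:
    """Counts login attempts per source IP and refuses any beyond the limit."""

    def __init__(self, max_attempts=MAX_ATTEMPTS):
        self.max_attempts = max_attempts
        self.counts = defaultdict(int)

    def allow(self, ip):
        """Record one attempt; return False once the IP has used its quota."""
        if self.counts[ip] >= self.max_attempts:
            return False  # IP is treated as banned
        self.counts[ip] += 1
        return True
```

With this rule the first 21 attempts from an address are logged and the 22nd is refused, while other addresses are unaffected.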

The most attempted username was “root” and the most attempted password was “admin”. In addition to the popular combination of “root” and “admin”, the attackers tried other popular default passwords such as ubnt (as we made use of the Ubuntu operating system), 1234, support and password. The hackers also tried popular usernames such as admin, user and guest. This analysis shows that something as simple as setting a strong username-password combination can reduce the number of successful security breaches. Finally, we observe that an overwhelming majority of attacks on the distributed honeypot system appear to come from China (Tables 4, 5 and 6).
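
Frequency counts of this kind can be obtained with a simple tally over the logged attempts. The sketch below assumes the log has already been parsed into (username, password) pairs; the function name is illustrative and no real log data is reproduced here.

```python
from collections import Counter


def top_credentials(attempts, n=5):
    """Return the n most frequent usernames and passwords.

    `attempts` is an iterable of (username, password) pairs.
    """
    attempts = list(attempts)
    usernames = Counter(user for user, _ in attempts)
    passwords = Counter(pw for _, pw in attempts)
    return usernames.most_common(n), passwords.most_common(n)
```

Counting usernames and passwords separately, rather than as pairs, matches the way the text reports the most attempted username and the most attempted password independently.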

Table 6. Two days of interactions for Ho Chi Minh City, Vietnam, on 26th June and 27th June, 2018, obtained by the 2-gram clustering approach

4.3 Machine Learning Analysis

On this data, we have used three clustering methods, which have helped us gain insight into the attacker's profile after access to the system has been obtained. We have pooled the data in a manner similar to that used in the n-gram models of natural language processing. Thus, the data comes in three forms:

Single day data (1-gram)
Two days at a time (2-gram)
Three days at a time (3-gram)
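
One way to read the pooling above is as a sliding window over consecutive calendar days, by analogy with n-grams over words. The sketch below follows that reading; the sliding (overlapping) windows and the rule that every day in a window must have data are assumptions, since the text does not spell them out.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def day_ngrams(events_by_day: Dict[date, list], n: int) -> Dict[Tuple[date, ...], List]:
    """Pool events from every run of n consecutive calendar days.

    Assumption: a window is kept only if all n days have recorded events.
    """
    pooled = {}
    for start in sorted(events_by_day):
        window = [start + timedelta(days=k) for k in range(n)]
        if all(day in events_by_day for day in window):
            pooled[tuple(window)] = [
                event for day in window for event in events_by_day[day]
            ]
    return pooled
```

Calling `day_ngrams(events, 1)`, `day_ngrams(events, 2)` and `day_ngrams(events, 3)` would then give the single-day, two-day and three-day pools that are passed to the clustering step.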

We applied three different clustering algorithms to gain better insight into the data. From Figs. 5, 6 and 7, we observe that most of the attacks are concentrated in certain parts of the world. This suggests that the information gained by the attacker spreads only to the vicinity of the earliest attack, rather than randomly across the world.

All three techniques give similar results:

All techniques give cluster centers that are very close to one another.
The cluster centers obtained are similar across the 1-gram, 2-gram and 3-gram data.

On the other hand, there are some key differences:

The mean shift algorithm appears to be more susceptible to outliers, which causes it to detect a greater number of clusters.
However, the algorithm behaves better when the number of data points increases, as in the 2-gram and 3-gram cases.
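
The outlier behaviour noted above can be reproduced with a minimal flat-kernel mean shift. This is a sketch under stated assumptions and not the implementation used in the study: coordinates are treated as points in a plane with Euclidean distance (reasonable only over small areas), and the bandwidth and merge radius are arbitrary illustrative choices.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift: move each seed to the mean of its neighbours.

    Returns one mode (cluster center) per group of seeds that converge together.
    """
    modes = []
    for seed in points:
        x, y = seed
        for _ in range(max_iter):
            near = [
                (px, py) for px, py in points
                if math.hypot(px - x, py - y) <= bandwidth
            ]
            new_x = sum(p[0] for p in near) / len(near)
            new_y = sum(p[1] for p in near) / len(near)
            moved = math.hypot(new_x - x, new_y - y)
            x, y = new_x, new_y
            if moved < tol:
                break
        # Merge seeds that converged to (almost) the same mode.
        for mode in modes:
            if math.hypot(mode[0] - x, mode[1] - y) <= bandwidth / 2:
                break
        else:
            modes.append((x, y))
    return modes
```

Because every point is used as a seed, an isolated point has no neighbours to pull it towards a dense region and ends up as a cluster of its own. Two tight groups plus one far-away point therefore yield three modes rather than two, which is consistent with the larger cluster counts reported for mean shift.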

To better understand why the clustering algorithms have singled out these locations, we have probed the 1-gram, 2-gram and 3-gram data for specific geographic locations, searching for patterns that could help us understand how the attack seems to propagate.

Table 7. One day of interactions from China on 27th October, 2018, obtained by the 1-gram clustering approach

In the 1-gram analysis in Table 7, we observe that all the successful attacks took place one after another at short intervals. In addition, once an attacker gains access, others in the vicinity appear to gain access shortly afterwards.

In Table 8, we observe the following. The set of IP addresses that make a successful attempt on the first day also appears on the following day. However, a new IP from the same location is now also able to gain access to the honeypot. This means that either the attacker has gained access to a new IP, or another attacker in the same geolocation has received the credentials from the first.
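
The day-to-day comparison described for the 2-gram data amounts to a set comparison of the successful source addresses. A minimal sketch, with an illustrative function name and no real addresses from the dataset:

```python
def compare_days(day1_ips, day2_ips):
    """Split successful source IPs into returning, new and dropped sets."""
    day1, day2 = set(day1_ips), set(day2_ips)
    return {
        "returning": sorted(day1 & day2),  # succeeded on both days
        "new": sorted(day2 - day1),        # first seen on the second day
        "dropped": sorted(day1 - day2),    # seen only on the first day
    }
```

In the pattern described above, the "returning" list would contain all first-day addresses and the "new" list would hold the additional address from the same location.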

Table 8. Two days of interactions for Ho Chi Minh City, Vietnam, on 26th June and 27th June, 2018, obtained by the 2-gram clustering approach

Table 9. Three days of interactions for Vietnam from 6th June to 8th June, 2018, obtained by the 3-gram clustering approach

In Table 9, the pattern in the data obtained from the 3-gram analysis further strengthens the observations made in the 2-gram case. Here, we can clearly observe that the same set of IP addresses attacks at regular intervals. In addition to those, we see further IP addresses that originate from the same or nearby locations, which gives weight to the argument that information about the credentials is spreading within the geographical vicinity.

5 Conclusion

We conclude that attacks appear to be concentrated in certain regions. Furthermore, access credential data does not seem to spread randomly; rather, successful attacks seem to spread in the near vicinity of the first attack.

References

1. Bacudio, A., Yuan, X., Chu, B., Jones, M.: An overview of penetration testing. Int. J. Netw. Secur. Appl. 3, 19–38 (2011). https://doi.org/10.5121/ijnsa.2011.3602
2. Boeing, G.: Clustering to reduce spatial data set size. arXiv preprint arXiv:1803.08101 (2018)
3. Callegati, F., Cerroni, W., Ramilli, M.: Man-in-the-middle attack to the https protocol. IEEE Secur. Priv. 7(1), 78–81 (2009)
4. Doubleday, H., Maglaras, L., Janicke, H.: SSH honeypot: building, deploying and analysis. Int. J. Adv. Comput. Sci. Appl. 7 (2016). https://doi.org/10.14569/IJACSA.2016.070518
5. Feinstein, L., Schnackenberg, D., Balupari, R., Kindred, D.: Statistical approaches to DDOS attack detection and response. In: Proceedings DARPA information survivability conference and exposition, vol. 1, pp. 303–314. IEEE (2003)
6. Klein, A., Golan, Z.: System and method for detecting and mitigating DNS spoofing trojans, 11 September 2012, US Patent 8,266,295
7. Mirkovic, J., Reiher, P.: A taxonomy of DDOS attack and DDOS defense mechanisms. ACM SIGCOMM Comput. Commun. Rev. 34(2), 39–53 (2004)
8. Nabiyev, B.: Application of clustering methods network traffic for detecting DDOS attacks. Probl. Inf. Technol. 09, 98–107 (2018). https://doi.org/10.25045/jpit.v09.i1.11
9. Najafabadi, M.M., Khoshgoftaar, T.M., Kemp, C., Seliya, N., Zuech, R.: Machine learning for detecting brute force attacks at the network level. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering, pp. 379–385. IEEE (2014)
10. Nawrocki, M., Wählisch, M., Schmidt, T.C., Keil, C., Schönfelder, J.: A survey on honeypot software and data analysis. arXiv preprint arXiv:1608.06249 (2016)
11. Nikolskaia, K.: Network attacks detection based on cluster analysis, September 2017
12. Owens, J., Matthews, J.: A study of passwords and methods used in brute-force SSH attacks (2008)
13. Schuba, C.L., Krsul, I.V., Kuhn, M.G., Spafford, E.H., Sundaram, A., Zamboni, D.: Analysis of a denial of service attack on TCP. In: Proceedings. 1997 IEEE Symposium on Security and Privacy (Cat. No. 97CB36097), pp. 208–223. IEEE (1997)
14. Sochor, T., Zuzcak, M.: Study of internet threats and attack methods using honeypots and honeynets. In: Kwiecień, A., Gaj, P., Stera, P. (eds.) CN 2014. CCIS, vol. 431, pp. 118–127. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07941-7_12
15. Tatlı, E.: Cracking more password hashes with patterns. IEEE Trans. Inf. Forensics Secur. 10, 1 (2015). https://doi.org/10.1109/TIFS.2015.2422259
16. Thomas, K., et al. (eds.): Data Breaches, Phishing, or Malware? Understanding the Risks of Stolen Credentials (2017)
17. Valli, C., Rabadia, P., Woodward, A.: Patterns and patter-an investigation into SSH activity using kippo honeypots (2013)
18. Wang, J., Yang, L., Wu, J., Abawajy, J.H.: Clustering analysis for malicious network traffic. In: 2017 IEEE International Conference on Communications (ICC), pp. 1–6. IEEE (2017)
19. Zhong, S., Khoshgoftaar, T.M., Seliya, N.: Clustering-based network intrusion detection. Int. J. Reliab. Qual. Saf. Eng. 14(02), 169–187 (2007)

Author information

Authors and Affiliations

BITS Pilani, Hyderabad Campus, Hyderabad, India
Himanshu Gupta, Tanmay Girish Kulkarni, Lov Kumar & Neti Lalita Bhanu Murthy

Corresponding authors

Correspondence to Himanshu Gupta, Tanmay Girish Kulkarni, Lov Kumar or Neti Lalita Bhanu Murthy.

Editor information

Editors and Affiliations

Conservatoire National des Arts Métiers, Paris Cedex 03, France
Selma Boumerdassi
ESIEE Paris, Noisy-le-Grand, France
Éric Renault
Inria, Paris, France
Paul Mühlethaler

About this paper

Cite this paper

Gupta, H., Kulkarni, T.G., Kumar, L., Murthy, N.L.B. (2020). A Novel Approach Towards Analysis of Attacker Behavior in DDoS Attacks. In: Boumerdassi, S., Renault, É., Mühlethaler, P. (eds) Machine Learning for Networking. MLN 2019. Lecture Notes in Computer Science(), vol 12081. Springer, Cham. https://doi.org/10.1007/978-3-030-45778-5_27

DOI: https://doi.org/10.1007/978-3-030-45778-5_27
Published: 20 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45777-8
Online ISBN: 978-3-030-45778-5
eBook Packages: Computer Science, Computer Science (R0)
