

Cybersleuth: Crawling the Dark Web to detect and analyse the cybercrimes committed over it

Sangavi Vijayakumar, Sashaank Pejathaya Murali


Department of Information Technology, SSNCE
SSN College of Engineering
Chennai, India
sangavi.vijayakumar@gmail.com, sashaankpm@gmail.com

Abstract—The dark web is a part of the World Wide Web that requires a special protocol to access. Once inside, web sites and other services can be accessed through a browser in much the same way as on the normal web. Some sites are effectively hidden, in that they have not been indexed by a search engine and can only be accessed if the address of the site is known. Special markets called darknet markets also operate within the dark web; they mainly sell illegal products such as drugs, firearms, pornography, counterfeit goods and even the services of hitmen, paid for in the cryptocurrency Bitcoin. The dark web is accessed over the TOR network using the TOR Browser bundle, the most widely used dark web browser. In this research, a crawler script was first developed that scraped cybercrime domains on the dark web through the TOR service. The prevalence of each cybercrime domain was then analysed from the crawled data. Results showed that marketplaces for commercial services are the most used domains on the dark web, followed by marketplaces for drugs. Later, the script was extended to scan hidden services, along with their clones, in order to detect data leaks, and to visualize the links between hidden services using their shared SSH keys. The visualization took the form of clusters of IP addresses. This tool is proposed as a strategy to aid cybersecurity experts in curbing the growing menace of crime through the web.

Keywords—Dark Web, Cybercrime, Hidden Services, SSH Keys, Crawling, TOR

I. INTRODUCTION

The “Dark Web” is a small part of the World Wide Web that requires special protocols (such as I2P, Freenet or TOR) to access. The websites and services of the Dark Web are hidden, in that they are not indexed by a search engine. Special markets and forums called “Darknet Markets” also operate on the Dark Web, selling illegal products such as drugs, firearms, pornography, counterfeit goods and even the services of hitmen, paid for with the cryptocurrency Bitcoin. Scams, hoaxes and cyber terrorism also take place over the Dark Web. These illegal activities disrupt the lives of ordinary people as well as corporate enterprises, which face large data losses, illegal use of copyrighted products, and similar harms. People face difficulties in carrying out their daily tasks and other activities online, as they become potential victims of cybercriminal activities on the Dark Web.

The increasing rate of cybercriminal activity on the Dark Web is raising significant concern among governments all over the world. Experts everywhere are looking for ways to protect people’s privacy and prevent security breaches by analysing vulnerabilities and the exploits they make possible. They are trying to detect and analyse spoofed content, ransomware and malware in order to build a profile of the various attacks and attackers that exist today and to develop appropriate countermeasures. Owing to the lack of focused crawlers for the Dark Web and the difficulty of discerning the identities of attackers on it, curbing cybercriminal activity on the Dark Web is a major challenge faced by researchers and cybercrime experts today.

The motivation for this research was the growing rate of cybercriminal activity on the Dark Web and its potentially dangerous impact on people. This research concentrates mainly on developing a focused crawler for Dark Web forums and on identifying compromised SSH Keys used by hidden services in the Dark Web. SSH Keys are alphanumeric entities that uniquely identify the server that hosts a hidden service. Crawlers are programs that visit a number of websites and index the data collected from them for further analysis. Most focused crawlers [6] are designed to collect information from the surface web, with little attention paid to dark websites. The dataset for this research was collected from the internet using a deep web crawler [1], and was used by Cybersleuth to identify compromised SSH Keys, clones of hidden services and predominant marketplace domains in the Dark Web.

This paper is organized as follows. Section II discusses the literature survey. Section III covers the methodology and framework, including the crawlers used and the dataset collected for analysis. Section IV highlights the experimental results of the crawler script and the clones of hidden services found on the Dark Web by analysing their SSH Keys. Finally, Section V presents the conclusion and future work.

II. LITERATURE SURVEY

Dark Web mining is an active research area, and several researchers have scoured the Dark Web to collect information on potentially dangerous sites and have attempted to predict emerging threats. In [2], a deep web crawler was developed that collected URLs containing hidden resources from the Dark Web using TOR and I2P. The collected URLs were classified by language and by country. Page scouting was performed on the acquired URLs, and the protocols they used were identified, to create a complete profile of the various hidden resources in the Dark Web. The main objective of [3] is to gather intelligence related to malicious forums and illegal markets in the Dark Web in order to aid a security expert in threat analysis.

Terrorists and other attackers extensively use Dark Web forums for secure communication and darknet markets for arms dealing or other illegal purchases. Several terrorist groups can be identified as active users of the Dark Web [5][7]. In [4], various groups such as Hizbollah, Hamas and Palestinian Islamic Jihad were uncovered by crawling the Dark Web using Arabic Jihad keywords. That study gathered a large amount of information about predominant terrorist groups in the Dark Web and produced a detailed report on them after analysing current trends among the groups, the relationships between them and the slogans they use frequently.

III. METHODOLOGY

This research was completed in three modules, namely:

1. Dark Web Crawling
2. Analysis of hidden services on the Dark Web
3. Blacklisting of malicious sites in TOR

A. System Architecture

As shown in Fig. 1, a security analyst uses Cybersleuth to access the Dark Web in order to monitor and detect cybercriminal activities on it. Cybersleuth uses the TOR Service to let the user perform various operations on the Dark Web, such as Analysis of Hidden Services, Crawling the Dark Web, Domain Profiling of various darknet markets and Blacklisting of potentially dangerous .onion sites in TOR using Cybersleuth’s proprietary TOR filter.

Fig. 1. System Architecture of Cybersleuth

B. Crawling the Dark Web

A Python script was developed to crawl .onion sites on the Dark Web using the TOR service, for the purpose of determining the predominant domains of cybercrime committed in the Dark Web. The script collected darknet sites up to a depth of three and classified them under popular domains such as Search Engines, Bitcoin Wallets, Narcotics, Trafficking and Counterfeiting, based on keywords present in the name of the web document.

C. Analysis of hidden services on the Dark Web

This module was developed as four sub-modules:

1) Onion Scanning: In this sub-module, a user feeds a list of .onion sites (hidden services) to the tool using the Cybersleuth GUI. The script accepts this list and scans each site separately using OnionScan (an open-source scanner for hidden services, written in Go), creating a JSON file for each site that contains the SSH Keys of the hidden service, the server version, the protocols and databases used, linked sites, etc.

2) Identification of compromised SSH Keys: This sub-module discovered Clearnet servers that shared SSH fingerprints with hidden services, using Shodan (a search engine for Internet-connected devices that also offers a Python API). The SSH fingerprint is a short sequence of alphanumeric characters that represents the public key of the server that a user tries to connect to. It can be used to uniquely identify servers and devices. Fig. 2 summarizes this sub-module’s methodology.

3) Finding Clones of hidden services: This sub-module uncovered clusters of sites that are similar based on their DOM (the tree structure of the index page), which helped find clones of hidden services using the machine learning library scikit-learn.
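
The clone-finding idea of sub-module 3 in Section III-C can be illustrated with a small sketch. The paper used scikit-learn for this step; the code below deliberately swaps in a simpler, standard-library technique (comparing sequences of HTML tags with `difflib`) so that it runs without extra dependencies. The function names, the 0.9 threshold and the sample pages are all assumptions made for illustration and are not taken from Cybersleuth.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects the opening tags of a page in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def dom_signature(html):
    """Reduce a page to its list of opening tags, ignoring text content."""
    parser = TagSequence()
    parser.feed(html)
    return parser.tags


def dom_similarity(html_a, html_b):
    """Ratio in [0, 1]; 1.0 means the two tag sequences are identical."""
    matcher = SequenceMatcher(None, dom_signature(html_a), dom_signature(html_b))
    return matcher.ratio()


def group_clones(pages, threshold=0.9):
    """Greedy grouping: a page joins the first group whose first member
    is at least `threshold` similar to it, otherwise it starts a new group."""
    groups = []
    for site, html in pages.items():
        for group in groups:
            if dom_similarity(pages[group[0]], html) >= threshold:
                group.append(site)
                break
        else:
            groups.append([site])
    return groups


# Hypothetical index pages: the first two share a layout, the third does not.
pages = {
    "marketaaa.onion": "<html><body><div><h1>Shop</h1><ul><li>a</li><li>b</li></ul></div></body></html>",
    "marketbbb.onion": "<html><body><div><h1>Store</h1><ul><li>x</li><li>y</li></ul></div></body></html>",
    "forumccc.onion": "<html><body><table><tr><td>post</td></tr></table></body></html>",
}
clone_groups = group_clones(pages)
```

Comparing only tag sequences means two sites with different text but the same template are treated as likely clones, which matches the DOM-based notion of similarity described above. A clustering library would scale better to large crawls than this pairwise comparison.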

Fig. 2. SSH Key Analysis
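
The SSH key analysis of sub-module 2 in Section III-C amounts to computing a fingerprint for each server key and looking for the same fingerprint on both a hidden service and a Clearnet host. The sketch below shows one way this could look, under stated assumptions: the record layouts (`onion`, `ssh_fingerprint`, `ip`, `fingerprint`) are hypothetical and do not reproduce the actual OnionScan or Shodan output formats, and the sample key is not a real key. The fingerprint itself follows the classic convention of an MD5 digest over the decoded key blob, written as colon-separated hex pairs.

```python
import base64
import hashlib
from collections import defaultdict


def md5_fingerprint(public_key_line):
    """Colon-separated MD5 fingerprint of an OpenSSH public key line
    of the form '<type> <base64 blob> [comment]'."""
    key_blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.md5(key_blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def shared_fingerprints(onion_scans, clearnet_hosts):
    """Map each hidden service to the Clearnet IPs that present the
    same SSH fingerprint. Services with no match are left out."""
    ips_by_fingerprint = defaultdict(list)
    for host in clearnet_hosts:
        ips_by_fingerprint[host["fingerprint"]].append(host["ip"])

    matches = {}
    for scan in onion_scans:
        fingerprint = scan["ssh_fingerprint"]
        if fingerprint in ips_by_fingerprint:
            matches[scan["onion"]] = {
                "fingerprint": fingerprint,
                "clearnet_ips": ips_by_fingerprint[fingerprint],
            }
    return matches


# Hypothetical sample data; the IPs come from documentation-only ranges.
demo_key = "ssh-rsa " + base64.b64encode(b"not-a-real-key-blob").decode() + " demo"
fp = md5_fingerprint(demo_key)
onion_scans = [
    {"onion": "exampleaaa.onion", "ssh_fingerprint": fp},
    {"onion": "examplebbb.onion", "ssh_fingerprint": "00:" * 15 + "00"},
]
clearnet_hosts = [
    {"ip": "192.0.2.10", "fingerprint": fp},
    {"ip": "198.51.100.7", "fingerprint": "ff:" * 15 + "ff"},
]
matches = shared_fingerprints(onion_scans, clearnet_hosts)
```

In a real run the `clearnet_hosts` records would come from a device search service such as Shodan and the `onion_scans` records from the per-site scan reports; both are replaced by in-memory lists here so the matching logic can be read on its own.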


Fig. 3. Blacklisting hidden services in TOR using a Firefox add-on.

4) Visualization: In this sub-module, the relationships between hidden services, Clearnet sites and IP addresses are represented graphically using Gephi (an open-source visualization and network analysis software).

Fig. 5. Relationships between hidden services, Clearnet Servers and their IP Addresses
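
A graph such as the one in Fig. 5 needs an edge list as input. The following is a minimal sketch, not the authors' export code, of how relationships between hidden services, Clearnet servers and IP addresses might be written to a CSV edge table with `Source` and `Target` columns, a layout that Gephi's spreadsheet import accepts. The triple format, the edge labels and the sample hosts are assumptions made for illustration.

```python
import csv
import io


def build_edges(links):
    """Turn (hidden_service, clearnet_server, ip) triples into directed
    edges, dropping duplicates while keeping first-seen order."""
    edges = []
    for onion, server, ip in links:
        edges.append((onion, server, "shares SSH key"))
        edges.append((server, ip, "resolves to"))
    return list(dict.fromkeys(edges))


def write_edge_table(edges, handle):
    """Write a Source/Target/Label table to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Label"])
    writer.writerows(edges)


# Hypothetical sample data; the IPs come from documentation-only ranges.
links = [
    ("exampleaaa.onion", "host1.example.com", "192.0.2.10"),
    ("examplebbb.onion", "host1.example.com", "192.0.2.10"),
    ("exampleccc.onion", "host2.example.net", "198.51.100.7"),
]
buffer = io.StringIO()
write_edge_table(build_edges(links), buffer)
edge_csv = buffer.getvalue()
```

Two hidden services pointing at the same Clearnet server end up sharing a node in the resulting graph, which is what produces the overlapping clusters visible in this kind of visualization.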
D. Blacklisting of malicious sites in TOR

The final module consists of a TOR Browser (Firefox) extension that accepts a malicious site from the user and calculates its MD5 hash (MD5 is a hashing algorithm that converts a given string into a 128-bit hash value). This hash is added to the list of blacklisted sites in the JavaScript program of the Firefox add-on. The site remains blocked unless it is removed from the list manually. Fig. 3 above shows how the blacklisting is performed.

IV. EXPERIMENTAL RESULTS AND DISCUSSION

This section explores the results of this research in detail. The first part of the experiment crawled .onion sites from the Dark Web using TOR, with https://thehiddenwiki.org/ as the index site, up to a depth of three. (The crawler can also accept other index sites of the user’s choice.) Additionally, domain profiling was performed, in which the sites were grouped under five domains, namely Search Engines, Bitcoin Wallets, Narcotics, Trafficking and Counterfeiting, based on keywords present in the web document’s name. The crawler managed to collect over 1200 sites in one day. It was observed that Search Engines and Bitcoin Wallets were the most popular sites in the Dark Web, with a popularity value (calculated as the ratio of the number of sites in that domain to the total number of sites crawled) of 9 and 8 respectively, with 9 being the highest. Trafficking comes next with a popularity value of 4, followed by Counterfeiting and Narcotics with popularity values of 2 and 1 respectively, as shown in Fig. 4.

The second part of this research, as mentioned above, was performed in four stages. First, onion scanning was done for each input hidden service, producing JSON files containing the SSH fingerprints of each input hidden service. Second, compromised SSH Keys of hidden services that were shared by Clearnet Servers were identified, as shown in the figure below. Third, clusters of sites with similar DOMs were found using scikit-learn, which helped identify clones of hidden services. Last, the associations between hidden services, Clearnet sites and IP addresses were represented graphically using Gephi, as shown in Fig. 5.

In Fig. 5, the orange nodes (Clearnet Servers) overlap many of the purple nodes (Hidden Services), which indicates that many hidden services are covertly hosted on legitimate Clearnet Servers. Fig. 6 illustrates this in detail.

Fig. 6. Enhanced view of the relationships between the Clearnet Servers, Hidden Services and IP Addresses

Fig. 4. Popularity of various domains in the Dark Web
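
The pipeline behind Fig. 4 (crawl to a fixed depth, classify each page by keywords, then compute each domain's share of the crawled sites) can be sketched as follows. This is an illustration under stated assumptions rather than the Cybersleuth script: the keyword lists are invented, the page title stands in for the "name of the web document", the seed counts as depth 0, and pages are served from an in-memory dictionary instead of the network. In practice `fetch` would be a function that retrieves pages through the local TOR SOCKS proxy.

```python
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Extracts the <title> text and all link targets from one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(seed, fetch, max_depth=3):
    """Breadth-first crawl from `seed`, following links for at most
    `max_depth` hops. `fetch(url)` returns HTML or None.
    Returns a {url: title} mapping for every page fetched."""
    titles = {}
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        titles[url] = parser.title.strip()
        if depth == max_depth:
            continue  # do not expand links beyond the depth limit
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return titles


# Hypothetical keyword lists for the five domains named in the paper.
KEYWORDS = {
    "Search Engines": ("search", "engine"),
    "Bitcoin Wallets": ("bitcoin", "wallet"),
    "Narcotics": ("drug", "narcotic"),
    "Trafficking": ("traffick",),
    "Counterfeiting": ("counterfeit", "fake"),
}


def classify(title):
    """Return the first domain whose keywords appear in the title."""
    lowered = title.lower()
    for domain, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return domain
    return "Other"


def popularity(titles):
    """Share of crawled sites per domain (sites in domain / total sites)."""
    counts = Counter(classify(title) for title in titles.values())
    total = sum(counts.values())
    return {domain: count / total for domain, count in counts.items()}


# In-memory stand-in for the network, used only for this example.
FAKE_WEB = {
    "http://seed.onion/": '<title>Hidden index</title><a href="http://a.onion/">a</a><a href="http://b.onion/">b</a>',
    "http://a.onion/": '<title>Bitcoin wallet service</title><a href="http://c.onion/">c</a>',
    "http://b.onion/": "<title>Onion search</title>",
    "http://c.onion/": '<title>Drug market</title><a href="http://seed.onion/">back</a>',
}
titles = crawl("http://seed.onion/", FAKE_WEB.get, max_depth=2)
shares = popularity(titles)
```

Passing `fetch` in as a parameter keeps the traversal logic independent of how pages are retrieved, so the same function can be exercised offline, as here, or pointed at a proxy-backed fetcher.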
Fig. 8. Successful implementation of Cybersleuth’s TOR Filter in the user’s browser
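
The filter's core logic, as described in Section III-D, is to store the MD5 hash of each blacklisted site and to check every requested site against that list. The add-on itself is written in JavaScript; the sketch below mirrors the same logic in Python purely for illustration. The class and function names are hypothetical, and hashing the lower-cased host name (rather than the full URL) is an assumption, since the paper does not specify exactly which string is hashed.

```python
import hashlib
from urllib.parse import urlsplit


def site_hash(url_or_host):
    """MD5 hex digest of the lower-cased host name. If the input has no
    scheme, the whole string is treated as the host (an assumption)."""
    host = urlsplit(url_or_host).hostname or url_or_host
    return hashlib.md5(host.lower().encode("utf-8")).hexdigest()


class Blacklist:
    """Keeps only hashes, so the stored list never contains plain addresses."""

    def __init__(self):
        self._hashes = set()

    def block(self, site):
        self._hashes.add(site_hash(site))

    def unblock(self, site):
        # Manual removal is the only way a site leaves the list.
        self._hashes.discard(site_hash(site))

    def is_blocked(self, site):
        return site_hash(site) in self._hashes


# Hypothetical usage with a made-up address.
blacklist = Blacklist()
blacklist.block("http://badmarket.onion/")
blocked_by_host = blacklist.is_blocked("badmarket.onion")
blocked_other_path = blacklist.is_blocked("http://BADMARKET.onion/listing")
```

Hashing the host name means every page on a blacklisted site is blocked, not just the exact URL that was submitted. MD5 is used here only because the paper names it; it serves as a lookup key and is not relied on for security.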

Fig. 7. Cybersleuth’s TOR Filter’s user interface

The third part of this study involved manual blacklisting of illegal .onion sites through a Firefox add-on developed for the TOR Browser. Its user interface is shown in Fig. 7. Furthermore, the repository of crawled sites can be submitted to TRAI (the Telecom Regulatory Authority of India), which can advise ISPs to block these sites globally and permanently. The accompanying figure shows how a given site’s hash value is taken and added to Cybersleuth’s Firefox add-on. This add-on is then enabled in the user’s browser, where it blocks the site permanently. Fig. 8 shows the successful implementation of Cybersleuth’s TOR Filter in the user’s browser.

V. CONCLUSION AND FUTURE WORK

Cybersleuth is primarily a tool that performs three different operations on the Dark Web. It crawls the Dark Web for hidden services using a Python script and performs domain profiling of the various illegal markets, Bitcoin Wallets and popular search engines used in the Dark Web. Cybersleuth also scans a given set of .onion sites to extract their SSH Keys and compares them with the SSH Keys of Clearnet Servers using Shodan, to determine whether these illegal sites are hosted covertly on legitimate servers. Additionally, this module of Cybersleuth identifies the clones of various hidden services on the Dark Web, enabling users to blacklist these sites easily. Moreover, Cybersleuth calculates the MD5 hash of each given site and adds it to the browser’s add-on in order to block all future connections to that site using TOR. Cybersleuth was developed as a tool to aid Dark Web cyberthreat intelligence experts in combating the rising rate of cybercrime. As future work, the authors plan to extend this research to identify the exact location where these potentially dangerous sites are hosted, followed by automated blacklisting of such sites and integration of these modules into a fully functioning, independent tool.

REFERENCES

[1] https://darkweb.hunch.ly/
[2] M. Balduzzi and V. Ciancaglini, “Cybercrime in the Deep Web,” Proc. Black Hat EU 2015, 2015.
[3] E. Nunes et al., “Darknet and deepnet mining for proactive cybersecurity threat intelligence,” Intelligence and Security Informatics (ISI), 2016 IEEE Conference on, IEEE, pp. 7-12, 2016.
[4] H. Chen, “Dark web: Exploring and mining the dark side of the web,” Intelligence and Security Informatics Conference (EISIC), 2011 European, IEEE, pp. 1-2, 2011.
[5] H. Chen et al., “Uncovering the dark Web: A case study of Jihad on the Web,” Journal of the Association for Information Science and Technology, 59.8, pp. 1347-1359, 2008.
[6] T. Fu, A. Abbasi and H. Chen, “A focused crawler for Dark Web forums,” Journal of the Association for Information Science and Technology, 61.6, pp. 1213-1231, 2010.
[7] Y. Zhang et al., “Dark web forums portal: searching and analyzing jihadist forums,” Intelligence and Security Informatics, 2009. ISI'09. IEEE International Conference on, IEEE, pp. 71-76, 2009.
