Cybersleuth IEEE Format
Abstract—The dark web is a part of the World Wide Web that requires a special protocol to access. Once inside, web sites and other services can be accessed through a browser in much the same way as the normal web. Some sites are effectively hidden, in that they have not been indexed by a search engine and can only be accessed if the address of the site is known. Special markets, called darknet markets, also operate within the dark web; they mainly sell illegal products such as drugs, firearms, pornography and counterfeit goods, and even the services of hitmen, paid for in the cryptocurrency Bitcoin. The dark web is accessed through the TOR network using the TOR browser bundle, the most widely used dark web browser. In this research, a crawler script was first developed that scraped cybercrime domains on the dark web through the TOR service. The preponderance of each cybercrime domain was then analysed from the crawled data. Results showed that marketplaces for commercial services are the most common domains on the dark web, followed by marketplaces for drugs. Later, the script was extended to scan hidden services and their clones on the dark web in order to detect data leaks, and to visualize the links between hidden services using their shared SSH keys. The visualization took the form of clusters of IP addresses. This tool is proposed as a strategy to aid cybersecurity experts in curbing the growing menace of crime on the web.

Keywords—Dark Web, Cybercrime, Hidden Services, SSH Keys, Crawling, TOR

I. INTRODUCTION
The “Dark Web” is a small part of the World Wide Web that requires special protocols (such as I2P, Freenet or TOR) to access. The websites and services of the Dark Web are hidden, in that they are not indexed by a search engine. Special markets and forums called “Darknet Markets” also operate on the Dark Web, selling illegal products such as drugs, firearms, pornography and counterfeit goods, and even the services of hitmen, paid for with the cryptocurrency Bitcoin. Scams, hoaxes and cyber terrorism also take place on the Dark Web. These illegal activities disrupt the lives of ordinary people as well as corporate enterprises, which face huge data losses, illegal use of copyrighted products, etc. People face difficulties in carrying out their daily tasks and other activities online, as they become potential victims of cybercriminal activities on the Dark Web.

The increasing rate of cybercriminal activities on the Dark Web is raising significant concerns among governments all over the world. Experts everywhere are looking for solutions to protect people’s privacy and prevent security breaches by analysing vulnerabilities and the exploits they make possible. They are trying to detect and analyse spoofed content, ransomware and malware to build a profile of the various attacks and attackers that exist today, and to develop appropriate solutions. Due to the lack of focused crawlers for the Dark Web and the indiscernible identities of attackers on the Dark Web, curbing cybercriminal activity there is a major challenge faced by researchers and cybercrime experts today.

The motivation for this research was the growing rate of cybercriminal activities on the Dark Web and its potentially dangerous impact on people. This research concentrates mainly on developing a focused crawler for dark web forums and identifying compromised SSH Keys used by hidden services in the Dark Web. SSH Keys are alphanumeric entities that uniquely identify a server that hosts a hidden service. Crawlers are programs that visit a number of websites and index the data collected from them for further analysis. Most focused crawlers [6] are designed to collect information from the surface web, with little attention to dark websites. The dataset for this research was collected using a deep web crawler from the internet [1], and was used by Cybersleuth to identify compromised SSH Keys, clones of hidden services and predominant marketplace domains in the Dark Web.

This paper is organized as follows. Section 2 discusses the literature survey. Section 3 covers the methodology and framework, such as the crawlers used and the dataset collected for analysis. Section 4 highlights the experimental results of the crawler script and the clones of hidden services on the Dark Web found by analyzing their SSH Keys. Finally, Section 5 presents the conclusion and future work.

II. LITERATURE SURVEY
Dark Web mining is a popular and active research area, and several researchers have scoured the Dark Web to collect information on potentially dangerous sites and have attempted to predict emerging threats. In [2], a deep web crawler was developed that collected URLs containing hidden resources from the Dark Web using TOR and I2P. The collected URLs were classified by language and by country. Page scouting was performed on the acquired URLs, and the protocols they used were identified, to create a complete profile of the various hidden resources in the Dark Web. The main objective of [3] is to gather intelligence on malicious forums and illegal markets in the Dark Web in order to aid security experts in threat analysis.

Terrorists and other attackers extensively use Dark Web forums for secure communication and darknet markets for arms dealing or other illegal purchases. Several terrorist groups can be identified as active users of the Dark Web [5][7]. In [4], various groups such as Hizbollah, Hamas and Palestinian Islamic Jihad were uncovered by crawling the Dark Web using Arabic Jihad keywords. That study gathered a lot of information about predominant terrorist groups in the Dark Web and gave a detailed report on them after analysing current trends among the groups, the relationships between them and the slogans they use frequently.

III. METHODOLOGY
This research was completed in three modules, namely:

1. Dark Web Crawling
2. Analysis of hidden services on the Dark Web
3. Blacklisting of malicious sites in TOR

A. System Architecture
Fig. 1 represents Cybersleuth’s system architecture. As shown in Fig. 1, a security analyst uses Cybersleuth to access the Dark Web in order to monitor and detect cybercriminal activities on it. Cybersleuth uses the TOR Service to let the user perform various operations on the Dark Web, such as Analysis of Hidden Services, Crawling the Dark Web, Domain Profiling of various darknet markets, and Blacklisting of potentially dangerous .onion sites in TOR using Cybersleuth’s proprietary TOR filter.

Fig. 1 System Architecture of Cybersleuth

C. Analysis of hidden services on the Dark Web
This module was developed as four sub-modules:

1) Onion Scanning: In this sub-module, a user feeds a list of .onion sites (hidden services) to the tool using the Cybersleuth GUI. The script accepts this list and scans each site separately using OnionScan (an open-source scanning tool written in Go). For each site it creates a .JSON file containing the SSH Keys of the hidden service, the server version, the protocols and databases used, linked sites, etc.

2) Identification of compromised SSH Keys: This sub-module discovered Clearnet servers that shared SSH fingerprints with hidden services, using Shodan (a search engine for Internet-connected devices, accessed through its Python API). The SSH fingerprint is a short sequence of alphanumeric characters derived from the public key of the server that a user connects to. It can be used to uniquely identify servers and devices. Fig. 2 summarizes this sub-module’s methodology.

3) Finding Clones of hidden services: This sub-module uncovered clusters of sites that are similar based on their DOM (the tree structure of the index page’s markup), which helped find clones of hidden services using a machine learning library called scikit-learn.
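The fingerprint matching behind the Identification of compromised SSH Keys sub-module can be illustrated with a short Python sketch. It derives the legacy MD5-style fingerprint (colon-separated hex) from a base64-encoded public key and reports every fingerprint presented by both a hidden service and a Clearnet host. This is a minimal sketch under stated assumptions, not Cybersleuth's code: the function names, the .onion names, the IP addresses and the key material are all hypothetical, and in practice the inputs would come from the scan reports and Shodan results rather than from in-memory dictionaries.

```python
import base64
import hashlib
from collections import defaultdict


def ssh_fingerprint(public_key_b64: str) -> str:
    """Return the MD5 fingerprint of a base64 key blob as colon-separated hex."""
    blob = base64.b64decode(public_key_b64)
    digest = hashlib.md5(blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def shared_keys(hidden_services: dict, clearnet_hosts: dict) -> dict:
    """Map each fingerprint seen on both sides to the hosts that present it."""
    by_fp = defaultdict(lambda: {"onion": [], "clearnet": []})
    for onion, key in hidden_services.items():
        by_fp[ssh_fingerprint(key)]["onion"].append(onion)
    for ip, key in clearnet_hosts.items():
        by_fp[ssh_fingerprint(key)]["clearnet"].append(ip)
    # Keep only fingerprints that link a hidden service to a Clearnet host.
    return {fp: hosts for fp, hosts in by_fp.items()
            if hosts["onion"] and hosts["clearnet"]}


# Hypothetical key blobs standing in for real SSH public keys.
key_a = base64.b64encode(b"demo-key-A").decode()
key_b = base64.b64encode(b"demo-key-B").decode()

# Hypothetical hosts: one hidden service reuses the key of a Clearnet server.
hidden = {"exampleaaaa.onion": key_a, "examplebbbb.onion": key_b}
clearnet = {"203.0.113.10": key_a, "203.0.113.20": base64.b64encode(b"other").decode()}

matches = shared_keys(hidden, clearnet)
for fingerprint, hosts in matches.items():
    print(fingerprint, hosts)
```

A hidden service whose fingerprint also appears on a Clearnet address is a candidate for de-anonymisation, which is why such keys are treated as compromised.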
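The DOM-based grouping in the Finding Clones sub-module can be sketched along the following lines with scikit-learn. The paper does not state which features or clustering algorithm were used, so the choices here are assumptions: each index page is reduced to its sequence of opening tags, the sequences are vectorized with TF-IDF over tag unigrams and bigrams, and DBSCAN with cosine distance groups pages whose structure is nearly identical. The page names, the sample HTML and the `eps` threshold are hypothetical.

```python
from html.parser import HTMLParser

from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer


class TagCollector(HTMLParser):
    """Collect the sequence of opening tags, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def dom_signature(html: str) -> str:
    """Reduce a page to a space-separated string of its opening tags."""
    collector = TagCollector()
    collector.feed(html)
    return " ".join(collector.tags)


def cluster_pages(pages: dict, eps: float = 0.3) -> dict:
    """Return a cluster label per page; -1 means no structurally similar page."""
    names = list(pages)
    signatures = [dom_signature(pages[name]) for name in names]
    vectorizer = TfidfVectorizer(token_pattern=r"\S+", ngram_range=(1, 2))
    vectors = vectorizer.fit_transform(signatures)
    labels = DBSCAN(eps=eps, min_samples=2, metric="cosine").fit_predict(vectors)
    return {name: int(label) for name, label in zip(names, labels)}


# Hypothetical index pages: "a" and "b" share a layout, "c" does not.
pages = {
    "a": "<html><head><title>Market</title></head><body><div><h1>Shop</h1>"
         "<ul><li><a href='#'>One</a></li><li><a href='#'>Two</a></li></ul></div></body></html>",
    "b": "<html><head><title>Bazaar</title></head><body><div><h1>Store</h1>"
         "<ul><li><a href='#'>Red</a></li><li><a href='#'>Blue</a></li></ul></div></body></html>",
    "c": "<html><body><table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr>"
         "</table><form><input></form></body></html>",
}

labels = cluster_pages(pages)
print(labels)
```

Pages that land in the same cluster are candidate clones to be reviewed by the analyst; the threshold would need tuning on real crawl data.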