Detection of Phishing Attack

Introduction
Domain: Networking
Elementary schools have embraced computers as an effective means of engaging
students in the learning process. They serve the students' needs in a variety of ways. Students
welcome computers as a tool for learning, as well as a fun choice at free time. Adults marvel at
how easily students interact with computers and how motivated they are to use them.
In support of this realization we've seen explosive growth of information technology in
elementary schools. A combination of federal legislation and funding in support of increased
access to technology has fueled this growth. Elementary schools now have networked computer
labs, libraries, classrooms, administrative offices and special services.
The Telecommunications Act of 1996 served to expand and maintain an existing system
of universal service that provides schools and libraries with affordable access to advanced
telecommunications.1 As a result, the proportion of instructional classrooms with Internet access
increased from 14% in 1996 to 77% in the year 2000, with about 98% of schools having some
internet access.
While elementary schools may not be in the business of generating revenue, they are held
accountable for making sound investments in their educational facilities. In recent years
Technology Literacy Challenge Funds and E-Rate discounts allowed schools to invest in new
computers, peripherals, software, high-speed Internet access, networking equipment and
infrastructure, as well as personnel to mentor the use of information technology. K-12
technology expenditures were expected to reach $8.8 billion by 2001-2002.4
For the first time, in many schools, new computers and networking equipment have been
deployed en masse. This creates a need to provide adequate technical support for these
installations. Elementary schools often struggle to afford technicians who have formal
certification or IT-related degrees. It is common to find a part-time technician supporting a
multi-platform LAN of servers, routers, switches and hubs with anywhere from 50 to 150
networked clients, the installed software and peripherals; printers, scanners, still and digital
video cameras, and projection devices. The age of the equipment varies widely with a number of
operating system implementations and software versions to match, further increasing the need
for support.
Computer networks present a new set of challenges to administrators and technical
support personnel for providing a safe learning environment. Not so long ago the hot debate
about network security in elementary schools was whether students should have password-
protected accounts or whether the "cubby rule" sufficed. The cubby rule states, "You don't touch
things in your neighbor's cubby" (and, by extension, you don't log into your neighbor's account
on the network and mess with their files). Kindergarteners being introduced to computer lab
rules nod their heads sagely when network security policy is presented in this context. They
know the cubby rule.
Today, network security is a much bigger issue and the context is difficult to define. The
dangers are real; they are physical, digital and intellectual, with threats that multiply and divide.
The threats exist within and without, feeding off vulnerabilities that are inherent in the
technology and the users. It is daunting for technical support personnel in elementary schools
(who quite often have other professional responsibilities) to identify, quantify, and justify the
measures necessary to maintain a safe and secure network installation.
A Network Security Policy defines the school's expectations for proper computer and
network use and defines procedures to prevent and respond to security incidents. The goal of the
policy, written clearly and concisely, is to balance the availability of resources with the need for
protection. The policy describes what is covered, defines contacts and responsibilities, and
outlines how violations will be handled.
Abstract
Detecting and identifying phishy websites is tedious work. Several attributes must be taken
into consideration, and a final decision is then made using data mining algorithms. To find
these attributes we use the MapReduce programming model together with the Hadoop file
system. Existing online phishing detection systems usually consult a database to reach a
conclusion about the degree of phishiness of a website. In the proposed system, we
concentrate on obtaining the necessary attributes in real time using Hadoop-MapReduce, thus
increasing both the speed and the efficiency of the system. Because the attributes are computed
at request time rather than looked up, the system is intended to catch a phishy website even
if it is newly created.
Introduction:
Phishing is a fraudulent attempt, usually made through email, to steal personal
information. Phishing websites are forged websites created by malicious people to mimic
real websites. Victims of these phishing websites or emails may expose personal
information such as credit card and bank account passwords, which normally results in
financial loss for the victim. Currently available systems, such as anti-phishing
toolbars and the anti-phishing filters [2] embedded in antivirus programs, rely
mainly on a database built by checking suspicious websites and storing the phishy
ones. The procedure involves a number of steps. First, the anti-phishing program tries to
detect the phishing website on its own; if it fails to decide on the authenticity of the
website, it simply sends the URL to the server for checking. Testers check whether the
site is original; if it is not, an entry is made in the database and the updated database
is sent to all known users of the anti-phishing product. The whole process is lengthy, so
relying totally on it might prove dangerous.
The solution is to check as many website attributes as possible in real time. The fewer
suspicious attribute values a website has, the more likely it is to be authentic, and vice
versa. Since the process runs in real time, the user of the proposed system does not need to
wait until an update is made and downloaded, and is protected against phishing websites
that have not yet been recorded in any database.
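As a minimal sketch of this idea, the Java code below flags a few URL-level attributes and counts how many look suspicious; a lower count suggests a more authentic website. The four attributes, the 75-character threshold and the class name `UrlAttributeCheck` are illustrative assumptions and are not taken from the proposed system, which uses its own attribute set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: flags suspicious URL-level attributes in real time. */
final class UrlAttributeCheck {

    /** Returns attribute name -> true when the attribute value looks suspicious. */
    static Map<String, Boolean> inspect(String url) {
        Map<String, Boolean> flags = new LinkedHashMap<>();
        String lower = url.toLowerCase();
        String host = hostOf(lower);
        // A raw IPv4 address instead of a domain name (assumed attribute).
        flags.put("ipAddressAsHost", host.matches("\\d{1,3}(\\.\\d{1,3}){3}"));
        // An '@' anywhere in the URL (assumed attribute).
        flags.put("atSymbolInUrl", lower.contains("@"));
        // No HTTPS scheme (assumed attribute).
        flags.put("noHttps", !lower.startsWith("https://"));
        // Unusually long URL; the threshold of 75 is an assumption.
        flags.put("veryLongUrl", lower.length() > 75);
        return flags;
    }

    /** Extracts the host part of a URL without relying on java.net.URL. */
    static String hostOf(String url) {
        String rest = url.replaceFirst("^[a-z]+://", "");
        int slash = rest.indexOf('/');
        String authority = slash >= 0 ? rest.substring(0, slash) : rest;
        int at = authority.lastIndexOf('@');
        if (at >= 0) {
            authority = authority.substring(at + 1);
        }
        int colon = authority.indexOf(':');
        return colon >= 0 ? authority.substring(0, colon) : authority;
    }

    /** Number of attributes flagged as suspicious for the given URL. */
    static long suspiciousCount(String url) {
        return inspect(url).values().stream().filter(Boolean::booleanValue).count();
    }

    public static void main(String[] args) {
        System.out.println(inspect("http://192.168.10.5/login@bank"));
        System.out.println(suspiciousCount("https://www.example.com/"));
    }
}
```

Because every check works on the URL string alone, the flags are recomputed on each request and do not depend on a previously downloaded database.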
Machine learning has found significant applications in various security-related fields
such as intrusion detection systems, email filtering, social network analysis and image
analysis. It is likewise very useful in anti-phishing. The attribute values must be found
quickly, and Hadoop-MapReduce can provide that speed. Hadoop-MapReduce is widely used,
often on cloud services such as Amazon EC2, mainly to process very large files. Here we
use this capability of Hadoop-MapReduce to decide quickly whether incoming website
URLs are phishy or not.
Literature survey
Title 1: Detecting Phishing Web Pages with Visual Similarity Assessment Based on Earth
Mover's Distance (EMD)
Author: Fu, A.Y.; Liu Wenyin; Xiaotie Deng
Year: 2006
Description:
An effective approach to phishing Web page detection is proposed, which uses Earth
Mover's Distance (EMD) to measure Web page visual similarity. We first convert the involved
Web pages into low resolution images and then use color and coordinate features to represent the
image signatures. We use EMD to calculate the signature distances of the images of the Web
pages. We train an EMD threshold vector for classifying a Web page as a phishing or a normal
one.
Advantages:
Large-scale experiments with 10,281 suspected Web pages are carried out to show high
classification precision, phishing recall, and applicable time performance for online enterprise
solution. We also compare our method with two others to manifest its advantage. We also built
up a real system which is already used online and it has caught many real phishing cases.
Disadvantages:
This corpus is available free of charge to developers building applications. Other
techniques for identifying phishing web pages also rely on image processing. This approach
is not highly effective or accurate.
Title 2: A Hybrid System to Find & Fight Phishing Attacks Actively
Author: Hong Bo; Wang Wei; Wang Liming; Geng Guanggang; Xiao Yali; Li Xiaodong; Mao
Wei
Year: 2011
Description:
Traditional anti-phishing methods and tools always worked in a passive way to receive
users' submission and determine phishing URLs. Usually, they are not fast and efficient enough
to find and take down phishing attacks. We analyze phishing reports from the Anti-Phishing
Alliance of China (APAC) and propose a hybrid method to discover phishing attacks in an active
way based on DNS query logs and known phishing URLs.
Advantages:
We develop and deploy our system to report living phishing URLs automatically to
APAC every day. Our system has become a main channel in supplying phishing reports to
APAC in China and can be a good complement to traditional anti-phishing methods.
Disadvantages:
The webpage image is divided into blocks, and the characters are cross-checked block by block
against a baseline to reach a conclusion. For logo-based techniques, the watermarks in those
logos are checked. Server-based techniques directly scan e-mail servers, the domain servers
that host websites, and the DNS servers that resolve the URLs.
Title 3: MapReduce: simplified data processing on large clusters
Author: J. Dean and S. Ghemawat
Year: 2008
Description:
MapReduce is a programming model and an associated implementation for processing and
generating large data sets. Users specify a map function that processes a key/value pair to
generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate
values associated with the same intermediate key. Many real world tasks are expressible in this
model, as shown in the paper.
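The map and reduce roles described above can be sketched in plain Java without a cluster. This is only an in-memory illustration of the programming model under stated assumptions: the class `MiniMapReduce` and its methods are hypothetical names, the grouping loop stands in for the shuffle that a real framework performs, and nothing here uses the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory word count that mimics the map / shuffle / reduce steps. */
final class MiniMapReduce {

    /** map: (lineNumber, line) -> intermediate (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(int lineNumber, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    /** reduce: merges all intermediate values that share the same key. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs map on every line, groups by key (the shuffle), then reduces each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (int i = 0; i < lines.size(); i++) {
            for (Map.Entry<String, Integer> kv : map(i, lines.get(i))) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("verify your account", "Your account is locked");
        System.out.println(run(lines));
    }
}
```

In a real deployment the map calls run in parallel on different machines and the framework handles the grouping; the sketch only shows how the two user-supplied functions fit together.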
Advantages:
Our implementation of MapReduce runs on a large cluster of commodity machines and is
highly scalable: a typical MapReduce computation processes many terabytes of data on
thousands of machines. Programmers find the system easy to use: hundreds of MapReduce
programs have been implemented and upwards of one thousand MapReduce jobs are executed
on Google's clusters every day.
Disadvantages:
In some cases, users of MapReduce have found it convenient to produce auxiliary files as
additional outputs from their map and/or reduce operators. We do not provide support for atomic
two-phase commits of multiple output files produced by a single task.
Title 4: Modelling Intelligent Phishing Detection System for E-banking Using Fuzzy Data
Mining
Author: Aburrous, M.; Hossain, M.A.; Dahal, K.; Thabatah, F.
Year: 2009
Description:
In this paper, we present a novel approach to overcome the 'fuzziness' in e-banking
phishing website assessment and propose an intelligent, resilient and effective model for
detecting e-banking phishing websites. The proposed model is based on fuzzy logic (FL)
combined with data mining algorithms to characterize the e-banking phishing website factors
and to investigate its techniques by classifying the phishing types and defining six e-banking
phishing website attack criteria with a layer structure.
Advantages:
• The proposed e-banking phishing website model showed the significance of two phishing
website criteria, (URL & Domain Identity) and (Security & Encryption), in the final
phishing detection rate, taking into consideration their characteristic association and
relationship with each other, as shown by the fuzzy data mining classification and
association rule algorithms.
Disadvantages:
• As it fulfilled all the requirements, we did not modify it and instead concentrated on
making it efficient and speedy.
Existing System
Phishing is a direct attack on the identity of the user: the attacker steals the user's
identity and impersonates the victim. It is therefore quite different from virus and malware
attacks. Because it is a user-specific attack, security needs to be provided at the user
level. For user-level security, toolbars have been developed as browser add-ons, for example
the Netcraft toolbar for the Mozilla browser. These toolbars mostly just send the URL to their
respective servers, where all the necessary processing is done. The result is then sent back
to the toolbar, which displays it in the browser. This process takes a considerable amount
of time; to reduce it, real-time processing at the end user is necessary. The other approach
is the use of blacklists by browsers, such as Google Safe Browsing in Google Chrome. These
blacklists are updated from time to time by experts hired to manually categorize suspected
URLs as genuine or phishing. Because both the categorization and the update are manual, these
updates take some time. Other browsers use the same technique for anti-phishing. The
Anti-Phishing Working Group (APWG) helps its partners build anti-phishing solutions. APWG
generates monthly reports about current phishing activities. It keeps an eye on phishing
activities all over the world and shares the information obtained with its partners. Another
main contributor to anti-phishing work is phishtank.com. It provides a free corpus of
currently active phishing websites as well as their history. This corpus also contains
detailed information about each phishing website. The phishtank.com corpus is updated by
volunteer users who report phishing websites, and it is available free of charge to
application developers. Other techniques for identifying phishing web pages include image
processing. In this technique a snapshot image of the webpage is compared with the original
web pages of legitimate websites. The original sites show certain logo characteristics that
help distinguish them from the duplicate phishing websites. The webpage image is divided into
blocks, and the characters are cross-checked block by block against a baseline to reach a
conclusion. For logo-based techniques, the watermarks in those logos are checked. Server-based
techniques directly scan e-mail servers, the domain servers that host websites, and the DNS
servers that resolve the URLs. The e-mail-server-based technique extracts all suspected URLs
from the inbox as well as from spam and examines them, since most attacked users are reached
through phishing e-mails.
Disadvantages:
• The original sites show certain logo characteristics that help distinguish them from
the duplicate phishing websites.
• The webpage image is divided into blocks, and the characters are cross-checked block by
block against a baseline to reach a conclusion.
• For logo-based techniques, the watermarks in those logos are checked. Server-based
techniques directly scan e-mail servers, the domain servers that host websites, and the DNS
servers that resolve the URLs.
• The e-mail-server-based technique extracts all suspected URLs from the inbox as well as
from spam and examines them, since most attacked users are reached through phishing e-mails.
Proposed System:
The user enters the URL of the webpage she wishes to visit. Using that URL, we download
the source code of the webpage and then determine the values of the attributes. To find
these values we make use of Hadoop-MapReduce [9], which speeds up the process of
attribute value assignment. The basic word count example [10] of Hadoop-MapReduce is used
to search for sensitive words in webpages, and Hadoop is used in the same way wherever
required. The calculated attributes are the input to the prediction module.
The training data is prepared from the records stored in the phishtank.com database. All
the characteristics of the phishing websites reported in the phishtank.com corpus are
studied; based on these, the attributes are decided and the training data for the machine
learning algorithm is prepared. From the training data, the machine learning algorithm
generates the set of rules on which the decision is based. The prediction module receives two
inputs: the rules generated by the machine learning algorithm and the attributes found for
the requested URL. The prediction module finally predicts which category the URL falls
under (Phishing, Legitimate or Doubtful).
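The two inputs of the prediction module can be illustrated with a small Java sketch: an ordered list of rules and the attribute values of the requested URL, with the first matching rule deciding the category. The rule contents, the attribute names, the fallback to Doubtful when no rule matches, and the class `RulePredictor` are all assumptions made for illustration; the actual rules come from the machine learning algorithm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a prediction module driven by an ordered rule list. */
final class RulePredictor {

    enum Category { PHISHING, LEGITIMATE, DOUBTFUL }

    /** One rule: every listed attribute must have the listed value. */
    static final class Rule {
        final Map<String, String> conditions;
        final Category outcome;

        Rule(Map<String, String> conditions, Category outcome) {
            this.conditions = conditions;
            this.outcome = outcome;
        }

        boolean matches(Map<String, String> attributes) {
            for (Map.Entry<String, String> c : conditions.entrySet()) {
                if (!c.getValue().equals(attributes.get(c.getKey()))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Builds an attribute map from alternating name/value strings. */
    static Map<String, String> attrs(String... nameValuePairs) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            m.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return m;
    }

    /** Made-up rules standing in for the output of the rule generator. */
    static List<Rule> sampleRules() {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule(attrs("ipAddressAsHost", "yes"), Category.PHISHING));
        rules.add(new Rule(attrs("usesHttps", "yes", "sensitiveWords", "no"),
                Category.LEGITIMATE));
        return rules;
    }

    /** First matching rule wins; DOUBTFUL is the assumed default. */
    static Category predict(List<Rule> rules, Map<String, String> attributes) {
        for (Rule r : rules) {
            if (r.matches(attributes)) {
                return r.outcome;
            }
        }
        return Category.DOUBTFUL;
    }

    public static void main(String[] args) {
        Map<String, String> page =
                attrs("ipAddressAsHost", "no", "usesHttps", "yes", "sensitiveWords", "no");
        System.out.println(predict(sampleRules(), page));
    }
}
```

Keeping the rules as data rather than as hard-coded conditions means a retrained rule set can replace the old one without changing the prediction code.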
While concentrating on obtaining the attributes that are sufficient to decide the
phishiness of a website, we studied many documents and formulated a few attributes on our
own. Since attackers are quite advanced these days, we needed to consider the visual aspects
too, along with the usual coding methods. We adopted the architectural model of the
Intelligent Phishing Detection System for e-banking Using Fuzzy Data Mining. As it fulfilled
all the requirements, we did not modify it and instead concentrated on making it efficient
and speedy.
The three layers act as the backbone of the system. The layer manager part of the prediction
module acts as the brain of the system and makes the decisions. As the system is not
limited to a specific use, it can be used for any general purpose. The twenty-seven attributes
seen in the figure are calculated in real time, so the values may differ from those
calculated earlier for the same website if even slight changes have been made to it. Thus
the user is protected from a suspicious website even if it was previously listed as
authentic. To simplify the system, we have kept all three layers at the same priority,
ensuring that even the less important factors take part in decision making. This helps in
detecting suspicious websites. Some attributes need only a word count, that is, a check of
whether a word is present in the source code. For example, if "ATM PIN" is a restricted
word, we just need to find out whether it is present in the source code of the page and
assign the attribute value accordingly. Such situations occur very often in anti-phishing
systems based on machine learning.
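The restricted-word check can be sketched as follows. This is a single-machine illustration only: the word list, the 0/1 attribute encoding and the class `SensitiveWordAttribute` are assumptions, and the proposed system performs the counting with Hadoop-MapReduce rather than with a local regular expression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: derives a word-presence attribute from page source. */
final class SensitiveWordAttribute {

    /** Counts whole-phrase, case-insensitive occurrences of a phrase. */
    static int countOccurrences(String pageSource, String phrase) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(phrase) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(pageSource);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** Returns 1 if any restricted phrase occurs in the source, else 0 (assumed encoding). */
    static int attributeValue(String pageSource, List<String> restrictedPhrases) {
        for (String phrase : restrictedPhrases) {
            if (countOccurrences(pageSource, phrase) > 0) {
                return 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<String> restricted = Arrays.asList("ATM PIN", "password");
        String source = "<p>Enter your ATM PIN. Never share your atm pin.</p>";
        System.out.println(countOccurrences(source, "ATM PIN"));
        System.out.println(attributeValue(source, restricted));
    }
}
```

Matching on word boundaries avoids false hits inside longer words, which matters when a short restricted word is part of ordinary page text.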
Advantages:
• Sending a URL to a remote server takes several seconds overall even though the actual
processing takes only several milliseconds, because it suffers from network propagation
delay. We instead moved the processing to the proxy server so that this delay is minimized.
• One obvious result of using Hadoop-MapReduce is speedup. For the results, we are going
to compare the proposed system against all existing types of anti-phishing systems: those
using periodically updated lists, toolbars, and data mining based systems.
• Existing data mining based anti-phishing systems use browser extensions only to send the
URL to a processing site, process it there, and send the result back to the requesting
user.
• Notably, when the URL is genuine, other systems take a long time to respond because the
URL has to go through all the processing to prove that it is normal, so the response time
increases. In our system that time is also reduced, even though the URL goes through
extensive processing.
SYSTEM REQUIREMENTS
Software Requirements
• O/S : Windows XP
• Language : Java
• IDE : NetBeans 6.9.1
• Database : MySQL
Hardware Requirements
• System : Pentium IV 2.4 GHz
• Hard Disk : 160 GB
• Monitor : 15" VGA color
• Mouse : Logitech
• Keyboard : 110 keys enhanced
• RAM : 2 GB
Modules:
• Process dataset
• Hadoop Mapping
• Hadoop Reduce
• Assign Values
• Compute Result
Modules Description:
Process dataset:
• The user enters the URL of the webpage she wishes to visit. Using that URL, we download
the source code of the webpage and then determine the values of the attributes.
• This module obtains the website URL and the dataset relevant to our system.
• It processes the dataset and adds it to the database tables.
• It preprocesses the dataset so that it can be used by the later modules.
[Figure: Process dataset module flow: Input Web URL, Get Source of URL, Generate Attributes, Show Attributes]
Hadoop Mapping:
• The map or mapper’s job is to process the input data. Generally the input data is in the
form of file or directory and is stored in the Hadoop file system (HDFS).
• The input file is passed to the mapper function line by line. The mapper processes the
data and creates several small chunks of data.
[Figure: Hadoop Mapping module flow: Get Attributes, Classify Phishing Attributes, Mapping the Attributes, Separate the Attributes]
Hadoop Reduce
• This stage is the combination of the Shuffle stage and the Reduce stage.
• The Reducer’s job is to process the data that comes from the mapper.
• After processing, it produces a new set of output, which will be stored in the HDFS.
[Figure: Hadoop Reduce module flow: Get Mapped Attributes, Produce Categories, Create Layers, Show Layers]
Assign Values
• Generating rules is quite a tedious job. To make it more reliable we used the WEKA
data mining tool.
• Since WEKA implements many data mining algorithms, it proved very helpful in determining
the most efficient ones.
• When actual experiments were done on the database, it was found that although J4.8 and
PRISM are more efficient for smaller datasets, PART leaped well ahead of them once the
dataset comprised more than 500 entries, so we preferred PART over all others. In WEKA,
PART is a rule learner that builds a decision list from partial decision trees.
[Figure: Assign Values module flow: Get Occurrence, Match Values, Assign Value]
Compute Result
• First the sub-layer attributes are found, and the final result is then generated
hierarchically, layer by layer, in the last step.
• To generate the intermediate datasets while finding the phishiness of a site, we simply
put "?" for the last attribute of every layer so that the WEKA rule generator adds the
predicted value in place of the "?" in the dataset given to it. That value is then used at
the next level of the hierarchy, and so on.
• At last we get the final result as Phishy, Legitimate or Doubtful. Efficiency increases
as the percentage of correctly classified instances increases; for that, an accurate
priority-based dataset is provided to the rule generator.
• As the number of records in the dataset increases, the number of correctly classified
instances increases.
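The "?" placeholder step can be sketched in a few lines of Java. In WEKA's ARFF data format a "?" marks a missing value, so a row whose last attribute is "?" is an unlabeled instance. The class `ArffRows`, its method names and the attribute values shown are hypothetical, and the call to the WEKA classifier itself is left out; the sketch only shows how a layer's row is prepared and how the predicted value is written back for the next layer.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for building layer rows with an unknown class value. */
final class ArffRows {

    /** Joins the attribute values and appends "?" as the unknown last attribute. */
    static String unlabeledRow(List<String> attributeValues) {
        return String.join(",", attributeValues) + ",?";
    }

    /** Replaces the trailing "?" with the value predicted for this layer. */
    static String fillPrediction(String row, String predictedValue) {
        if (!row.endsWith(",?")) {
            throw new IllegalArgumentException("row has no trailing '?' placeholder");
        }
        return row.substring(0, row.length() - 1) + predictedValue;
    }

    public static void main(String[] args) {
        String row = unlabeledRow(Arrays.asList("Genuine", "Phishy", "Doubtful"));
        System.out.println(row);
        // The predicted value would come from the rule generator; "Phishy" is made up here.
        System.out.println(fillPrediction(row, "Phishy"));
    }
}
```

The filled-in value of one layer then becomes an ordinary attribute value in the row built for the next layer.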
[Figure: Compute Result module flow: Compute Value, Generate Rule, Produce Result]
System Architecture:
Sequence Diagram:
[Figure: sequence diagram. Participants: User, Hadoop Mapping, Hadoop Reduce, Mining data, Prediction. Messages: Web URL, Select phishing attributes, Layered the attributes, Produce Result]
Activity Diagram
[Figure: activity diagram. Steps: Input Web URL, Collection data, Hadoop Mapping, Hadoop Reduce, Mining the data, Set Rules, Selected attributes, Predict Module, Result (Trusted or Phishing)]
Flow Diagram:
[Figure: flow diagram. Steps: Start, Input Web URL, Hadoop Mapping, Hadoop Reducing, Generate Attribute, Mining the data, Predict the Rule; outcomes: Trusted, Doubted, Phishing]
Sample Coding:
1. Weburl
import java.awt.Color;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import static java.lang.Thread.sleep;
import java.net.URL;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Scanner;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Weburl extends javax.swing.JFrame {
Connection con1;
Statement st1, stmt;
ResultSet rs;
public static String site;
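// The constructor name must match the class name (it was Smartcraw).
public Weburl() {
initComponents();
try {
Class.forName("com.mysql.jdbc.Driver");
con1 = DriverManager.getConnection("jdbc:mysql://localhost:3306/smartcrawl",
"root", "");
stmt = con1.createStatement();
} catch (Exception e) {
// Report connection problems instead of silently ignoring them.
System.out.println("Database connection failed: " + e);
}
}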
@SuppressWarnings("unchecked")
// <editor-fold defaultstate="collapsed" desc="Generated Code">
private void initComponents() {
jLabel1 = new javax.swing.JLabel();
jPanel1 = new javax.swing.JPanel();
jToggleButton1 = new javax.swing.JToggleButton();
jTextField1 = new javax.swing.JTextField();
jTextField2 = new javax.swing.JTextField();
jProgressBar1 = new javax.swing.JProgressBar();
jScrollPane1 = new javax.swing.JScrollPane();
jTextArea1 = new javax.swing.JTextArea();
jButton1 = new javax.swing.JButton();
jMenuBar1 = new javax.swing.JMenuBar();
jMenu1 = new javax.swing.JMenu();
jMenuItem1 = new javax.swing.JMenuItem();
jMenuItem2 = new javax.swing.JMenuItem();
jMenu2 = new javax.swing.JMenu();
setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);
jLabel1.setFont(new java.awt.Font("Times New Roman", 1, 18)); // NOI18N

jLabel1.setText("SmartCrawler: A Two-stage Crawler for");

jLabel2.setText("Efficiently Harvesting Deep-Web Interfaces");
jPanel1.setBorder(javax.swing.BorderFactory.createLineBorder(new
java.awt.Color(0, 0, 0)));

jLabel3.setText("Enter URL");

jLabel4.setText("Limit of Crawl");
jToggleButton1.setFont(new java.awt.Font("Tw Cen MT Condensed Extra Bold", 1,
14)); // NOI18N
jToggleButton1.setText("Crawl");
jToggleButton1.addActionListener(new java.awt.event.ActionListener() {
public void actionPerformed(java.awt.event.ActionEvent evt) {
jToggleButton1ActionPerformed(evt);
}
});
javax.swing.GroupLayout jPanel1Layout = new javax.swing.GroupLayout(jPanel1);

jPanel1.setLayout(jPanel1Layout);
jPanel1Layout.setHorizontalGroup(
jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addGroup(jPanel1Layout.createSequentialGroup()
.addGroup(jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LE
ADING)
.addContainerGap()
ADING)
.addComponent(jLabel3)
.addComponent(jLabel4))
.addGap(68, 68, 68)
ADING, false)
.addComponent(jTextField1,
javax.swing.GroupLayout.DEFAULT_SIZE, 392, Short.MAX_VALUE)
.addComponent(jTextField2)))
.addGap(288, 288, 288)
.addComponent(jToggleButton1)))
.addContainerGap(javax.swing.GroupLayout.DEFAULT_SIZE,
Short.MAX_VALUE))
);
jPanel1Layout.setVerticalGroup(
.addGap(20, 20, 20)
.addGroup(jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BA
SELINE)
javax.swing.GroupLayout.PREFERRED_SIZE,
javax.swing.GroupLayout.DEFAULT_SIZE,
javax.swing.GroupLayout.PREFERRED_SIZE))
.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED,
28, Short.MAX_VALUE)
SELINE)
.addGap(32, 32, 32)
.addComponent(jToggleButton1)
.addContainerGap())
);

jLabel6.setText("Crawling URL :");

jLabel7.setText("Time of Crawl :");

jLabel8.setText("Progress :");
jLabel9.setText("_");
jLabel10.setText("_");
jTextArea1.setColumns(20);
jTextArea1.setFont(new java.awt.Font("Times New Roman", 0, 12)); // NOI18N
jTextArea1.setRows(5);
jScrollPane1.setViewportView(jTextArea1);
.addGap(19, 19, 19)
.addGroup(jPanel2Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.TR
AILING)
.addGap(18, 18, 18)
ADING)
.addComponent(jProgressBar1,
javax.swing.GroupLayout.PREFERRED_SIZE, 325,
javax.swing.GroupLayout.PREFERRED_SIZE)
Short.MAX_VALUE))
.addContainerGap()
.addComponent(jScrollPane1)
.addContainerGap())
);
.addContainerGap()
SELINE)
.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.UNRELATED)
SELINE)
ADING)
.addComponent(jProgressBar1,
.addGap(18, 18, 18)
.addComponent(jScrollPane1, javax.swing.GroupLayout.DEFAULT_SIZE,
254, Short.MAX_VALUE)
.addContainerGap())
);
jButton1.setFont(new java.awt.Font("Times New Roman", 1, 12)); // NOI18N

jButton1.setText("Next");
jButton1.addActionListener(new java.awt.event.ActionListener() {
public void actionPerformed(java.awt.event.ActionEvent evt) {
jButton1ActionPerformed(evt);
}
});
jMenu1.setText("View");
jMenuItem1.setText("View Websites");
jMenuItem1.addActionListener(new java.awt.event.ActionListener() {
public void actionPerformed(java.awt.event.ActionEvent evt) {
jMenuItem1ActionPerformed(evt);
}
});
jMenu1.add(jMenuItem1);
jMenuItem2.setText("Exit");
jMenuItem2.addActionListener(new java.awt.event.ActionListener() {
public void actionPerformed(java.awt.event.ActionEvent evt) {
jMenuItem2ActionPerformed(evt);
}
});
jMenu1.add(jMenuItem2);
jMenuBar1.add(jMenu1);
jMenuBar1.add(jMenu2);
setJMenuBar(jMenuBar1);
javax.swing.GroupLayout layout = new

javax.swing.GroupLayout(getContentPane());
getContentPane().setLayout(layout);
layout.setHorizontalGroup(
layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addGroup(layout.createSequentialGroup()
.addContainerGap()
.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addComponent(jButton1)
.addGap(116, 116, 116)
.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
.addGap(11, 11, 11)
.addComponent(jLabel1)))
.addGap(0, 173, Short.MAX_VALUE))
.addComponent(jPanel2, javax.swing.GroupLayout.Alignment.TRAILING,
javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE)
.addComponent(jPanel1, javax.swing.GroupLayout.DEFAULT_SIZE,
javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE))
.addContainerGap())
);
layout.setVerticalGroup(
.addGap(12, 12, 12)
.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE
)
.addComponent(jButton1))
.addGap(19, 19, 19)
.addComponent(jPanel1, javax.swing.GroupLayout.PREFERRED_SIZE,
.addContainerGap())
);
pack();
}// </editor-fold>
private void jMenuItem2ActionPerformed(java.awt.event.ActionEvent evt) {
// Exit menu item: hide this window. Creating a second, invisible frame is unnecessary.
this.setVisible(false);
}
private void jToggleButton1ActionPerformed(java.awt.event.ActionEvent evt) {

long tStart = System.currentTimeMillis();
String URL = jTextField1.getText();
site = URL;
int lim = Integer.parseInt(jTextField2.getText());
jProgressBar1.setBackground(Color.darkGray);
Scanner in = null;
// int limit=Integer.parseInt(jTextField2.getText());
String file = null;
if (jToggleButton1.isSelected()) {
jProgressBar1.setValue(0);
jProgressBar1.setMinimum(0);
jProgressBar1.setMaximum(lim);
jTextArea1.setText("");
jToggleButton1.setText("Stop");
jTextField1.disable();
jTextField2.disable();
try {
String urls[] = new String[lim];
// String url = "http://www.programcreek.com";
String url = jTextField1.getText();
int i = 0, j = 0, tmp = 0, total = 0, MAX = lim;
int start = 0, end = 0;
File dir = new File(".");
String loc = dir.getCanonicalPath() + File.separator + "allcrawl.txt";
File file1 = new File(loc);
BufferedWriter outFile = new BufferedWriter(new FileWriter(file1));
String webpage = getWeb(url);
end = webpage.indexOf("<body");
for (i = total; i < MAX; i++, total++) {
jProgressBar1.setValue(i);
start = webpage.indexOf("http://", end);
if (start == -1) {
start = 0;
end = 0;
try {
// No more links on this page: move on to the next collected URL.
webpage = getWeb(urls[j++]);
System.out.println("******************");
System.out.println(urls[j - 1]);
} catch (Exception e) {
System.out.println("Exception caught \n" + e);
}
/*logic to fetch urls out of body of webpage only */

end = webpage.indexOf("<body");
if (end == -1) {
end = start = 0;
continue;
}
}
end = webpage.indexOf("\"", start);
tmp = webpage.indexOf("'", start);
if (tmp < end && tmp != -1) {
end = tmp;
}
url = webpage.substring(start, end);
urls[i] = url;
jTextArea1.append(urls[i] + "\n");
System.out.println(urls[i]);
jLabel9.setText(urls[i]);
outFile.write(urls[i] + "\n");
}
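outFile.close(); // flush the collected URLs to allcrawl.txt
System.out.println("Total URLS Fetched are " + total);
long tEnd = System.currentTimeMillis();

long tDelta = tEnd - tStart;
double elapsedSeconds = tDelta / 1000.0;
jLabel10.setText("Total time taken: " + elapsedSeconds);
} catch (Exception e) {
// The outer try had no handler; report I/O and network failures.
System.out.println("Crawl failed: " + e);
}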
} else {
jToggleButton1.setText("Search");
jTextField1.enable();
jTextField2.enable();
}
}
private void jMenuItem1ActionPerformed(java.awt.event.ActionEvent evt) {

new viewwebsite().setVisible(true);
}
private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {

this.setVisible(false); // hide the current window before opening the next one
new Searchcrawl().setVisible(true);
}
public static String getWeb(String address) throws Exception {
String webpage = "";
String inputLine = "";
URL url = new URL(address);
BufferedReader in = new BufferedReader(
new InputStreamReader(url.openStream()));
while ((inputLine = in.readLine()) != null) {
webpage += inputLine;
}
in.close();
return webpage;
}
public static void main(String args[]) {

/* Set the Nimbus look and feel */
//<editor-fold defaultstate="collapsed" desc=" Look and feel setting code (optional) ">
/* If Nimbus (introduced in Java SE 6) is not available, stay with the default look and feel.
* For details see
* http://download.oracle.com/javase/tutorial/uiswing/lookandfeel/plaf.html
*/
try {
for (javax.swing.UIManager.LookAndFeelInfo info :
javax.swing.UIManager.getInstalledLookAndFeels()) {
if ("Nimbus".equals(info.getName())) {
javax.swing.UIManager.setLookAndFeel(info.getClassName());
break;
}
}
} catch (ClassNotFoundException ex) {
java.util.logging.Logger.getLogger(Smartcraw.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
} catch (InstantiationException ex) {
} catch (IllegalAccessException ex) {
} catch (javax.swing.UnsupportedLookAndFeelException ex) {
}
//</editor-fold>
/* Create and display the form */

java.awt.EventQueue.invokeLater(new Runnable() {
public void run() {
new Smartcraw().setVisible(true);
}
});
}
// Variables declaration - do not modify

private javax.swing.JButton jButton1;
private javax.swing.JLabel jLabel1;
private javax.swing.JMenu jMenu1;
private javax.swing.JMenu jMenu2;
private javax.swing.JMenuBar jMenuBar1;
private javax.swing.JMenuItem jMenuItem1;
private javax.swing.JMenuItem jMenuItem2;
private javax.swing.JPanel jPanel1;
private javax.swing.JProgressBar jProgressBar1;
private javax.swing.JScrollPane jScrollPane1;
private static javax.swing.JTextArea jTextArea1;
private javax.swing.JTextField jTextField1;
private javax.swing.JTextField jTextField2;
private javax.swing.JToggleButton jToggleButton1;
// End of variables declaration
}
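The link-harvesting loop in the listing above can be hard to follow in its flattened form. The following is a minimal sketch of the same idea, not the project's actual code: it scans only the part of the page after the `<body` tag and cuts each `http://` link at the nearest closing quote. The class and method names (`UrlExtractorSketch`, `extractUrls`) are hypothetical and introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, written only to illustrate the crawling logic.
class UrlExtractorSketch {

    /** Returns every http:// link found after the first <body tag, in page order. */
    static List<String> extractUrls(String webpage) {
        List<String> urls = new ArrayList<>();
        int pos = webpage.indexOf("<body");
        if (pos == -1) {
            return urls; // no body section, so there is nothing to scan
        }
        while (true) {
            int start = webpage.indexOf("http://", pos);
            if (start == -1) {
                break; // no further links on this page
            }
            // A link ends at the nearest closing quote of either kind,
            // or at the end of the page if no quote follows.
            int end = webpage.length();
            int doubleQuote = webpage.indexOf('"', start);
            int singleQuote = webpage.indexOf('\'', start);
            if (doubleQuote != -1) {
                end = doubleQuote;
            }
            if (singleQuote != -1 && singleQuote < end) {
                end = singleQuote;
            }
            urls.add(webpage.substring(start, end));
            pos = end; // continue scanning after this link
        }
        return urls;
    }
}
```

Returning a list instead of writing into a fixed-size array avoids having to track the `total` and `MAX` counters by hand; whether that suits the original user interface is a design choice left open here.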
2. Predict Value
package phishing_detection;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import static phishing_detection.Weburl.id;
import static phishing_detection.Weburl.name;
public class Showvalue extends javax.swing.JFrame {

Connection con1 = null;
Statement st, stmt;
ResultSet rs;
public static int[] data = new int[32];
public Showvalue() {
initComponents();
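try {
Class.forName("com.mysql.jdbc.Driver");
con1 = DriverManager.getConnection("jdbc:mysql://localhost:3306/phishing",
"root", "");
stmt = con1.createStatement();
st = con1.createStatement();
} catch (Exception e) {
// driver missing or database unreachable
System.out.println("Exception caught \n" + e);
}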
}
@SuppressWarnings("unchecked")
// <editor-fold defaultstate="collapsed" desc="Generated Code">
private void initComponents() {

jScrollPane1 = new javax.swing.JScrollPane();
jTextArea1 = new javax.swing.JTextArea();
setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);

jLabel1.setText("Phishing Detection System Using Machine Learning");

jLabel2.setText("and Hadoop-MapReduce");

jButton3.setText("Back");
// [Action-listener and GroupLayout code for this panel is truncated in the source listing.]

jButton1.setText("Produce Result Values");
jTextArea1.setColumns(20);
jTextArea1.setRows(5);
jScrollPane1.setViewportView(jTextArea1);

jLabel3.setText("Attributes and its Values");

jButton2.setText("Next");

// [The remaining NetBeans-generated GroupLayout code is truncated in the source listing.]
javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane());
getContentPane().setLayout(layout);
pack();
}// </editor-fold>

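private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {
// "Produce Result Values": show the stored attribute values of the selected website
int i = 0, j = 2;
System.out.println(id);
try {
rs = stmt.executeQuery("SELECT * FROM webdata WHERE webid ='" + id + "'");
while (rs.next()) {
for (int k = 1; k < 31; k++) {
data[k] = rs.getInt(k + 1);
jTextArea1.append(name[k] + " ............ " + data[k] + "\n");
}
}
} catch (Exception e) {
System.out.println("Exception caught \n" + e);
}
}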

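private void jButton3ActionPerformed(java.awt.event.ActionEvent evt) {
// "Back" button; the handler header is missing in the source listing, name assumed from NetBeans conventions
new Showvalue().setVisible(false);
new Weburl().setVisible(true);
}

private void jButton2ActionPerformed(java.awt.event.ActionEvent evt) {
// "Next" button; handler header likewise assumed
new Showvalue().setVisible(false);
new Mapping().setVisible(true);
}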
/**
* @param args the command line arguments
*/
public static void main(String args[]) {
/* Set the Nimbus look and feel */
//<editor-fold defaultstate="collapsed" desc=" Look and feel setting code (optional) ">
/* If Nimbus (introduced in Java SE 6) is not available, stay with the default look
and feel.
* For details see
http://download.oracle.com/javase/tutorial/uiswing/lookandfeel/plaf.html
*/
try {
for (javax.swing.UIManager.LookAndFeelInfo info :
javax.swing.UIManager.getInstalledLookAndFeels()) {
if ("Nimbus".equals(info.getName())) {
javax.swing.UIManager.setLookAndFeel(info.getClassName());
break;
}
}
} catch (ClassNotFoundException ex) {
java.util.logging.Logger.getLogger(Showvalue.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
} catch (InstantiationException ex) {
} catch (IllegalAccessException ex) {
} catch (javax.swing.UnsupportedLookAndFeelException ex) {
}
//</editor-fold>
/* Create and display the form */

java.awt.EventQueue.invokeLater(new Runnable() {
public void run() {
new Showvalue().setVisible(true);
}
});
}
// Variables declaration - do not modify

private javax.swing.JScrollPane jScrollPane1;
private javax.swing.JTextArea jTextArea1;
// End of variables declaration
}
Sample Screenshot:
TESTING OF PRODUCT
Testing of Product:
System testing is the stage of implementation that aims to ensure that the system works
accurately and efficiently before live operation commences. Testing is the process of executing
a program with the intent of finding an error. A good test case is one that has a high
probability of finding an error. A successful test is one that uncovers an as-yet undiscovered
error.
Testing is vital to the success of the system. System testing makes the logical assumption
that if all parts of the system are correct, the goal will be successfully achieved. The
candidate system is subjected to a variety of tests: online response, volume, stress, recovery,
security and usability tests. A series of tests is performed before the system is ready for user
acceptance testing. Any engineered product can be tested in one of the following two ways.
Knowing the specified function that a product has been designed to perform, tests can be
conducted to demonstrate that each function is fully operational. Knowing the internal workings
of a product, tests can be conducted to ensure that “all gears mesh”, that is, that the internal
operation of the product performs according to the specification and all internal components
have been adequately exercised.
UNIT TESTING:
Unit testing is the testing of each module before the integration of the overall system is
done. Unit testing focuses verification effort on the smallest unit of software design, the
module. This is also known as ‘module testing’. The modules of the system are tested
separately. This testing is carried out during programming itself. In this testing step, each
module is found to be working satisfactorily with regard to the expected output from the module.
There are also validation checks for the fields. For example, a validation check verifies both
the format and the validity of the data entered by the user. Testing at this level makes it easy
to find errors and debug the system.
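As an illustration of a field validation check and its unit test, here is a minimal sketch. It assumes the field under test is the web URL entered by the user; the acceptance rule (an `http` or `https` scheme plus a host), the class name `UrlFieldValidator` and the sample inputs are hypothetical, not taken from the project code.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical validator, written only to illustrate a unit-level check.
class UrlFieldValidator {

    /** Checks both the format and the validity of the URL typed by the user. */
    static boolean isValidWebUrl(String input) {
        if (input == null || input.trim().isEmpty()) {
            return false; // nothing was entered
        }
        try {
            URI uri = new URI(input.trim());
            String scheme = uri.getScheme();
            boolean webScheme = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return webScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // malformed input such as an embedded space
        }
    }

    /** Runs a small table of test cases and returns how many of them failed. */
    static int countFailures() {
        String[] inputs = {
            "http://example.com/login", "https://example.com",
            "ftp://example.com/file", "example.com", "", "http://bad host/"
        };
        boolean[] expected = {true, true, false, false, false, false};
        int failures = 0;
        for (int i = 0; i < inputs.length; i++) {
            if (isValidWebUrl(inputs[i]) != expected[i]) {
                failures++;
            }
        }
        return failures;
    }
}
```

Keeping the check in a static method with no Swing dependencies means the module can be exercised on its own, before it is wired into the form.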
INTEGRATION TESTING:
Data can be lost across an interface, one module can have an adverse effect on another, and
sub-functions, when combined, may not produce the desired major function. Integration
testing is systematic testing that can be done with sample data. The need for the integration
test is to assess the overall system performance. There are two types of integration testing. They are:
i) Top-down integration testing.

ii) Bottom-up integration testing.
WHITE BOX TESTING:
White box testing is a test case design method that uses the control structure of the
procedural design to derive test cases. Using white box testing methods, we derived test cases
that guarantee that all independent paths within a module have been exercised at least once.
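To make "all independent paths" concrete, consider the small sketch below. The method has three decisions and therefore four paths, and the checker calls it once per path. The thresholds and the names `RiskLabelSketch`, `label` and `allPathsPass` are hypothetical, chosen only for illustration; they are not the rules used by the system.

```java
// Hypothetical module under test, with one test input per independent path.
class RiskLabelSketch {

    /** Maps a count of suspicious attributes to a label (assumed thresholds). */
    static String label(int suspiciousCount) {
        if (suspiciousCount < 0) {
            throw new IllegalArgumentException("count must not be negative"); // path 1
        }
        if (suspiciousCount == 0) {
            return "Trusted"; // path 2
        }
        if (suspiciousCount <= 3) {
            return "Doubted"; // path 3
        }
        return "Phishing"; // path 4
    }

    /** Exercises every path once and reports whether all behaved as expected. */
    static boolean allPathsPass() {
        boolean rejectedNegative = false;
        try {
            label(-1);
        } catch (IllegalArgumentException e) {
            rejectedNegative = true;
        }
        return rejectedNegative
                && label(0).equals("Trusted")
                && label(3).equals("Doubted")
                && label(4).equals("Phishing");
    }
}
```

The inputs 3 and 4 sit on either side of the assumed boundary, so the same cases also check the boundary condition.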
BLACK BOX TESTING:
Black box testing is done to find:
• Incorrect or missing functions
• Interface errors
• Errors in external database access
• Performance errors
• Initialization and termination errors
‘Functional testing’ is performed to validate that an application conforms to its specifications
and correctly performs all its required functions. This testing is therefore also called ‘black
box testing’. It tests the external behavior of the system. Here the engineered product is tested
knowing the specified function that it has been designed to perform; tests can be conducted to
demonstrate that each function is fully operational.
VALIDATION TESTING:
After the culmination of black box testing, the software is completely assembled as a
package, interfacing errors have been uncovered and corrected, and a final series of software
validation tests begins. Validation testing can be defined in many ways, but a simple definition
is that validation succeeds when the software functions in a manner that can be reasonably
expected by the customer.
USER ACCEPTANCE TESTING:
User acceptance of the system is the key factor for the success of the system. The
system under consideration is tested for user acceptance by constantly keeping in touch with
prospective system users during development and making changes whenever required.
OUTPUT TESTING:
After performing the validation testing, the next step is output testing of the proposed
system, since no system could be useful if it does not produce the required output in the
specified format. The users are asked about the format they require for the output displayed or
generated by the system under consideration. Here the output format is considered in two ways:
one is the on-screen format and the other is the printed format. The output format on the screen
is found to be correct, as the format was designed in the system design phase according to the
user needs. For the hard copy also, the output comes out as per the requirements specified by the
user. Hence the output testing does not result in any correction in the system.
System Implementation:
Implementation of software refers to the final installation of the package in its real
environment, to the satisfaction of the intended users and the operation of the system. The
people may not be sure that the software is meant to make their job easier, so:
• The active user must be made aware of the benefits of using the system
• Their confidence in the software must be built up
• Proper guidance must be imparted to the user so that he is comfortable in using the
application
Before going ahead and viewing the system, the user must know that for viewing the
result, the server program should be running on the server. If the server object is not running
on the server, the actual processes will not take place.
User Training:
To achieve the objectives and benefits expected from the proposed system, it is essential
for the people who will be involved to be confident of their role in the new system. As systems
become more complex, the need for education and training becomes more and more important.
Education is complementary to training. It brings life to formal training by explaining the
background of the resources to the users. Education involves creating the right atmosphere and
motivating user staff. Educational information can make training more interesting and more
understandable.
Training on the Application Software:
After providing the necessary basic training in computer awareness, the users
will have to be trained on the new application software. This training conveys the underlying
philosophy of the use of the new system, such as the screen flow, screen design, type of help on
the screen, type of errors while entering the data, the corresponding validation check at each
entry and the ways to correct the data entered. This training may differ across user groups and
across levels of the hierarchy.
Operational Documentation:
Once the implementation plan is decided, it is essential that the user of the system is
made familiar and comfortable with the environment. Documentation covering the whole
operation of the system is being developed. Useful tips and guidance are given to the user inside
the application itself. The system is designed to be user friendly, so that the user can operate
it from the tips given in the application itself.
System Maintenance:
The maintenance phase of the software cycle is the time in which software performs
useful work. After a system is successfully implemented, it should be maintained in a proper
manner. System maintenance is an important aspect of the software development life cycle. The
need for system maintenance is to make the system adaptable to changes in its environment.
There may be social, technical and other environmental changes which affect a system that has
been implemented. Software product enhancements may involve providing new functional
capabilities, improving user displays and modes of interaction, and upgrading the performance
characteristics of the system. Only through proper system maintenance procedures can the system
be adapted to cope with these changes. Software maintenance is, of course, far more than
“finding mistakes”.
Corrective Maintenance:
The first maintenance activity occurs because it is unreasonable to assume that
software testing will uncover all latent errors in a large software system. During the use of any
large program, errors will occur and be reported to the developer. The process that
includes the diagnosis and correction of one or more errors is called Corrective Maintenance.
Adaptive Maintenance:
The second activity that contributes to a definition of maintenance occurs because
of the rapid change that is encountered in every aspect of computing. Adaptive maintenance,
an activity that modifies software to properly interface with a changing environment, is
therefore both necessary and commonplace.
Perfective Maintenance:
The third activity that may be applied to a definition of maintenance occurs when
a software package is successful. As the software is used, recommendations for new capabilities,
modifications to existing functions, and general enhancements are received from users. To satisfy
requests in this category, perfective maintenance is performed. This activity accounts for the
majority of all effort expended on software maintenance.
Preventive Maintenance:
The fourth maintenance activity occurs when software is changed to improve
future maintainability or reliability, or to provide a better basis for future enhancements. Often
called preventive maintenance, this activity is characterized by reverse engineering and
re-engineering techniques.
Conclusion:
The main goal of the system is to speed up the existing anti-phishing system. By
integrating Hadoop-MapReduce with the anti-phishing technique, we have achieved a
considerable speedup. Even if a phishing webpage does not show phishing characteristics
clearly at the first layer, it may show them at the next layer, so that it is still caught before
it passes through our system. This is the advantage of having a layered architecture of
attributes. Hadoop-MapReduce reduces the response time of the system considerably. This
system is effective in securing a network from phishing attacks.
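The speedup comes from the shape of the computation: each URL can be scored independently (the map step) and the per-URL results are then combined (the reduce step). The sketch below shows that shape using plain Java parallel streams; it does not use the Hadoop API and is not the project's implementation. The three URL attributes checked in `score` and the names `MapReduceSketch` and `countByScore` are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of the map and reduce steps, without Hadoop.
class MapReduceSketch {

    /** Map step: score one URL on its own, counting assumed suspicious attributes. */
    static int score(String url) {
        int suspicious = 0;
        if (url.contains("@")) {
            suspicious++; // '@' can hide the real host
        }
        if (url.length() > 75) {
            suspicious++; // unusually long URL
        }
        if (url.matches("https?://\\d{1,3}(\\.\\d{1,3}){3}.*")) {
            suspicious++; // raw IP address used instead of a domain name
        }
        return suspicious;
    }

    /** Reduce step: count how many URLs received each score. */
    static Map<Integer, Long> countByScore(List<String> urls) {
        return urls.parallelStream()
                .collect(Collectors.groupingBy(
                        MapReduceSketch::score, TreeMap::new, Collectors.counting()));
    }
}
```

Because `score` looks at one URL at a time and shares no state, the work can be split across threads here, or across cluster nodes in a real MapReduce job, without changing the result.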
Future Enhancement:
There is a lot of scope for improvement of this system. One can improve the
performance of the system by converting it into a cloud service. Depending on the type of
organization being protected from phishing attacks, the attributes considered can be changed
to make effective decisions about the phishiness of a website.
