PROJECT REPORT

On
Short Text Summarization on Comment Streams from Social
Network Services
Submitted in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology
in
Computer Science and Engineering

By

B.SUNI PRIYA 12241A0568
SUSHMITA THAPA 12241A0556
A.AISHWARYA VALLI 12241A0562
P.HARSHITHA 12H61A05K7

Under the Supervision of


Dr. P.VIJAYAPAL REDDY
Professor

Department of Computer Science and Engineering


GOKARAJU RANGARAJU
INSTITUTE OF ENGINEERING AND TECHNOLOGY
(Autonomous under JNTUH, Hyderabad)
Bachupally, Kukatpally Hyderabad- 500090
2015-2016
Department of Computer Science and Engineering
GOKARAJU RANGARAJU
INSTITUTE OF ENGINEERING AND TECHNOLOGY
(Autonomous under JNTUH, Hyderabad)
Bachupally, Kukatpally Hyderabad- 500090

CERTIFICATE
This is to certify that the project entitled “Short Text Summarization on Comment Streams
from Social Network Services” is submitted by B. Suni Priya (12241A0568), Sushmita Thapa
(12241A0556), A. Aishwarya Valli (12241A0562) and P. Harshitha (12H61A05K7) in partial
fulfillment of the requirement for the award of the degree of BACHELOR OF
TECHNOLOGY in Computer Science and Engineering during the academic year 2015-2016.

Supervisor: Dr. P. Vijayapal Reddy, Professor

Head of the Department: Dr. K. Anuradha, Professor & HOD

External Examiner
DECLARATION
We hereby declare that the project entitled “Short Text Summarization on Comment Streams
from Social Network Services” is the work done during the period from 28-12-2015 to
21-04-2016 and is submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology in Computer Science and Engineering from Gokaraju Rangaraju Institute
of Engineering and Technology (Autonomous under Jawaharlal Nehru Technological University,
Hyderabad). The results embodied in this project have not been submitted to any other university
or institution for the award of any degree or diploma.

B.Suni Priya (12241A0568)

Sushmita Thapa (12241A0556)

A.Aishwarya Valli (12241A0562)

P.Harshitha (12H61A05K7)
ACKNOWLEDGEMENT

There are many people who helped me directly and indirectly to complete my project
successfully. I would like to take this opportunity to thank one and all.

First of all, I would like to express my deep gratitude towards my supervisor, Dr.
P. Vijayapal Reddy, Professor, Department of CSE, for his support in the completion of my
dissertation. I wish to express my sincere thanks to Dr. K. Anuradha, HOD, Dept. of CSE, and
also to our principal, Dr. Jandhyala N. Murthy, for providing the facilities to complete the
dissertation.

I would like to thank all our faculty and friends for their help and constructive criticism
during the project period. Finally, I am very much indebted to our parents for their moral support
and encouragement in achieving our goals.

B.Suni Priya (12241A0568)

Sushmita Thapa (12241A0556)

A.Aishwarya Valli (12241A0562)

P.Harshitha (12H61A05K7)
Short Text Summarization on Comment Streams from Social
Network Services
Abstract:

This paper focuses on the problem of short text summarization on the comment stream of
a specific message from social network services (SNS). Due to the high popularity of SNS, the
quantity of comments may increase at a high rate right after a social message is published.
Motivated by the fact that users may desire to get a brief understanding of a comment stream
without reading the whole comment list, we attempt to group comments with similar content
together and generate a concise opinion summary for this message. Since distinct users will
request the summary at any moment, existing clustering methods cannot be directly applied and
cannot meet the real-time need of this application. In this paper, we model a novel incremental
clustering problem for comment stream summarization on SNS. Moreover, we propose IncreSTS
algorithm that can incrementally update clustering results with latest incoming comments in real
time. Furthermore, we design an at-a-glance visualization interface to help users easily and
rapidly get an overview summary. From extensive experimental results and a real case
demonstration, we verify that IncreSTS possesses the advantages of high efficiency, high
scalability, and better handling of outliers, which justifies the practicability of IncreSTS on the
target problem.

Keywords: Real-time short text summarization, incremental clustering, comment streams,
social network services.
LIST OF CONTENTS

List of Figures

List of Tables

1. Introduction

1.1 Purpose

1.2 Scope

1.3 Motivation

1.3.1 Definitions

1.3.2 Abbreviations

1.3.3 Model Diagrams

1.4 Overview

2. Literature Survey

2.1 Introduction

2.2 History

2.3 Purpose

2.4 Requirements

2.5 Technology Used

2.6 Research Methodologies

3. Fundamental Concepts on (Domain)


3.1 Domain Fundamentals & Description

3.2 Existing concepts of fundamentals

3.3 Existing System Algorithms

3.4 Proposed System Fundamentals concepts

3.5 Proposed Algorithms

3.6 Performance analysis between the existing system and the proposed system

4. System Analysis

4.1 Existing System

4.1.1 Drawbacks

4.2 Problem statement

4.3 Proposed System

4.3.1 Advantages

4.4 Modules Description

4.5 Feasibility Study

4.5.1 Economic Feasibility

4.5.2 Operational Feasibility

4.5.3 Technical Feasibility

5. System Requirements Specification

5.1 Introduction

5.2 Purpose

5.3 Functional Requirements

5.4 Non Functional Requirements

5.5 Hardware Requirements

5.6 Software Requirements

6. System Design
6.1 System Specifications

6.2 System Components

6.3 DFD’s

6.4 UML Diagrams

6.5 Data Dictionaries and ER Diagram

7. Implementation

7.1 Technology Description

8. System Testing

8.1 Testing Methodologies

8.2 Test cases

8.3 Result Analysis

9. Conclusion and Future Enhancements

9.1 Conclusion

9.2 Scope for Future Enhancements

10. References

Appendix:

1. Sample code
2. Screenshots
Introduction:

1.1 Purpose:
1. We may still desire to know what they are talking about and what the opinions of
these discussion participants are.
2. Moreover, celebrities and corporations have a high interest in understanding how their
fans and customers are reacting to certain topics and content.
1.2 Scope

In database research, solutions have been proposed which, given a keyword query, retrieve the
most relevant structured results or simply select the single most relevant databases. However,
these approaches are single-source solutions. They are not directly applicable to the web of
Linked Data, where results are not bounded by a single source but might encompass several
Linked Data sources. The goal is to produce routing plans, which can be used to compute results
from multiple sources.

1.3 Motivation
We are inspired to develop an advanced summarization technique targeting comment
streams in SNS.
2. Literature Survey
Larger and larger amounts of data are collected and stored in databases, increasing the
need for efficient and effective analysis methods to make use of the information contained
implicitly in the data. One of the primary data analysis tasks is cluster analysis which is intended
to help a user to understand the natural grouping or structure in a data set. Therefore, the
development of improved clustering algorithms has received a lot of attention in the last few
years. Roughly speaking, the goal of a clustering algorithm is to group the objects of a database
into a set of meaningful subclasses.
3. Fundamental Concepts on (Domain)

3.1 Domain Fundamentals & Description


Data mining, also called knowledge discovery in databases, is, in computer science,
the process of discovering interesting and useful patterns and relationships in large volumes of
data. The field combines tools from statistics and artificial intelligence, such as neural networks
and machine learning, with database management to analyze large digital collections, known as
data sets. Data mining is widely used in business (insurance, banking, retail), scientific research
(astronomy, medicine), and government security (detection of criminals and terrorists).

3.2 Data Mining Overview


Data mining is emerging as one of the key features of many homeland security initiatives.
Often used as a means for detecting fraud, assessing risk, and product retailing, data mining
involves the use of data analysis tools to discover previously unknown, valid patterns and
relationships in large data sets. In the context of homeland security, data mining is often
viewed as a potential means to identify terrorist activities, such as money transfers and
communications, and to identify and track individual terrorists themselves, such as through
travel and immigration records.

While data mining represents a significant advance in the type of analytical tools currently
available, there are limitations to its capability. One limitation is that although data mining
can help reveal patterns and relationships, it does not tell the user the value or significance of
these patterns. These types of determinations must be made by the user. A second limitation
is that while data mining can identify connections between behaviours and/or variables, it does
not necessarily identify a causal relationship. To be successful, data mining still requires skilled
technical and analytical specialists who can structure the analysis and interpret the output that is
created.
Data mining is becoming increasingly common in both the private and public sectors. Industries
such as banking, insurance, medicine, and retailing commonly use data mining to reduce costs,
enhance research, and increase sales. In the public sector, data mining applications initially
were used as a means to detect fraud and waste, but have grown to also be used for purposes
such as measuring and improving program performance. However, some of the homeland
security data mining applications represent a significant expansion in the quantity and scope of
data to be analyzed. Two efforts that have attracted a higher level of congressional interest
include the Terrorism Information Awareness (TIA) project (now-discontinued) and the
Computer-Assisted Passenger Pre-screening System II (CAPPS II) project (now- cancelled and
replaced by Secure Flight).

3.2.1 Data Mining Applications:

Data mining is a process that analyzes large amounts of data to find new and hidden
information that improves business efficiency. Various industries have adopted data mining for
their mission-critical business processes to gain competitive advantages and help their businesses grow.
This section illustrates some data mining applications in sales/marketing, banking/finance, health
care and insurance, transportation, and medicine.

Data Mining Applications in Sales/Marketing

Data mining enables businesses to understand the patterns hidden inside past purchase
transactions, thus helping them plan and launch new marketing campaigns in a prompt and cost-effective
way. The following illustrates several data mining applications in sales and marketing.

 Data mining is used for market basket analysis to provide insight into which
product combinations were purchased, when they were bought, and in what sequence by
customers. This information helps businesses promote their most profitable products to
maximize profit. In addition, it encourages customers to purchase related products
that they may have missed or overlooked.
 Retail companies use data mining to identify customers’ buying behavior patterns.
Data Mining Applications in Banking / Finance

 Several data mining techniques, such as distributed data mining, have been researched,
modeled and developed to help credit card fraud detection.
 Data mining is used to identify customer loyalty by analyzing data on customers’
purchasing activities, such as the frequency of purchases in a period of time, the total
monetary value of all purchases, and when the last purchase was made. After analyzing those
dimensions, a relative measure is generated for each customer. The higher the score,
the more loyal the customer is.

 Data mining is used to help banks retain credit card customers. By analyzing past
data, data mining can help banks predict customers who are likely to change their credit
card affiliation so they can plan and launch different special offers to retain those
customers.

 Credit card spending by customer groups can be identified by using data mining.

 The hidden correlations between different financial indicators can be discovered by
using data mining.

 From historical market data, data mining can identify stock trading rules.

Data Mining Applications in Health Care and Insurance

The growth of the insurance industry depends entirely on the ability to convert data into
knowledge, information or intelligence about customers, competitors and markets. Data
mining has been applied in the insurance industry only recently, but it has brought tremendous
competitive advantages to the companies that have implemented it successfully. The data mining
applications in the insurance industry are listed below:

 Data mining is applied in claims analysis such as identifying which medical procedures
are claimed together.
 Data mining enables insurers to forecast which customers will potentially purchase new policies.

 Data mining allows insurance companies to detect risky customers’ behavior patterns.
 Data mining helps detect fraudulent behavior.

Data Mining Applications in Transportation

 Data mining helps to determine the distribution schedules among warehouses and outlets
and analyze loading patterns.

Data Mining Applications in Medicine

 Data mining makes it possible to characterize patient activities and anticipate upcoming office visits.
 Data mining helps identify the patterns of successful medical therapies for different
illnesses.

Data mining applications are continuously developing in various industries to provide more
hidden knowledge that helps increase business efficiency and grow businesses.

Advantages of Data Mining


Marketing/Retailing
Data mining can aid direct marketers by providing them with useful and accurate trends about
their customers’ purchasing behavior. Based on these trends, marketers can direct their marketing
attention to their customers with more precision. For example, marketers of a software company
may advertise their new software to consumers who have a long software purchasing
history. In addition, data mining may also help marketers predict which products their
customers may be interested in buying. Through this prediction, marketers can surprise their
customers and make the customer’s shopping experience a pleasant one.
Retail stores can also benefit from data mining in similar ways. For example, through the trends
provided by data mining, store managers can arrange shelves, stock certain items, or provide a
certain discount that will attract their customers.
Banking/Crediting
Data mining can assist financial institutions in areas such as credit reporting and loan
information. For example, by examining previous customers with similar attributes, a bank can
estimate the level of risk associated with each given loan. In addition, data mining can also
assist credit card issuers in detecting potentially fraudulent credit card transactions. Although the
data mining technique is not 100% accurate in its predictions about fraudulent charges, it does
help credit card issuers reduce their losses.
Law enforcement
Data mining can aid law enforcers in identifying criminal suspects as well as apprehending these
criminals by examining trends in location, crime type, habit, and other patterns of behaviors.
Researchers
Data mining can assist researchers by speeding up their data analysis process, thus allowing
them more time to work on other projects.

3.3 Proposed System Fundamentals concepts

FILTERING RULES AND BLACKLIST MANAGEMENT


In this section, we introduce the rule layer adopted for filtering unwanted messages. We start by
describing FRs (filtering rules), and then we illustrate the use of BLs (blacklists).
In what follows, we model a social network as a directed graph, where each node corresponds to
a network user and edges denote relationships between two different users. In particular, each
edge is labeled by the type of the established relationship (e.g., friend of, colleague of, parent of)
and, possibly, the corresponding trust level, which represents how trustworthy a given user
considers the user with whom he/she is establishing the relationship, with respect to that specific
kind of relationship. Without loss of generality, we suppose that trust levels are rational
numbers in the range [0, 1]. Therefore, there exists a direct relationship of a given type RT and
trust value X between two users if there is an edge connecting them having the labels RT and X.
Moreover, two users are in an indirect relationship of a given type RT if there is a path of more
than one edge connecting them such that all the edges in the path have label RT. In this paper,
we do not address the problem of trust computation for indirect relationships, since many
algorithms have been proposed in the literature that can be used in our scenario as well. Such
algorithms mainly differ in the criteria used to select the paths on which trust computation
should be based when many paths of the same type exist between two users (see [45] for a survey).
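
The relationship model just described can be made concrete with a small sketch. The class and method names below are illustrative only (they do not appear in the report); the sketch simply shows a directed edge carrying a relationship type and a trust value constrained to [0, 1].

// Minimal sketch of the labeled, directed social graph described above.
// Class and field names are illustrative only; they do not come from the report.
import java.util.ArrayList;
import java.util.List;

class RelationshipEdge {
    final String fromUser;
    final String toUser;
    final String relationshipType;  // e.g., "friendOf", "colleagueOf", "parentOf"
    final double trust;             // rational value in the range [0, 1]

    RelationshipEdge(String fromUser, String toUser, String relationshipType, double trust) {
        if (trust < 0.0 || trust > 1.0) {
            throw new IllegalArgumentException("trust must lie in [0, 1]");
        }
        this.fromUser = fromUser;
        this.toUser = toUser;
        this.relationshipType = relationshipType;
        this.trust = trust;
    }
}

class SocialGraph {
    private final List<RelationshipEdge> edges = new ArrayList<RelationshipEdge>();

    void addEdge(RelationshipEdge edge) {
        edges.add(edge);
    }

    // A direct relationship of a given type exists if a single edge with that label connects the users.
    boolean hasDirectRelationship(String from, String to, String type) {
        for (RelationshipEdge edge : edges) {
            if (edge.fromUser.equals(from) && edge.toUser.equals(to)
                    && edge.relationshipType.equals(type)) {
                return true;
            }
        }
        return false;
    }
}
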
3.3.1 Filtering Rules
In defining the language for FR specification, we consider three main issues that, in our opinion,
should affect a message filtering decision. First of all, in OSNs, as in everyday life, the same
message may have different meanings and relevance based on who writes it. As a consequence,
FRs should allow users to state constraints on message creators. Creators to whom an FR applies
can be selected on the basis of several different criteria; one of the most relevant is imposing
conditions on their profile’s attributes. In such a way it is, for instance, possible to define rules
applying only to young creators or to creators with a given religious/political view. Given the
social network scenario, creators may also be identified by exploiting information on their social
graph. This implies stating conditions on the type, depth, and trust values of the relationship(s)
in which creators should be involved in order for the specified rules to apply to them.
4. SYSTEM ANALYSIS
The Systems Development Life Cycle (SDLC), or Software Development Life
Cycle in systems engineering, information systems and software engineering, is the process of
creating or altering systems, and the models and methodologies that people use to develop these
systems.

In software engineering the SDLC concept underpins many kinds of software development
methodologies. These methodologies form the framework for planning and controlling the
creation of an information system: the software development process.

SOFTWARE MODEL OR ARCHITECTURE ANALYSIS:

Structured project management techniques (such as an SDLC) enhance
management’s control over projects by dividing complex tasks into manageable sections. A
software life cycle model is either a descriptive or prescriptive characterization of how software
is or should be developed. However, none of the SDLC models discuss key issues like change
management, incident management and release management processes within the SDLC
process; these are instead addressed in the overall project management. In the proposed hypothetical
model, the concept of user-developer interaction in the conventional SDLC model has been
converted into a three-dimensional model which comprises the user, the owner and the developer.
The “one size fits all” approach to applying SDLC methodologies is no longer appropriate. We
have made an attempt to address the above-mentioned defects by using a new hypothetical model
for SDLC described elsewhere. The drawback of addressing these management processes under
the overall project management is that key technical issues pertaining to the software development
process are missed; that is, these issues are discussed in project management at the surface level
but not at the ground level.
WHAT IS SDLC?
A software life cycle deals with various parts and phases from planning to testing
and deploying software. All these activities are carried out in different ways, as per the needs.
Each way is known as a Software Development Life Cycle (SDLC) model. A software life cycle
model is either a descriptive or prescriptive characterization of how software is or should be
developed. A descriptive model describes the history of how a particular software system was
developed. Descriptive models may be used as the basis for understanding and improving
software development processes or for building empirically grounded prescriptive models.

SDLC models:
 The linear model (Waterfall) - Separate and distinct phases of specification and development;
all activities proceed in a linear fashion, and the next phase starts only when the previous one is complete.
 Evolutionary development - Specification and development are interleaved (spiral, incremental,
prototype-based, Rapid Application Development).
 Incremental model - Waterfall in iteration.
 RAD (Rapid Application Development) - The focus is on developing a quality product in less time.
 Spiral model - We start from a smaller module and keep on building it like a spiral; it is also
called component-based development.
 Formal systems development - A mathematical system model is formally transformed into an implementation.
 Agile methods - Inducing flexibility into development.
 Reuse-based development - The system is assembled from existing components.
The General Model
Software life cycle models describe phases of the software cycle and the order in which those
phases are executed. There are many models, and many companies adopt their own, but all have
very similar patterns. Each phase produces deliverables required by the next phase in the life
cycle. Requirements are translated into design. Code is produced during implementation that is
driven by the design. Testing verifies the deliverable of the implementation phase against
requirements.
SDLC Methodology:

Spiral Model

The spiral model is similar to the incremental model, with more emphasis placed on risk
analysis. The spiral model has four phases: Planning, Risk Analysis, Engineering and
Evaluation. A software project repeatedly passes through these phases in iterations (called
spirals in this model). In the baseline spiral, starting in the planning phase, requirements are
gathered and risk is assessed. Each subsequent spiral builds on the baseline spiral.
Requirements are gathered during the planning phase. In the risk analysis phase, a process is
undertaken to identify risks and alternate solutions. A prototype is produced at the end of the
risk analysis phase. Software is produced in the engineering phase, along with testing at
the end of the phase. The evaluation phase allows the customer to evaluate the output of the
project to date before the project continues to the next spiral. In the spiral model, the angular
component represents progress, and the radius of the spiral represents cost.

This document plays a vital role in the software development life cycle (SDLC) as it describes the
complete requirements of the system. It is meant for use by the developers and will be the basis
during the testing phase. Any changes made to the requirements in the future will have to go through
a formal change approval process.

The SPIRAL MODEL was defined by Barry Boehm in his 1988 article, “A Spiral Model of
Software Development and Enhancement”. This model was not the first to discuss iterative
development, but it was the first to explain why the iteration matters.

As originally envisioned, the iterations were typically 6 months to 2 years long. Each phase
starts with a design goal and ends with a client reviewing the progress thus far. Analysis and
engineering efforts are applied at each phase of the project, with an eye toward the end goal of
the project.

The steps for Spiral Model can be generalized as follows:

 The new system requirements are defined in as much detail as possible. This usually
involves interviewing a number of users representing all the external or internal users
and other aspects of the existing system.

 A preliminary design is created for the new system.

 A first prototype of the new system is constructed from the preliminary design. This
is usually a scaled-down system, and represents an approximation of the
characteristics of the final product.
 A second prototype is evolved by a fourfold procedure:

1. Evaluating the first prototype in terms of its strengths, weaknesses, and risks.

2. Defining the requirements of the second prototype.

3. Planning and designing the second prototype.

4. Constructing and testing the second prototype.

 At the customer’s option, the entire project can be aborted if the risk is deemed too
great. Risk factors might involve development cost overruns, operating-cost
miscalculation, or any other factor that could, in the customer’s judgment, result in a
less-than-satisfactory final product.

 The existing prototype is evaluated in the same manner as was the previous prototype,
and if necessary, another prototype is developed from it according to the fourfold
procedure outlined above.

 The preceding steps are iterated until the customer is satisfied that the refined
prototype represents the final product desired.

 The final system is constructed, based on the refined prototype.

 The final system is thoroughly evaluated and tested. Routine maintenance is carried
out on a continuing basis to prevent large-scale failures and to minimize downtime.
Fig -Spiral Model

Advantages

 High amount of risk analysis


 Good for large and mission-critical projects.

 Software is produced early in the software life cycle.


Existing System:
Numerous studies and systems have proposed techniques and mechanisms to generate various
types of summaries on comment streams. One major category aims to extract representative and
significant comments from messy discussion. Popular services like YouTube and Facebook
allow users to indicate whether a comment is useful or recommendable, and the
comments with the top-k most endorsements are displayed at the top of the list. This category
relies on user contributions and intends to leverage the wisdom of crowds. On the other hand,
some researchers model this problem as recommendation or classification tasks and employ
machine learning techniques to solve it. Moreover, sentiment analysis has been applied as well to
discover hidden emotions in messages. Furthermore, providing an informative presentation
interface is another active research field in the summarization of social messages. As will be
thoroughly surveyed in Section 2.2, although some effort has been spent on solving this
information overload problem, a generalized approach for summarizing rapidly increasing
comment streams in SNS, based on text content, is yet to be fully explored.

Disadvantages:
1. Information overload problem.
2. Too many comments are displayed as the summarized content.
3. Similar comments are not removed.
4. Previous approaches provide only unstructured text.

Proposed System:

We explore the problem of incremental short text summarization on comment streams
from social network services. We model this problem as an incremental clustering task and
propose the IncreSTS (Incremental Short Text Summarization) algorithm to discover
the top-k clusters containing different groups of opinions towards one social message. For each
comment cluster, important and common terms are extracted to construct a key-term cloud.
This key-term cloud provides an at-a-glance presentation so that users can easily and rapidly
understand the main points of similar comments in a cluster. Moreover, representative comments
in each group are also identified. Our objective is to generate an informative, concise, and
impressive interface that can help users get an overall understanding without reading all
comments.

Advantages:
1. Provides effective short text summarization results.
2. Produces clustering results with high efficiency.
3. Removes redundant information from similar comments.
4. Provides informative and impressive summarization results.
ARCHITECTURE:

4.4 Modules Description:


1. SYSTEM MODEL
2. TERM VECTOR MODEL PREPARATION
3. COSINE SIMILARITY FUNCTION IMPLEMENTATION
4. INCREMENTAL SHORT TEXT SUMMARIZATION

SYSTEM MODEL :
We present the system model for comment stream summarization on SNS. We focus on the
comment stream added for one message on SNS and aim to generate the immediate summary of
comments. The problem we tackle is described as follows.
Once a message is posted on SNS, users can leave comments immediately and the number of
comments may rise quickly and continuously. Moreover, readers are usually unwilling to go over
the whole list of comments, but they may request to see the summary at any moment. This
indicates that the proposed approach should be able to generate the summary result at any time
point of a dynamic data stream. To satisfy this requirement, we model this problem as an
incremental clustering task.

TERM VECTOR MODEL PREPARATION:

Initially, for each word, the process of punctuation removal will be applied to eliminate
unnecessary punctuation marks connected with this word. In this example, the two exclamation
marks at the rear of “GaGa!!” will be removed. Moreover, we develop the heuristic process of
redundant character removal, designed for restoring words on SNS. It can be observed that
casual language style is commonly used on SNS. In particular, users often emphasize the
emotion by repeating characters in a word. This phenomenon certainly causes the problem of not
being able to correctly identify the original words. To cope with this problem, we examine each
word to find out whether there is any character consecutively appearing more than three times. If
this situation is detected, appended characters will be regarded as redundant, and only one
character will be retained. Thus, the word “soooooo” will be transformed into “so”. Meanwhile,
all upper-case letters will also be changed to lower-case letters. The following step is the
stemming process. We employ the standard Porter stemming algorithm [47] to reduce inflected
and derived words to their stem form (e.g., “loving” is turned into “love” in Fig. 3).
Subsequently, the process of n-gram term extraction is carried out to extract terms that are used
for representing this comment. In the example of Fig. 3, n is set to 3, meaning that the comment
string is scanned from left to right to draw out all 1-gram, 2-gram, and 3-gram terms.
Finally, the stop word removal process is executed to delete terms entirely composed of stop words.
Note that as long as there is at least one non-stop word appearing in a term, this term will be
viewed as a valid one. For instance, although the word “you” is a stop word, the term “love you”
will not be deleted since “love” is a valid word. On the other hand, one term may appear more
than once in a comment. However, in general, since the length of a comment is short, a repeated
term does not always indicate more important influence. Therefore, the weights of terms are
defined to be equal.
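To make the above preprocessing steps concrete, the following is a simplified Java sketch of the term vector preparation: punctuation removal, collapsing of characters repeated more than three times, lower-casing, a placeholder for Porter stemming, n-gram extraction with equal term weights, and stop word filtering. All class, method and variable names are our own illustrations, and the stop word list is only a tiny sample; the project's actual implementation may differ.

// Simplified sketch of the term vector preparation described above. The Porter
// stemming step is only a placeholder, the stop word list is a tiny sample, and
// all names are illustrative; the project's actual code may differ.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TermVectorBuilder {

    private static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("i", "a", "an", "the", "you", "is", "are", "to", "of"));

    // Remove punctuation, lower-case the word, and collapse any character that
    // appears consecutively more than three times (e.g., "soooooo" -> "so").
    static String normalize(String word) {
        String cleaned = word.replaceAll("\\p{Punct}", "").toLowerCase();
        return cleaned.replaceAll("(.)\\1{3,}", "$1");
    }

    // Placeholder for the Porter stemming step (e.g., "loving" -> "love").
    static String stem(String word) {
        return word; // a real implementation would call a Porter stemmer here
    }

    // Extract all 1-gram .. n-gram terms and drop terms composed entirely of stop
    // words. Terms are weighted equally, so a set of distinct terms is returned.
    static Set<String> buildTermVector(String comment, int n) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : comment.split("\\s+")) {
            String word = stem(normalize(raw));
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        Set<String> terms = new LinkedHashSet<String>();
        for (int size = 1; size <= n; size++) {
            for (int start = 0; start + size <= tokens.size(); start++) {
                StringBuilder term = new StringBuilder();
                boolean allStopWords = true;
                for (int j = start; j < start + size; j++) {
                    if (term.length() > 0) {
                        term.append(' ');
                    }
                    term.append(tokens.get(j));
                    if (!STOP_WORDS.contains(tokens.get(j))) {
                        allStopWords = false;
                    }
                }
                if (!allStopWords) {
                    terms.add(term.toString());
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(buildTermVector("I love you GaGa!! soooooo much", 3));
    }
}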

COSINE SIMILARITY FUNCTION IMPLEMENTATION:


The effect of the function f(vi, Ce) is analogous to the inner product of the comment vi and the
center vce of cluster Ce. The difference lies in how the product value in each dimension is
computed; this design avoids the bias caused by certain terms with large counts in the cluster,
and it is favorable to group comments with more mutual terms. By limiting the radius of each
cluster, we can ensure that the comments in the same cluster express similar opinions. On the
other hand, since our main objective is to provide a concise and at-a-glance presentation rather
than a long list, the number k (set to 3 in the experiments) should be small for practical
applications on SNS. Moreover, we can expect that there will be numerous outliers in comment
stream data on SNS, which may greatly affect the clustering results. Therefore, it is not
appropriate to adopt the optimization criterion that minimizes the within-cluster sum of squares,
which indicates that existing partition-based clustering methods cannot be applied to achieve the
goal. Furthermore, the approach must enable real-time processing, which cannot be realized by
most existing clustering methods.
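As a point of reference, the standard cosine similarity between a comment (with equal term weights) and a cluster center (with term counts) can be sketched as below. Note that this is only the plain formulation; the f(vi, Ce) used by IncreSTS modifies the per-dimension product as described above, and the names used here are illustrative.

// A plain cosine-similarity baseline between a comment (equal term weights) and a
// cluster center (term counts). The f(vi, Ce) used by IncreSTS modifies the
// per-dimension product as explained above; this sketch shows only the standard
// formulation and uses illustrative names throughout.
import java.util.Map;
import java.util.Set;

public class SimilarityFunctions {

    static double cosineSimilarity(Set<String> commentTerms, Map<String, Integer> clusterTermCounts) {
        if (commentTerms.isEmpty() || clusterTermCounts.isEmpty()) {
            return 0.0;
        }
        double dot = 0.0;
        for (String term : commentTerms) {
            Integer count = clusterTermCounts.get(term);
            if (count != null) {
                dot += count; // the comment's weight is 1 for every term it contains
            }
        }
        double clusterNorm = 0.0;
        for (int count : clusterTermCounts.values()) {
            clusterNorm += (double) count * count;
        }
        return dot / (Math.sqrt(commentTerms.size()) * Math.sqrt(clusterNorm));
    }
}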

INCREMENTAL SHORT TEXT SUMMARIZATION

We aim to develop efficient approaches for discovering the top-k groups of opinions towards a
specific message on SNS. The approach consists of three phases:
1. Batch STS algorithm
2. Incremental STS algorithm
3. Visualization interface
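The following sketch illustrates, under our own simplifying assumptions, what one incremental assignment step could look like: an arriving comment joins the most similar existing cluster if the similarity exceeds a threshold, and otherwise starts a new (possibly outlier) cluster. It is a generic illustration, not the actual IncreSTS algorithm, and all names and the similarity measure are illustrative.

// A minimal sketch of one possible incremental assignment step: each arriving
// comment joins the most similar existing cluster if the similarity exceeds a
// threshold, otherwise it starts a new (possibly outlier) cluster. This is a
// generic illustration under our own assumptions, not the actual IncreSTS algorithm.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IncrementalClusteringSketch {

    static class Cluster {
        final Map<String, Integer> termCounts = new HashMap<String, Integer>();
        int size = 0;

        void add(Set<String> commentTerms) {
            for (String term : commentTerms) {
                Integer old = termCounts.get(term);
                termCounts.put(term, old == null ? 1 : old + 1);
            }
            size++;
        }
    }

    private final List<Cluster> clusters = new ArrayList<Cluster>();
    private final double threshold; // acts like the cluster "radius" in similarity terms

    public IncrementalClusteringSketch(double threshold) {
        this.threshold = threshold;
    }

    // Process one incoming comment and update the clustering result in place.
    public void addComment(Set<String> commentTerms) {
        Cluster best = null;
        double bestSim = 0.0;
        for (Cluster cluster : clusters) {
            double sim = similarity(commentTerms, cluster.termCounts);
            if (sim > bestSim) {
                bestSim = sim;
                best = cluster;
            }
        }
        if (best != null && bestSim >= threshold) {
            best.add(commentTerms);
        } else {
            Cluster fresh = new Cluster(); // clusters that stay small can later be treated as outliers
            fresh.add(commentTerms);
            clusters.add(fresh);
        }
    }

    // Plain cosine similarity, repeated here so the sketch is self-contained.
    private static double similarity(Set<String> commentTerms, Map<String, Integer> termCounts) {
        if (commentTerms.isEmpty() || termCounts.isEmpty()) {
            return 0.0;
        }
        double dot = 0.0;
        double norm = 0.0;
        for (String term : commentTerms) {
            Integer count = termCounts.get(term);
            if (count != null) {
                dot += count;
            }
        }
        for (int count : termCounts.values()) {
            norm += (double) count * count;
        }
        return dot / (Math.sqrt(commentTerms.size()) * Math.sqrt(norm));
    }
}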

4.5 FEASIBILITY STUDY

The preliminary investigation examines project feasibility, that is, the likelihood that the system will
be useful to the organization. The main objective of the feasibility study is to test the technical,
operational and economical feasibility of adding new modules and debugging the old running
system. Any system is feasible if there are unlimited resources and infinite time. There are three aspects
in the feasibility study portion of the preliminary investigation:

 Technical Feasibility
 Operational Feasibility
 Economical Feasibility
4.5.1 ECONOMIC FEASIBILITY

A system that can be developed technically and that will be used if installed must still be a
good investment for the organization. In economic feasibility, the development cost of
creating the system is evaluated against the ultimate benefit derived from the new system.
Financial benefits must equal or exceed the costs.

The system is economically feasible. It does not require any additional hardware or
software. Since the interface for this system is developed using the existing resources and
technologies available at NIC, there is only nominal expenditure, and economic feasibility is
certain.

4.5.2 OPERATIONAL FEASIBILITY

Proposed projects are beneficial only if they can be turned into information systems
that will meet the organization’s operating requirements. Operational feasibility aspects of the
project are to be taken as an important part of the project implementation. Some of the important
issues raised to test the operational feasibility of a project include the following:

 Is there sufficient support for the management from the users?

 Will the system be used and work properly if it is developed and implemented?
 Will there be any resistance from users that will undermine the possible application
benefits?
This system is targeted to be in accordance with the above-mentioned issues. Beforehand,
the management issues and user requirements have been taken into consideration. So there is no
question of resistance from the users that can undermine the possible application benefits.

The well-planned design would ensure the optimal utilization of the computer resources and
would help in the improvement of performance status.
4.5.3 TECHNICAL FEASIBILITY

The technical issues usually raised during the feasibility stage of the investigation include
the following:

 Does the necessary technology exist to do what is suggested?

 Does the proposed equipment have the technical capacity to hold the data required to use the
new system?
 Will the proposed system provide adequate responses to inquiries, regardless of the number or
location of users?
 Can the system be upgraded if developed?
 Are there technical guarantees of accuracy, reliability, ease of access and data security?
Earlier, no system existed to cater to the needs of the ‘Secure Infrastructure Implementation
System’. The current system developed is technically feasible. It is a web-based user interface for
audit workflow at NIC-CSD. Thus it provides easy access to the users. The database’s purpose
is to create, establish and maintain a workflow among various entities in order to facilitate all
concerned users in their various capacities or roles. Permissions would be granted to the users
based on the roles specified. Therefore, it provides the technical guarantee of accuracy,
reliability and security. The software and hardware requirements for the development of this project
are not many and are already available in-house at NIC or are available free as open source.
5 System Requirements Specification

5.1 Introduction
A Software Requirements Specification (SRS) – a requirements specification for
a software system – is a complete description of the behavior of a system to be developed. It
includes a set of use cases that describe all the interactions the users will have with the software.
In addition to use cases, the SRS also contains non-functional requirements. Non-functional
requirements are requirements which impose constraints on the design or implementation (such
as performance engineering requirements, quality standards, or design constraints).
System requirements specification: A structured collection of information that embodies the
requirements of a system. A business analyst, sometimes titled system analyst, is responsible for
analyzing the business needs of their clients and stakeholders to help identify business problems
and propose solutions. Within the systems development life cycle domain, the business analyst
typically performs a liaison function between the business side of an enterprise and the
information technology department or external service providers. Projects are subject to three
sorts of requirements:
 Business requirements describe in business terms what must be delivered or
accomplished to provide value.
 Product requirements describe properties of a system or product (which could be one of
several ways to accomplish a set of business requirements.)
 Process requirements describe activities performed by the developing organization. For
instance, process requirements could specify specific methodologies that must be
followed, and constraints that the organization must obey.
Product and process requirements are closely linked. Process requirements often specify the
activities that will be performed to satisfy a product requirement. For example, a maximum
development cost requirement (a process requirement) may be imposed to help achieve a
maximum sales price requirement (a product requirement); a requirement that the product be
maintainable (a product requirement) is often addressed by imposing requirements to follow
particular development styles.
5.2 PURPOSE

In systems engineering, a requirement can be a description of what a system must do,
referred to as a functional requirement. This type of requirement specifies something that the
delivered system must be able to do. Another type of requirement specifies something about the
system itself, and how well it performs its functions. Such requirements are often called non-functional
requirements, 'performance requirements' or 'quality of service requirements.'
Examples of such requirements include usability, availability, reliability, supportability,
testability and maintainability.

A collection of requirements define the characteristics or features of the desired system. A 'good'
list of requirements as far as possible avoids saying how the system should implement the
requirements, leaving such decisions to the system designer. Specifying how the system should
be implemented is called "implementation bias" or "solution engineering". However,
implementation constraints on the solution may validly be expressed by the future owner, for
example for required interfaces to external systems; for interoperability with other systems; and
for commonality (e.g. of user interfaces) with other owned products.

In software engineering, the same meanings of requirements apply, except that the focus of
interest is the software itself.

5.3 FUNCTIONAL REQUIREMENTS


 Send request
 Receiver
 Post message
 Forward friend request
 Add new friend / accept friend request
 Apply stemming
 Text summarization data count


5.4 NON-FUNCTIONAL REQUIREMENTS

The major non-functional requirements of the system are as follows:

Usability
The system is designed as a completely automated process; hence there is little or no user
intervention.

Reliability
The system is more reliable because of the qualities that are inherited from the chosen platform,
Java. The code built using Java is more reliable.

Performance
This system is developed in high-level languages and, using advanced front-end and
back-end technologies, it will respond to the end user on the client system in very little
time.

Supportability
The system is designed to be cross-platform supportable. The system is supported on a wide
range of hardware and on any software platform that has a JVM built into it.
Implementation
The system is implemented in a web environment using the Struts framework. Apache Tomcat is
used as the web server and Windows XP Professional is used as the platform.
Interface
The user interface is based on the HTML tags provided by Struts.
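As a minimal illustration of this web environment, the sketch below shows a plain servlet handling a request on Tomcat. The project itself uses Struts action classes and JSPs rather than a raw servlet, and the class and parameter names here are purely illustrative.

// Minimal illustration of the servlet-based web environment mentioned above.
// The project itself uses Struts action classes and JSPs; this plain servlet is
// only a sketch of how a request would be handled on Tomcat. Output escaping is
// omitted for brevity.
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SummaryServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String messageId = request.getParameter("messageId"); // id of the SNS message (illustrative)
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>");
        out.println("<h3>Comment summary for message " + messageId + "</h3>");
        // In the real system, the top-k clusters and key-term clouds would be rendered here.
        out.println("</body></html>");
    }
}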

Software Requirements:
Language : JDK 1.7.0
Front end : JSP, Servlets
Back end : Oracle 10g
IDE : MyEclipse 8.6
Operating System : Windows XP
Server : Apache Tomcat

Hardware Requirements:
Processor : Pentium IV
Hard Disk : 80 GB
RAM : 2 GB

6. System Design
6.1 Introduction
The purpose of the design phase is to plan a solution to the problem specified by the
requirements document. This phase is the first step in moving from the problem domain to the
solution domain. In other words, starting with what is needed, design takes us toward how to
satisfy the needs. The design of a system is perhaps the most critical factor affecting the quality
of the software; it has a major impact on the later phases, particularly testing and maintenance. The
output of this phase is the design document. This document is similar to a blueprint for the
solution and is used later during implementation, testing and maintenance. The design activity is
often divided into two separate phases: System Design and Detailed Design.
System Design, also called top-level design, aims to identify the modules that should be
in the system, the specifications of these modules, and how they interact with each other to
produce the desired results. At the end of system design, all the major data structures, file
formats, output formats, and the major modules in the system and their specifications are
decided.
During Detailed Design, the internal logic of each of the modules specified in system
design is decided. During this phase, the details of the data of a module are usually specified in a
high-level design description language, which is independent of the target language in which the
software will eventually be implemented.
In system design the focus is on identifying the modules, whereas during detailed design
the focus is on designing the logic for each of the modules. In other words, in system design the
attention is on what components are needed, while in detailed design the issue is how the
components can be implemented in software.

6.2 System Model


Introduction to UML
The Unified Modeling Language (UML) is a standard language for writing software blueprints.
The UML may be used to visualize, specify, construct and document the artifacts of a software-intensive
system.
The goal of UML is to provide a standard notation that can be used by all object-oriented
methods and to select and integrate the best elements. UML itself does not prescribe or advise
on how to use that notation in a software development process or as part of an object-oriented
design methodology. The UML is more than just a bunch of graphical symbols. Rather, behind each
symbol in the UML notation is a well-defined semantics.
The system development focuses on three different models of the system:
 Functional model
 Object model
 Dynamic model
The functional model in UML is represented with use case diagrams, describing the functionality
of the system from the user's point of view.

The object model in UML is represented with class diagrams, describing the structure of the system
in terms of objects, attributes, associations and operations.

The dynamic model in UML is represented with sequence diagrams, state chart diagrams and
activity diagrams, describing the internal behaviour of the system.

6.3 Scenarios
A use case is an abstraction that describes all possible scenarios involving the described
functionality. A scenario is an instance of a use case describing a concrete set of actions.
 The name of the scenario enables us to refer to it unambiguously. The name of a
scenario is underlined to indicate that it is an instance.
 The participating actor instances field indicates which actor instances are
involved in this scenario. Actor instances also have underlined names.
 The flow of events of a scenario describes the sequence of events step by step.

6.3.1 Use Case Model


Use case diagrams represent the functionality of the system from a user's point of view. A use case
describes a function provided by the system that yields a visible result for an actor. An actor
describes any entity that interacts with the system. The identification of actors and use cases
results in the definition of the boundary of the system, that is, in differentiating the tasks
accomplished by the system from the tasks accomplished by its environment. The actors are outside
the boundary of the system, whereas the use cases are inside the boundary of the system.
A use case contains all the events that can occur between an actor and the system, together with a
set of scenarios that explain the interactions as a sequence of happenings.

Actors
Actors represent external entities that interact with the system. An actor can be a human or an external
system.
Actors are not part of the system. They represent anyone or anything that interacts with the system.
An actor may:
 Only input information to the system.
 Only receive information from the system.
 Both input information to and receive information from the system.
During this activity, the developers identify the actors involved in this system:

User:
The user is an actor who uses the system and who performs the operations, such as data classification
and execution of performance analysis, that are required of him.

Use Cases:
Use cases are used during requirements elicitation and analysis to represent the functionality of
the system. Use cases focus on the behaviour of the system from an external point of view. The
identification of actors and use cases results in the definition of the boundary of the system,
that is, in differentiating the tasks accomplished by the system from the tasks accomplished by
its environment. The actors are outside the boundary of the system, whereas the use cases are
inside the boundary of the system.

Use case diagram


Fig : Admin use case diagram

User:

Fig : User use case diagram (use cases: Registration, Login, Forward friend request, Add new friend
and accept request, Forward or post message, Apply NLP classification, Display redundant data,
Apply stemming, Apply N-gram classification, Display N-gram data, Add/remove stop words,
Cluster formation, Select threshold value, Display top-K clusters, Merged cluster, Enter value,
Text summarization count, Logout)

6.3.2 Object model


Class Diagram
Class diagrams are used to describe the structure of the system. Classes are
abstractions that specify the common structure and behaviour of a set of objects. Objects are
instances of classes that are created, modified and destroyed during the execution of a system.
An object has state that includes the values of its attributes and links with other objects.

The class diagram is used to refine the use case diagrams and define a detailed design of the
system. The class diagram classifies the actors defined in the use case diagram into a set of
interrelated classes. The relationship or association between the classes can be either an "is-a" or
a "has-a" relationship. Each class in the class diagram may be capable of providing certain
functionalities. These functionalities provided by the class are termed "methods" of the class.
Apart from this, each class may have certain "attributes" that uniquely identify the class. In the
class diagram these classes are represented as boxes which contain three parts.
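A small illustrative example of these notions (attributes, methods, and the "is-a" and "has-a" relationships) is given below; the class names are our own and do not correspond to the classes in the diagram that follows.

// Small illustration of the "is-a" and "has-a" relationships described above.
// Class names are illustrative and do not correspond to the project's classes.
import java.util.ArrayList;
import java.util.List;

class Account {                              // base class with attributes and methods
    protected String userId;                 // an "attribute" of the class

    Account(String userId) {
        this.userId = userId;
    }

    String getUserId() {                     // a "method" of the class
        return userId;
    }
}

class UserAccount extends Account {          // "is-a" relationship: a UserAccount is an Account
    private final List<String> friends = new ArrayList<String>(); // "has-a" relationship: holds friend ids

    UserAccount(String userId) {
        super(userId);
    }

    void addFriend(String friendId) {
        friends.add(friendId);
    }
}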

Fig : Class diagram (classes: UserDAO, LoginDAO, AdminBeanDAO, MailDTO and MailDelete, with
operations such as addFriend(), addServices(), changePwd(), checkUserId(), rejectFriend(), logout(),
loginAction(), deleteMails(), insertAttachment(), mailContacts(), sendMail(), viewMail() and viewComment())

6.3.3 Dynamic model


6.3.3.1 Sequence Diagram
Sequence diagrams are used to formalize the dynamic behaviour of the system and to visualize
the communication among the objects. They are useful for identifying the additional objects that
participate in the use case. A sequence diagram represents the objects participating in the interaction
horizontally and time vertically.
Sequence diagrams typically show a user or actor and the objects and components they
interact with during the execution of the use case. Each column represents an object that participates in
the interaction. Messages are shown by solid arrows. Labels on the solid arrows represent the
message names. Activations are depicted by vertical rectangles. The actor who initiates the
interaction is shown in the leftmost column. The messages coming from the actor represent the
interactions described in the use case diagram.
Sequence diagram:

Fig : Sequence diagram (participants: Sender, Apply NLP Classification, Display Redundant Data,
Apply Stemming Data, N-gram, Clustering, Top-K Cluster, Receiver; messages: 1. Forward friend
request, 2. Add new friend and accept forwarded request, 3. Post message, 4. Remove redundant data,
5. Display data, 6. Apply stemming, 7. Apply N-gram, 8. Display N-gram data, 9. Cluster formation,
10. Select threshold, 11. Merged cluster, 12. Display top-K cluster, 13. Select value,
14. Text summarization data count)


Collaboration Diagram:

Fig : Collaboration diagram (the same participants and messages as in the sequence diagram above,
arranged to emphasize the links between the objects)

6.3.3.2 State Chart Diagram


A UML state chart is a notation for describing the sequence of states an object goes through in
response to external events. Objects have behaviour and state. The state of an object depends on
its current activity or condition. A state chart diagram shows the possible states of the object and
the transitions that cause a change in state.
A state chart describes the dynamic behaviour of an individual object as a number of states. A state
is a condition satisfied by the attributes of an object. Given a state, a transition represents a future
state the object can move to and the conditions associated with the change of state.
A state is depicted by a rounded rectangle. A transition is depicted by open arrows connecting two
states. States are labeled with their names. A small solid black circle indicates the initial state, and
a circle surrounding the small solid circle indicates the final state.

State Chart Diagram

Fig : State chart diagram (states: User, Login, Send, Apply NLP classification, Apply stemming,
N-gram, Clustering, Receiver)


6.3.3.3 Activity Diagram
An activity diagram describes the behavior of the system in terms of activities. Activities are
modeling elements that represent the execution of a set of operations. The completion of these
operations triggers a transition to another activity. Activity diagrams are similar to flowchart
diagrams in that they can be used to represent control flow and data flow. Activities are
represented by rounded rectangles, and arrows represent transitions between activities.
Thick bars represent the synchronization of the control flow.

Activity Diagram:

Fig : Activity diagram (flow: User enters user ID and password; on login failure the flow returns to
login; on success the user reaches the user home and may forward a request, accept a friend request,
forward a message, receive a comment post or view the profile, and finally log out)


Component Diagram:

Fig : Component diagram (server components: add new friend and accept request, forward friend
request, forward or post request, apply NLP classification, merged cluster, display top-K cluster,
apply N-gram, text summarization count)

Deployment Diagram:

Fig : Deployment diagram (system nodes: add new friend request, forward friend request, forward or
post message, merge cluster, apply stemming, display redundant data)


Flow Diagrams:
A data flow diagram is a graphical tool used to describe and analyze the movement of data through a
system, manual or automated, including the processes, stores of data, and delays in the system. Data
flow diagrams are the central tool and the basis from which other components are developed. The
transformation of data from input to output, through processes, may be described logically and
independently of the physical components associated with the system. The DFD is also known as
a data flow graph or a bubble chart.

DFDs are a model of the proposed system. They should clearly show the requirements on
which the new system should be built. Later, during the design activity, this is taken as the basis for
drawing the system’s structure charts. The basic notation used to create a DFD is as follows:

1. Dataflow: Data move in a specific direction from an origin to a destination.

2. Process: People, procedures, or devices that use or produce (transform) data. The physical
component is not identified.

3. Source: External sources or destinations of data, which may be people, programs,
organizations or other entities.

4. Data Store: Here data are stored or referenced by a process in the system.

Context Level 0 Diagram:

Context Level 1 Diagram (Login DFD):

Context Level 2 Diagram:

Context Level 3 Diagram:

6.5 Data Dictionaries and ER Diagram


Fig : ER diagram
DATABASE TABLES:

ADMIN_CONTACTS

ADMIN_PERSONAL

HOME_POTOS

HTMLDB_PLAN_TABLE
INBOX_MAILS

INBOX_MAIL_ATTACHMENT

LOGIN_DETAILS
OUTBOX_MAILS

OUTBOX_MAIL_ATTACHMENT

USER_COMMENTS
USER_CONTACTS
7. Implementation

7.1 Introduction
Implementation is the stage where the theoretical design is turned in to working system.
The most crucial stage is achieving a new successful system and in giving confidence on the new
system for the users that it will work efficiently and effectively.

The system can be implemented only after through testing is done and if it found to work
according to the specification. It involves careful planning, investigation of the current system
and its constraints on implementation, design of methods to achieve the change over and an
evaluation of change over methods a part from planning. Two major tasks of preparing the
implementation are education and training of the users and testing of the system.

The more complex the system being implemented, the more involved the systems analysis and design effort required just for implementation will be. The implementation phase comprises several activities. The required hardware and software acquisition is carried out, and the system may require some software to be developed. For this, programs are written and tested. The user then changes over to the new, fully tested system and the old system is discontinued.

Implementation is the process of having systems personnel check out and put new equipment into use, train users, install the new application, and construct any files of data needed by it.

Depending on the size of the organization that will be involved in using the application and the risk associated with its use, system developers may choose to test the operation in only one area of the firm, say in one department or with only one or two persons. Sometimes they will run the old and new systems together to compare the results. In still other situations, developers will stop using the old system one day and begin using the new one the next. As we will see, each implementation strategy has its merits, depending on the business situation in which it is considered. Regardless of the implementation strategy used, developers strive to ensure that the system's initial use is trouble-free.
7.2 Technology Description

About the Java Technology


The Java platform consists of the Java application programming interfaces (APIs)
and the Java virtual machine (JVM).

Java technology lets developers, designers, and business partners develop and deliver a consistent user experience, with one environment for applications on mobile and
embedded devices. Java meshes the power of a rich stack with the ability to deliver customized experiences across such devices.

Java APIs are libraries of compiled code that you can use in your programs. They let you add
ready-made and customizable functionality to save you programming time.
Java programs are run (or interpreted) by another program called the Java Virtual Machine.
Rather than running directly on the native operating system, the program is interpreted by the
Java VM for the native operating system. This means that any computer system with the Java
VM installed can run Java programs regardless of the computer system on which the applications
were originally developed.

In the Java programming language, all source code is first written in plain text files ending with
the .java extension. Those source files are then compiled into .class files by the javac compiler. A
.class file does not contain code that is native to your processor; it instead contains bytecodes —
the machine language of the Java Virtual Machine (Java VM). The java launcher tool then runs
your application with an instance of the Java Virtual Machine.
Because the Java VM is available on many different operating systems, the same .class files are
capable of running on Microsoft Windows, the Solaris TM Operating System (Solaris OS),
Linux, or Mac OS.
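
For example, a minimal program saved as HelloWorld.java illustrates this cycle; the same HelloWorld.class produced by javac runs unchanged on any of these systems (assuming a JDK is installed):

// HelloWorld.java -- compiled to bytecode by javac, then run by the java launcher.
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello from the Java platform!");
    }
}

Compiling with "javac HelloWorld.java" produces HelloWorld.class, and "java HelloWorld" runs those bytecodes on whichever Java VM is installed.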

Java technology is both a programming language and a platform.

The Java Programming Language

The Java programming language is a high-level language that can be characterized by all of the
following buzzwords:

 Simple
 Object oriented
 Distributed
 Multithreaded
 Dynamic
 Architecture neutral
 Portable
 High performance
 Robust
 Secure

Each of the preceding buzzwords is explained in The Java Language Environment , a white
paper written by James Gosling and Henry McGilton.

Fig : An overview of the software development process

Because the Java VM is available on many different operating systems, the same .class files are
capable of running on Microsoft Windows, the Solaris™ Operating System (Solaris OS), Linux,
or Mac OS. Some virtual machines, such as the Java HotSpot virtual machine, perform additional steps at runtime to give your application a performance boost. This includes various tasks such as finding performance bottlenecks and recompiling (to native code) frequently used sections of code.

Through the Java VM, the same application is capable of running on multiple platforms.

Servlet and JSP technology


Servlet and JSP technology has become the technology of choice for developing online stores, interactive Web applications, and other dynamic Web sites.
A Servlet’s Job
Servlets are Java programs that run on Web or application servers, acting as a middle layer
between requests coming from Web browsers or other HTTP clients and databases or
applications on the HTTP server. Their job is to perform the following tasks,
as illustrated in Figure 1–1.

1. Read the explicit data sent by the client.


The end user normally enters this data in an HTML form on a Web page. However, the data
could also come from an applet or a custom HTTP client program. Chapter 4 discusses how
servlets read this data.
2. Read the implicit HTTP request data sent by the browser.
Figure 1–1 shows a single arrow going from the client to the Web server (the layer where servlets
and JSP execute), but there are really two varieties of data: the explicit data that the end user
enters in a form and the behind-the-scenes HTTP information. Both varieties are critical. The
HTTP information includes cookies, information about media types and compression schemes the browser understands, and so on.

3. Generate the results.


This process may require talking to a database, executing an RMI or EJB call, invoking a Web
service, or computing the response directly. Your real data may be in a relational database. Fine.
But your database probably doesn’t speak HTTP or return results in HTML, so the Web browser
can’t talk directly to the database. Even if it could, for security reasons, you probably would not
want it to. The same argument applies to most other applications. You need the Web middle layer
to extract the incoming data from the HTTP stream, talk to the application, and embed the results
inside a document.
4. Send the explicit data (i.e., the document) to the client.
This document can be sent in a variety of formats, including text (HTML or XML), binary (GIF
images), or even a compressed format like gzip that is layered on top of some other underlying
format. But, HTML is by far the most common format, so an important servlet/JSP task is to
wrap the results inside of HTML.
5. Send the implicit HTTP response data.
Figure 1–1 shows a single arrow going from the Web middle layer (the servlet or JSP page) to
the client. But, there are really two varieties of data sent: the document itself and the behind-the-
scenes HTTP information. Again, both varieties are critical to effective development. Sending
HTTP response data involves telling the browser or other client what type of document is being
returned (e.g., HTML), setting cookies and caching parameters, and several other such tasks.
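
A minimal sketch of a servlet carrying out these five steps is shown below (the class is illustrative only and assumes the standard javax.servlet API; it is not part of the project's own code):

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Illustrative servlet: reads request data, builds a result, and sends an HTML document back.
public class HelloServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String name = request.getParameter("name");        // 1. explicit data from the form
        String agent = request.getHeader("User-Agent");    // 2. implicit HTTP request data
        String message = (name == null) ? "Hello!" : "Hello, " + name + "!"; // 3. generate the result
        response.setContentType("text/html");              // 5. implicit HTTP response data
        PrintWriter out = response.getWriter();
        out.println("<html><body><h1>" + message + "</h1>");   // 4. explicit document
        out.println("<p>Your browser: " + agent + "</p></body></html>");
    }
}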

The Advantages of Servlets Over “Traditional” CGI


Java servlets are more efficient, easier to use, more powerful, more portable, safer, and cheaper
than traditional CGI and many alternative CGI-like technologies. With traditional CGI, a new
process is started for each HTTP request. If the CGI program itself is relatively short, the
overhead of starting the process can dominate the execution time. With servlets, the Java virtual
machine stays running and handles each request with a lightweight Java thread, not a
heavyweight operating system process. Similarly, in traditional CGI, if there are N requests to the
same CGI program, the code for the CGI program is loaded into memory N times. With servlets,
however, there would be N threads, but only a single copy of the servlet class would be
loaded. This approach reduces server memory requirements and saves time by instantiating fewer
objects. Finally, when a CGI program finishes handling a request, the program terminates. This
approach makes it difficult to cache computations, keep database connections open, and perform
other optimizations that rely on persistent data. Servlets, however, remain in memory even after
they complete a response, so it is straightforward to store arbitrarily complex data between client
requests.
Convenient
Servlets have an extensive infrastructure for automatically parsing and decoding HTML
form data, reading and setting HTTP headers, handling cookies, tracking sessions, and many
other such high-level utilities. In CGI, you have to do much of this yourself. Besides, if you
already know the Java programming language, why learn Perl too? You’re already convinced
that Java technology makes for more reliable and reusable code than does Visual Basic,
VBScript, or C++. Why go back to those languages for server-side programming?
Powerful
Servlets support several capabilities that are difficult or impossible to accomplish with
regular CGI. Servlets can talk directly to the Web server, whereas regular CGI programs cannot,
at least not without using a server-specific API. Communicating with the Web server makes it
easier to translate relative URLs into concrete path names, for instance. Multiple servlets can
also share data, making it easy to implement database connection pooling and similar resource-
sharing optimizations. Servlets can also maintain information from request to request,
simplifying techniques like session tracking and caching of previous computations.
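
For instance, the form parsing, cookie handling, session tracking, and request-to-request state mentioned above are all one- or two-line calls in the servlet API (a fragment for illustration, assumed to run inside a servlet's doGet or doPost method):

// High-level utilities provided by the servlet API.
HttpSession session = request.getSession(true);            // session tracking
Integer visits = (Integer) session.getAttribute("visits");
visits = (visits == null) ? Integer.valueOf(1) : Integer.valueOf(visits.intValue() + 1);
session.setAttribute("visits", visits);                    // state kept between requests

Cookie language = new Cookie("language", "en");             // setting a cookie
language.setMaxAge(60 * 60 * 24 * 30);                      // keep it for 30 days
response.addCookie(language);
String formValue = request.getParameter("comment");         // parsed form data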

Portable
Servlets are written in the Java programming language and follow a standard API. Servlets are
supported directly or by a plug-in on virtually every major Web server. Consequently, servlets
written for, say, Macromedia JRun can run virtually unchanged on Apache Tomcat, Microsoft Internet Information Server (with a separate plug-in), IBM WebSphere, iPlanet Enterprise Server, Oracle9i AS, or StarNine WebSTAR. They are part of the Java 2 Platform, Enterprise Edition
(J2EE; see http://java.sun.com/j2ee/), so industry support for servlets is becoming even more
pervasive.
Inexpensive
A number of free or very inexpensive Web servers are good for development use or deployment
of low- or medium-volume Web sites. Thus, with servlets and JSP you can start with a free or
inexpensive server and migrate to more expensive servers with high-performance capabilities or
advanced administration utilities only after your project meets initial success. This is in contrast
to many of the other CGI alternatives, which require a significant initial investment for the
purchase of a proprietary package. Price and portability are somewhat connected. For example,
Marty tries to keep track of the countries of readers that send him questions by email. India was
near the top of the list, probably #2 behind the U.S. Marty also taught one of his JSP and servlet
training courses (see http://courses.coreservlets.com/) in Manila, and there was great interest in
servlet and JSP technology there. Now, why are India and the Philippines both so interested? We
surmise that the answer is twofold. First, both countries have large pools of well-educated
software developers.

Secure
One of the main sources of vulnerabilities in traditional CGI stems from the fact that the
programs are often executed by general-purpose operating system shells. So, the CGI
programmer must be careful to filter out characters such as backquotes and semicolons that are
treated specially by the shell. Implementing this precaution is harder than one might think, and
weaknesses stemming from this problem are constantly being uncovered in widely used CGI
libraries. A second source of problems is the fact that some CGI programs are processed by
languages that do not automatically check array or string bounds. For example, in C and C++ it
is perfectly legal to allocate a 100-element array and then write into the 999th “element,” which
is really some random part of program memory. So, programmers who forget to perform this
check open up their system to deliberate or accidental buffer overflow attacks. Servlets suffer
from neither of these problems. Even if a servlet executes a system call (e.g., with Runtime.exec
or JNI) to invoke a program on the local operating system, it does not use a shell to do so. And,
of course, array bounds checking and other memory protection features are a central part of the
Java programming language.
Mainstream
There are a lot of good technologies out there. But if vendors don’t support them and developers
don’t know how to use them, what good are they? Servlet and JSP technology is supported by
servers from Apache, Oracle, IBM, Sybase, BEA, Macromedia, Caucho, Sun/iPlanet, New Atlanta, ATG, Fujitsu, Lutris, SilverStream, the World Wide Web Consortium (W3C), and many others. Several low-cost plug-ins add support to Microsoft IIS and Zeus as well. They run on Windows, Unix/Linux, MacOS, VMS, and IBM mainframe operating systems. They are the single
most popular application of the Java programming language. They are arguably the most popular
choice for developing medium to large Web applications. They are used by the airline
industry (most United Airlines and Delta Airlines Web sites), e-commerce (ofoto.com), online
banking (First USA Bank, Banco Popular de Puerto Rico), Web search engines/portals
(excite.com), large financial sites (American Century Investments), and hundreds of other sites
that you visit every day. Of course, popularity alone is no proof of good technology. Numerous
counter-examples abound. But our point is that you are not experimenting with a
new and unproven technology when you work with server-side Java.

The Role of JSP


A somewhat oversimplified view of servlets is that they are Java programs with HTML
embedded inside of them. A somewhat oversimplified view of JSP documents is that they are
HTML pages with Java code embedded inside of them. For example, compare the sample servlet
shown earlier (Listing 1.1) with the JSP page shown below (Listing 1.2). They look totally
different; the first looks mostly like a regular Java class, whereas the second looks mostly like a
normal HTML page. The interesting thing is that, despite the huge apparent difference, behind
the scenes they are the same. In fact, a JSP document is just another way of writing a servlet. JSP
pages get translated into servlets, the servlets get compiled, and it is the servlets that run at
request time. So, the question is, If JSP technology and servlet technology are essentially
equivalent in power, does it matter which you use? The answer is, Yes, yes, yes! The issue is not
power, but convenience, ease of use, and maintainability. For example, anything you can do in
the Java programming language you could do in assembly language. Does this mean that it does
not matter which you use? Hardly. JSP is discussed in great detail starting in Chapter 10. But, it
is worthwhile mentioning now how servlets and JSP fit together. JSP is focused on simplifying
the creation and maintenance of the HTML. Servlets are best at invoking the business logic and
performing complicated operations. A quick rule of thumb is that servlets are best for tasks
oriented toward processing, whereas JSP is best for tasks oriented toward presentation. For some
requests, servlets are the right choice. For other requests, JSP is a better option. For still others,
neither servlets alone nor JSP alone is best, and a combination of the two (see Chapter 15,
“Integrating Servlets and JSP: The Model View Controller (MVC) Architecture”) is best. But the
point is that you need both servlets and JSP in your overall project: almost no project will consist
entirely of servlets or entirely of JSP. You want both.
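
In the same spirit as the Listing 1.2 referred to above, a minimal JSP page is mostly HTML with small pieces of Java embedded in it (the page below is illustrative only):

<%-- hello.jsp: translated into a servlet and compiled the first time it is requested --%>
<html>
  <body>
    <h1>Welcome</h1>
    <%-- JSP expressions are evaluated on the server for every request --%>
    <p>Server time: <%= new java.util.Date() %></p>
    <p>Hello, <%= (request.getParameter("name") == null)
                      ? "guest" : request.getParameter("name") %>!</p>
  </body>
</html>
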
8. System Testing
8.1 Testing Methodologies
Testing is the process of finding differences between the expected behavior specified by the system models and the observed behavior of the implemented system. From a modeling point of view, testing is the attempt at falsification of the system with respect to the system models. The goal of testing is to design tests that exercise defects in the system and reveal problems.

The process of executing a program with the intent of finding errors is called testing. During testing, the program to be tested is executed with a set of test cases, and the output of the program for the test cases is evaluated to determine if the program is performing as expected. Testing forms the first step in determining the errors in the program. The success of testing in revealing errors in a program depends critically on the test cases.

Strategic Approach to Software Testing:


The software engineering process can be viewed as a spiral. Initially, system engineering defines the role of software and leads to software requirements analysis, where the information domain, functions, behavior, performance, constraints, and validation criteria for the software are established. Moving inward along the spiral, we come to design and finally to coding. To develop computer software we spiral in along streamlines that decrease the level of abstraction at each turn.

A strategy for software testing may also be viewed in the context of the spiral. Unit testing begins at the vertex of the spiral and concentrates on each unit of the software as implemented in source code. Testing progresses by moving outward along the spiral to integration testing, where the focus is on the design and the construction of the software architecture. Taking another turn outward on the spiral, we encounter validation testing, where requirements established as part of software requirements analysis are validated against the software that has been constructed. Finally we arrive at system testing, where the software and other system elements are tested as a whole.

[Figure: levels of testing — unit testing of individual modules/components, integration testing of sub-systems, system testing of the complete system, and user acceptance testing.]

Different Levels of Testing

Client Needs – Acceptance Testing
Requirements – System Testing
Design – Integration Testing
Code – Unit Testing


8.2 Testing Activities


Different levels of testing are used in the testing process; each level of testing aims to test different aspects of the system. The basic levels are:
Unit testing
Integration testing
System testing
Acceptance testing

Unit Testing
Unit testing focuses on the building blocks of the software system, that is, objects and subsystems. There are three motivations behind focusing on components. First, unit testing reduces the complexity of the overall testing activities, allowing us to focus on smaller units of the system. Second, unit testing makes it easier to pinpoint and correct faults, given that few components are involved in the test. Third, unit testing allows parallelism in the testing activities; that is, each component can be tested independently of the others. Hence the goal is to test the internal logic of the module.
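
As an illustration, the internal logic of a single component, such as the hypothetical NGramExtractor sketched in the design chapter, could be unit tested in isolation (the test below assumes JUnit 4 and is not part of the delivered code):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Unit test: exercises one component independently of the rest of the system.
public class NGramExtractorTest {

    @Test
    public void fiveWordsGiveFourBigrams() {
        assertEquals(4, NGramExtractor.extract("battery life is very good", 2).size());
    }

    @Test
    public void firstBigramUsesAdjacentWords() {
        assertEquals("battery life",
                NGramExtractor.extract("battery life is very good", 2).get(0));
    }
}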

Integration Testing
In integration testing, the tested modules are combined into sub-systems, which are then tested. The goal here is to see if the modules can be integrated properly, the emphasis being on testing module interaction.

After structural testing and functional testing we get error-free modules. These modules are to be integrated to get the required results of the system. After checking a module, another module is tested and integrated with the previous module. After the integration, the test cases are generated and the results are tested.

System Testing
In system testing the entire software is tested. The reference document for this process is the requirements document, and the goal is to see whether the software meets its requirements. The system was tested for various test cases with various inputs.

Acceptance Testing
Acceptance testing is sometimes performed with realistic data of the client to demonstrate that the software is working satisfactorily. Testing here focuses on the external behavior of the system; the internal logic of the program is not emphasized. In acceptance testing the system is tested for various inputs.
8.3 Types of Testing
1. Black box or functional testing
2. White box testing or structural testing

Black box testing


This method is used when knowledge of the specified function that a product has been designed to perform is known. The concept of the black box is used to represent a system whose inside workings are not available for inspection. In black box testing the test item is a "black" box, since its logic is unknown; all that is known is what goes in and what comes out, that is, the input and the output.
Black box testing attempts to find errors in the following categories:
Incorrect or missing functions
Interface errors
Errors in data structures
Performance errors
Initialization and termination errors

As shown in the following figure of black box testing, we are not concerned with the internal workings; we think only about:
What is the input to our system?
What is the output for a given input to our system?

White box testing


White box testing is concerned with testing the implementation of the program. The intent of structural testing is not to exercise all the inputs or outputs but to exercise the different programming structures and data structures used in the program. Thus structural testing aims to achieve test cases that will force the desired coverage of different structures. Two types of path testing are statement coverage testing and branch coverage testing.

[Figure: a black box takes input and produces output while its internal working stays hidden; in the white box testing strategy the internal workings are examined.]

8.4 Test Plan


The testing process starts with a test plan. This plan identifies all the testing-related activities that must be performed and specifies the schedules, allocates the resources, and specifies guidelines for testing. During the testing of a unit, the specified test cases are executed and the actual results are compared with the expected output. The final outputs of the testing phase are the test report and the error report.

Test Data:
Here all test cases that are used for the system testing are specified. The goal is to test the
different functional requirements specified in Software Requirements Specifications (SRS)
document.

Unit Testing:
Each individual module has been tested against the requirement with some test data.

Test Report:
The module is working properly provided the user has to enter information. All data entry forms
have tested with specified test cases and all data entry forms are working properly.

Error Report:
If the user does not enter data in specified order then the user will be prompted with error
messages. Error handling was done to handle the expected and unexpected errors.

8.5 Test Cases

A test case is a set of input data and expected results that exercises a component with the purpose of causing failures and detecting faults. A test case is an explicit set of instructions designed to detect a particular class of defect in a software system by bringing about a failure. A test case can give rise to many tests.

TEST CASES:

Test cases can be divided into two types: positive test cases and negative test cases. Positive test cases are conducted by the developer with the intention of obtaining the expected output; negative test cases are conducted with the intention that the expected output is not obtained (for example, for invalid or missing input).

POSITIVE TEST CASES

S.No | Test case | Description | Expected value | Result

1 | Create new user (registration process) | Enter the personal info and address info. | Personal info and address info are updated into the Oracle database successfully | True

2 | Enter the username and password | Verification of login details. | Login successful | True

3 | Send request | Fill all fields | Request sent successfully | True

NEGATIVE TEST CASES

S.No | Test case | Description | Expected value | Result

1 | Create new user (registration process) | Enter the personal info and address info. | Personal info and address info are not updated into the database | False

2 | Enter the username and password | Verification of login details. | Login failed | False

3 | Send friend request | Verification of all fields | Request failed | False

9. Conclusion and Future Enhancements

9.1 Conclusion
Comment streams on social network posts grow quickly, and reading every short comment to understand the overall opinion is tedious for users. In this project we built a social-network application in which the comments posted on content are summarized automatically: the comments are preprocessed by removing redundant characters and applying stemming, N-grams are extracted and the comments are clustered, the Top-K clusters are selected and merged, and the resulting summary is presented to the user together with text summarization graphs. This gives users a compact view of what is being said about a post without reading the whole comment stream.

Appendix

Sample code:

package com.socialnet.dao;


import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.sql.Blob;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Types;
import java.util.ArrayList;
import java.util.Vector;

import com.socialnet.bean.UserBean;
import com.socialnet.conections.AbstractDataAccessObject;
import com.socialnet.util.DateWrapper;

public class UserDAO extends AbstractDataAccessObject {

Connection con = null;


PreparedStatement ps = null;
Statement st = null;
CallableStatement cstmt = null;
ResultSet rs = null;

public UserDAO() {
con = getConnection();
try {
// Manage transactions manually; commit explicitly after a successful update.
con.setAutoCommit(false);
} catch (Exception e) {
System.out.println(e);

}
}

public UserBean getUserContacts(String uname) {


UserBean ubean = new UserBean();

try {
// Stored procedure returns the user's contact details through OUT parameters.
cstmt = con.prepareCall("{call GetUserContacts(?,?,?,?,?,?,?,?,?,?,?,?,?)}");

cstmt.setString(1, uname);

cstmt.registerOutParameter(2, Types.VARCHAR);
cstmt.registerOutParameter(3, Types.VARCHAR);
cstmt.registerOutParameter(4, Types.VARCHAR);
cstmt.registerOutParameter(5, Types.VARCHAR);
cstmt.registerOutParameter(6, Types.VARCHAR);
cstmt.registerOutParameter(7, Types.VARCHAR);
cstmt.registerOutParameter(8, Types.VARCHAR);
cstmt.registerOutParameter(9, Types.VARCHAR);
cstmt.registerOutParameter(10, Types.VARCHAR);
cstmt.registerOutParameter(11, Types.VARCHAR);
cstmt.registerOutParameter(12, Types.VARCHAR);
cstmt.registerOutParameter(13, Types.DATE);

cstmt.execute();

ubean.setFirstname(cstmt.getString(2));
ubean.setMidlename(cstmt.getString(3));
ubean.setLastname(cstmt.getString(4));
ubean.setMobile(cstmt.getString(5));
ubean.setCity(cstmt.getString(6));
ubean.setState(cstmt.getString(7));
ubean.setCountry(cstmt.getString(8));
ubean.setPin(cstmt.getString(9));
ubean.setMail(cstmt.getString(10));
ubean.setSex(cstmt.getString(11));
ubean.setVillage(cstmt.getString(12));
ubean.setDob(cstmt.getString(13));

} catch (Exception e) {
System.out.println("............exception raised in getUserContacts fun:");
e.printStackTrace();
} finally {
try {
// Release JDBC resources: close the statement before the connection.
cstmt.close();
con.close();
} catch (Exception e) {
e.printStackTrace();
}
}

System.out.println(ubean);
return ubean;
}

public boolean updateUserContacts(UserBean ubean) {


String f = "";
boolean flag = false;

try {
// Stored procedure updates the user's contact details; the last parameter reports success.
cstmt = con.prepareCall("{call UpdateUserContacts(?,?,?,?,?,?,?,?,?,?,?,?,?,?)}");
cstmt.setString(1, ubean.getUserid());

cstmt.setString(2, ubean.getFirstname());
cstmt.setString(3, ubean.getMidlename());
cstmt.setString(4, ubean.getLastname());
cstmt.setString(5, ubean.getMobile());
cstmt.setString(6, ubean.getCity());
cstmt.setString(7, ubean.getState());
cstmt.setString(8, ubean.getCountry());
cstmt.setString(9, ubean.getPin());
cstmt.setString(10, ubean.getMail());
cstmt.setString(11, ubean.getSex());
cstmt.setString(12, ubean.getVillage());
cstmt.setString(13, DateWrapper.parseDate(ubean.getDob()));
cstmt.registerOutParameter(14, Types.VARCHAR);

cstmt.execute();
f = cstmt.getString(14);
if (f.equalsIgnoreCase("true")) {
flag = true;
con.commit();
}
} catch (SQLException e) {
e.printStackTrace();
} finally {
try {
// Close the statement before the connection.
cstmt.close();
con.close();
} catch (Exception e) {
e.printStackTrace();
}
}
return flag;
}
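
A minimal sketch of how this DAO might be invoked from a servlet or JSP controller (the surrounding request-handling code and the JSP name are hypothetical):

// Hypothetical controller fragment: fetch a user's contact details for display.
UserDAO dao = new UserDAO();
UserBean contacts = dao.getUserContacts("sunipriya");                 // stored-procedure lookup
request.setAttribute("contacts", contacts);                           // hand the bean to the view
request.getRequestDispatcher("viewProfile.jsp").forward(request, response);
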
Screen Shots:
LoginHome:
Login Page:

Fig : Login Page


Registration

Fig: Registration
Fig : Add Friend
Fig : write welcome message
Login New User

Fig: Login new user


Fig: Post product
Fig : view Posted Content
Fig: post rating
Fig : Content graph
Fig : user comment
Fig : NLP
Fig : After Removing Redundant Characters
Fig : After Stemming Process Removal
Fig : N-gram
Fig : view Cluster
Fig : Top-K Cluster
Fig : Merged Clusters
Fig : Text Summarization Graph
Fig : Text Summarization Graph
Admin Home :

Fig : Admin Home


Fig : Post product comments
Fig : Update Profile
Fig : Change Password
