Final Document
On
Short Text Summarization on Comment Streams from Social
Network Services
Submitted in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology
in
Department of Computer Science and Engineering.
By
CERTIFICATE
This is to certify that the project entitled “Short Text Summarization on Comment Streams
from Social Network Services” is submitted by B.Sunipriya(12241A0568), Sushmita Thapa
(12241A0556), A.Aishwarya Valli(12241A0562) and P.Harshitha(12H61A05K7) in partial
fulfillment of the requirement for the award of the degree in BACHELOR OF
TECHNOLOGY in Computer Science and Engineering during academic year 2015-2016.
External Examiner
DECLARATION
We hereby declare that the project entitled “Short Text Summarization on Comment Streams
from Social Network Services” is the work done during the period from 28-12-2015 to 21-04-
2016 and is submitted in the partial fulfillment of the requirements for the award of degree of
Bachelor of technology in Computer Science and Engineering from Gokaraju Rangaraju Institute
of Engineering and Technology (Autonomous under Jawaharlal Nehru Technology University,
Hyderabad). The results embodied in this project have not been submitted to any other university
or Institution for the award of any degree or diploma.
P.Harshitha (12H61A05K7)
ACKNOWLEDGEMENT
There are many people who helped me directly and indirectly to complete my project
successfully. I would like to take this opportunity to thank one and all.
First of all I would like to express my deep gratitude towards my Supervisor Dr.
P.Vijayapal Reddy, Professor, Department of CSE for his support in the completion of my
dissertation. I wish to express my sincere thanks to Dr. K. Anuradha , HOD, Dept of CSE and
also to our principal Dr. Jandhyala N Murthy for providing the facilities to complete the
dissertation.
I would like to thank all our faculty and friends for their help and constructive criticism
during the project period. Finally I am very much indebted to our parents for their moral support
and encouragement to achieve goals.
P.Harshitha (12H61A05K7)
Short Text Summarization on Comment Streams from Social
Network Services
Abstract:
This paper focuses on the problem of short text summarization on the comment stream of
a specific message from social network services (SNS). Due to the high popularity of SNS, the
quantity of comments may increase at a high rate right after a social message is published.
Motivated by the fact that users may desire to get a brief understanding of a comment stream
without reading the whole comment list, we attempt to group comments with similar content
together and generate a concise opinion summary for this message. Since distinct users will
request the summary at any moment, existing clustering methods cannot be directly applied and
cannot meet the real-time need of this application. In this paper, we model a novel incremental
clustering problem for comment stream summarization on SNS. Moreover, we propose IncreSTS
algorithm that can incrementally update clustering results with latest incoming comments in real
time. Furthermore, we design an at-a-glance visualization interface to help users easily and
rapidly get an overview summary. From extensive experimental results and a real case
demonstration, we verify that IncreSTS possesses the advantages of high efficiency, high
scalability, and better handling of outliers, which justifies the practicability of IncreSTS on the
target problem.
1.1 Purpose
1.2 Scope
1.3 Motivation
1.3.1 Definitions
1.3.2 Abbreviations
1.4 Overview
2. Literature Survey
2.1 Introduction
2.2 History
2.3 Purpose
2.4 Requirements
4. System Analysis
4.1.1 Drawbacks
4.3.1 Advantages
5.1 Introduction
5.2 Purpose
6. System Design
6.1 System Specifications
6.3 DFD’s
7. Implementation
8. System Testing
9.1 Conclusion
10. References
Appendix:
1. Sample code
2. screenshots
Introduction:
1.1 Purpose:
1. We may still desire to know what they are talking about and what the opinions of these discussion participants are.
2. Moreover, celebrities and corporations have a high interest in understanding how their fans and customers are reacting to certain topics and content.
1.2 Scope
In database research, solutions have been proposed which, given a keyword query, retrieve the most relevant structured results or simply select the single most relevant databases. However, these approaches are single-source solutions. They are not directly applicable to the web of Linked Data, where results are not bounded by a single source but might encompass several Linked Data sources. The goal is to produce routing plans, which can be used to compute results from multiple sources.
1.3 Motivation
We are inspired to develop an advanced summarization technique targeting comment streams in SNS.
2. Literature Survey
Larger and larger amounts of data are collected and stored in databases increasing the
need for efficient and effective analysis methods to make use of the information contained
implicitly in the data. One of the primary data analysis tasks is cluster analysis which is intended
to help a user to understand the natural grouping or structure in a data set. Therefore, the
development of improved clustering algorithms has received a lot of attention in the last few
years. Roughly speaking, the goal of a clustering algorithm is to group the objects of a database
into a set of meaningful subclasses.
3. Fundamental Concepts on Data Mining
While data mining represents a significant advance in the type of analytical tools currently
available, there are limitations to its capability. One limitation is that although data mining
can help reveal patterns and relationships, it does not tell the user the value or significance of
these patterns. These types of determinations must be made by the user. A second limitation
is that while data mining can identify connections between behaviours and/or variables, it does
not necessarily identify a causal relationship. To be successful, data mining still requires skilled
technical and analytical specialists who can structure the analysis and interpret the output that is
created.
Data mining is becoming increasingly common in both the private and public sectors. Industries
such as banking, insurance, medicine, and retailing commonly use data mining to reduce costs,
enhance research, and increase sales. In the public sector, data mining applications initially
were used as a means to detect fraud and waste, but have grown to also be used for purposes
such as measuring and improving program performance. However, some of the homeland
security data mining applications represent a significant expansion in the quantity and scope of
data to be analyzed. Two efforts that have attracted a higher level of congressional interest
include the Terrorism Information Awareness (TIA) project (now-discontinued) and the
Computer-Assisted Passenger Pre-screening System II (CAPPS II) project (now- cancelled and
replaced by Secure Flight).
Data mining is a process that analyzes large amounts of data to find new and hidden information that improves business efficiency. Various industries have adopted data mining into their mission-critical business processes to gain competitive advantages and help their businesses grow. The following sections illustrate some data mining applications in sales/marketing, banking/finance, health care and insurance, transportation, and medicine.
Data mining enables businesses to understand the patterns hidden inside past purchase transactions, thus helping them plan and launch new marketing campaigns in a prompt and cost-effective way. The following illustrates several data mining applications in sales and marketing.
Data mining is used for market basket analysis to provide insight into which product combinations were purchased, when they were bought, and in what sequence by customers. This information helps businesses promote their most profitable products to maximize profit. In addition, it encourages customers to purchase related products that they may have missed or overlooked.
Retail companies use data mining to identify customers' buying behavior patterns.
Data Mining Applications in Banking / Finance
Several data mining techniques, such as distributed data mining, have been researched, modeled, and developed to help detect credit card fraud.
Data mining is used to identify customer loyalty by analyzing data on customers' purchasing activities, such as the frequency of purchases in a period of time, the total monetary value of all purchases, and when the last purchase was made. After analyzing these dimensions, a relative measure is generated for each customer; the higher the score, the more loyal the customer is.
To help banks retain credit card customers, data mining is used. By analyzing past data, data mining can help banks predict which customers are likely to change their credit card affiliation, so that the banks can plan and launch special offers to retain those customers.
Credit card spending by customer groups can be identified by using data mining.
From historical market data, data mining can identify stock trading rules.
The growth of the insurance industry depends entirely on the ability to convert data into knowledge, information, or intelligence about customers, competitors, and markets. Data mining has been applied in the insurance industry only recently, but it has brought tremendous competitive advantages to the companies that have implemented it successfully. The data mining applications in the insurance industry are listed below:
Data mining is applied in claims analysis, such as identifying which medical procedures are claimed together.
Data mining makes it possible to forecast which customers will potentially purchase new policies.
Data mining allows insurance companies to detect risky customers' behavior patterns.
Data mining helps detect fraudulent behavior.
Data mining helps to determine distribution schedules among warehouses and outlets and to analyze loading patterns.
Data mining makes it possible to characterize patient activities and foresee upcoming office visits.
Data mining helps identify the patterns of successful medical therapies for different illnesses.
Data mining applications are continuously being developed in various industries to provide more hidden knowledge that helps increase business efficiency and grow businesses.
In software engineering, the SDLC concept underpins many kinds of software development methodologies. These methodologies form the framework for planning and controlling the creation of an information system: the software development process.
Spiral Model
The spiral model is similar to the incremental model, with more emphasis placed on risk analysis. The spiral model has four phases: Planning, Risk Analysis, Engineering, and Evaluation. A software project repeatedly passes through these phases in iterations (called spirals in this model). In the baseline spiral, starting in the planning phase, requirements are gathered and risk is assessed. Each subsequent spiral builds on the baseline spiral. Requirements are gathered during the planning phase. In the risk analysis phase, a process is undertaken to identify risks and alternate solutions. A prototype is produced at the end of the risk analysis phase. Software is produced in the engineering phase, along with testing at the end of the phase. The evaluation phase allows the customer to evaluate the output of the project to date before the project continues to the next spiral. In the spiral model, the angular component represents progress, and the radius of the spiral represents cost.
This document plays a vital role in the software development life cycle (SDLC), as it describes the complete requirements of the system. It is meant for use by developers and will be the basis during the testing phase. Any changes made to the requirements in the future will have to go through a formal change approval process.
The spiral model was defined by Barry Boehm in his 1988 article, "A Spiral Model of Software Development and Enhancement." This model was not the first to discuss iterative development, but it was the first to explain why the iteration matters.
As originally envisioned, the iterations were typically 6 months to 2 years long. Each phase
starts with a design goal and ends with a client reviewing the progress thus far. Analysis and
engineering efforts are applied at each phase of the project, with an eye toward the end goal of
the project.
The new system requirements are defined in as much detail as possible. This usually involves interviewing a number of users representing all the external or internal users and other aspects of the existing system.
A first prototype of the new system is constructed from the preliminary design. This is usually a scaled-down system and represents an approximation of the characteristics of the final product.
A second prototype is evolved by a fourfold procedure:
1. Evaluating the first prototype in terms of its strengths, weaknesses, and risks.
At the customer's option, the entire project can be aborted if the risk is deemed too great. Risk factors might involve development cost overruns, operating-cost miscalculations, or any other factor that could, in the customer's judgment, result in a less-than-satisfactory final product.
The existing prototype is evaluated in the same manner as was the previous prototype,
and if necessary, another prototype is developed from it according to the fourfold
procedure outlined above.
The preceding steps are iterated until the customer is satisfied that the refined
prototype represents the final product desired.
The final system is thoroughly evaluated and tested. Routine maintenance is carried out on a continuing basis to prevent large-scale failures and to minimize downtime.
Fig -Spiral Model
Advantages
Disadvantages:
1. Information overload problem.
2. A large number of comments is displayed as the summarized content.
3. Similar comments are not removed.
4. Previous approaches provide unstructured text.
Proposed System:
Advantages:
1. Provides effective short text summarization results.
2. Provides high-efficiency clustering results.
3. Removes similar (redundant) comment information.
4. Provides informative and impressive summarization results.
ARCHITECTURE:
SYSTEM MODEL :
We present the system model for comment stream summarization on SNS. We focus on the
comment stream added for one message on SNS and aim to generate the immediate summary of
comments. The problem we tackle is described as follows.
Once a message is posted on SNS, users can leave comments immediately and the number of
comments may rise quickly and continuously. Moreover, readers are usually unwilling to go over
the whole list of comments, but they may request to see the summary at any moment. This
indicates that the proposed approach should be able to generate the summary result at any time
point of a dynamic data stream. To satisfy this requirement, we model this problem as an
incremental clustering task.
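The IncreSTS algorithm itself is not reproduced in this report, but the following minimal sketch (an illustration only, with assumed class and method names) conveys the idea of incremental clustering on a comment stream: each incoming comment, represented as a set of terms, joins the most similar existing cluster when the similarity exceeds a threshold and otherwise starts a new cluster, so that a top-k summary can be requested at any moment.

import java.util.*;

// Illustrative sketch only: incremental clustering of term sets, not the actual IncreSTS algorithm.
public class IncrementalCommentClusterer {

    // Each cluster keeps the union of terms seen so far and the number of comments assigned to it.
    static class Cluster {
        Set<String> terms = new HashSet<>();
        int size = 0;
    }

    private final List<Cluster> clusters = new ArrayList<>();
    private final double threshold; // minimum Jaccard similarity required to join an existing cluster

    public IncrementalCommentClusterer(double threshold) {
        this.threshold = threshold;
    }

    // Jaccard similarity between a comment's terms and a cluster's terms.
    private static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    // Assign one new comment; clusters are updated in place, so a summary can be produced at any moment.
    public void addComment(Set<String> commentTerms) {
        Cluster best = null;
        double bestSim = 0.0;
        for (Cluster c : clusters) {
            double sim = jaccard(commentTerms, c.terms);
            if (sim > bestSim) { bestSim = sim; best = c; }
        }
        if (best != null && bestSim >= threshold) {
            best.terms.addAll(commentTerms);
            best.size++;
        } else {
            Cluster c = new Cluster();
            c.terms.addAll(commentTerms);
            c.size = 1;
            clusters.add(c);
        }
    }

    // Return the k largest clusters as the at-a-glance summary.
    public List<Cluster> topK(int k) {
        List<Cluster> sorted = new ArrayList<>(clusters);
        sorted.sort((x, y) -> Integer.compare(y.size, x.size));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }
}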
Initially, for each word, the process of punctuation removal will be applied to eliminate
unnecessary punctuation marks connected with this word. In this example, the two exclamation
marks at the rear of “GaGa!!” will be removed. Moreover, we develop the heuristic process of
redundant character removal, designed for restoring words on SNS. It can be observed that
casual language style is commonly used on SNS. In particular, users often emphasize the
emotion by repeating characters in a word. This phenomenon certainly causes the problem of not
being able to correctly identify the original words. To cope with this problem, we examine each
word to find out whether there is any character consecutively appearing more than three times. If
this situation is detected, appended characters will be regarded as redundant, and only one
character will be retained. Thus, the word “soooooo” will be transformed into “so”. Meanwhile,
all upper-case letters will also be changed to lower-case letters. The following step is the
stemming process. We employ the standard Porter stemming algorithm [47] to reduce inflected
and derived words to their stem form (e.g., “loving” is turned into “love” in Fig. 3).
Subsequently, the process of n-gram terms extraction is carried out to extract terms that are used
for representing this comment. In the example of Fig. 3, n is set to 3, meaning that the comment
string will be scanned from left to right to draw out all 1-gram, 2-gram, and 3-gram terms.
Finally, stop words removal process is executed to delete terms entirely composed of stop words.
Note that as long as there is at least one non-stop word appearing in a term, this term will be
viewed as a valid one. For instance, although the word “you” is a stop word, the term “love you”
will not be deleted since “love” is a valid word. On the other hand, one term may appear more
than once in a comment. However, in general, since the length of a comment is short, a repeated
term does not always indicate more important influence. Therefore, the weights of terms are
defined to be equal.
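A minimal sketch of this preprocessing pipeline is given below. The class name TextPreprocessor, its method names, and the tiny stop-word list are illustrative assumptions; the real system uses the full Porter stemmer and a complete stop-word list, while the stemming step here is only a stub.

import java.util.*;

// Illustrative sketch of the preprocessing steps described above; not the project's actual code.
public class TextPreprocessor {

    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("you", "the", "a", "is", "to"));

    // Remove punctuation attached to a word, e.g. "GaGa!!" -> "GaGa".
    static String stripPunctuation(String word) {
        return word.replaceAll("\\p{Punct}+", "");
    }

    // Collapse a character repeated consecutively more than three times to one character, e.g. "soooooo" -> "so".
    static String collapseRepeats(String word) {
        return word.replaceAll("(.)\\1{3,}", "$1");
    }

    // Placeholder for stemming; the report uses the Porter stemming algorithm, which would be plugged in here.
    static String stem(String word) {
        return word; // stub
    }

    // Extract all 1-gram to n-gram terms, keeping a term only if it contains at least one non-stop word.
    static List<String> extractTerms(String comment, int n) {
        String[] raw = comment.split("\\s+");
        List<String> words = new ArrayList<>();
        for (String w : raw) {
            String cleaned = stem(collapseRepeats(stripPunctuation(w)).toLowerCase());
            if (!cleaned.isEmpty()) words.add(cleaned);
        }
        List<String> terms = new ArrayList<>();
        for (int len = 1; len <= n; len++) {
            for (int i = 0; i + len <= words.size(); i++) {
                List<String> gram = words.subList(i, i + len);
                boolean allStop = true;
                for (String g : gram) {
                    if (!STOP_WORDS.contains(g)) { allStop = false; break; }
                }
                if (!allStop) terms.add(String.join(" ", gram));
            }
        }
        return terms;
    }
}

For example, extractTerms("loveeeeee you GaGa!!", 3) produces lower-cased terms such as "love", "gaga", "love you", and "love you gaga"; the 1-gram "you" is dropped as a stop word, while "love you" is kept because "love" is a valid word.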
The preliminary investigation examines project feasibility, the likelihood that the system will be useful to the organization. The main objective of the feasibility study is to test the technical, operational, and economical feasibility of adding new modules and debugging the old running system. Any system is feasible if there are unlimited resources and infinite time. There are three aspects in the feasibility study portion of the preliminary investigation:
Technical Feasibility
Operational Feasibility
Economical Feasibility
4.5.1 ECONOMIC FEASIBILITY
A system that can be developed technically and that will be used if installed must still be a good investment for the organization. In the economical feasibility study, the development cost of creating the system is evaluated against the ultimate benefit derived from the new system. Financial benefits must equal or exceed the costs.
The system is economically feasible. It does not require any additional hardware or software. Since the interface for this system is developed using the existing resources and technologies available, there is only nominal expenditure, and economical feasibility is certain.
4.5.2 OPERATIONAL FEASIBILITY
Proposed projects are beneficial only if they can be turned into information systems that meet the organization's operating requirements. Operational feasibility aspects of the project are to be taken as an important part of the project implementation. Some of the important issues raised to test the operational feasibility of a project include the following:
A well-planned design would ensure the optimal utilization of the computer resources and would help in the improvement of performance status.
4.5.3 TECHNICAL FEASIBILITY
The technical issues usually raised during the feasibility stage of the investigation include the following:
5.1 Introduction
A Software Requirements Specification (SRS) – a requirements specification for
a software system – is a complete description of the behavior of a system to be developed. It
includes a set of use cases that describe all the interactions the users will have with the software.
In addition to use cases, the SRS also contains non-functional requirements. Non-functional
requirements are requirements which impose constraints on the design or implementation (such
as performance engineering requirements, quality standards, or design constraints).
System requirements specification: a structured collection of information that embodies the requirements of a system. A business analyst, sometimes titled system analyst, is responsible for analyzing the business needs of clients and stakeholders to help identify business problems and propose solutions. Within the systems development life cycle domain, the business analyst typically performs a liaison function between the business side of an enterprise and the information technology department or external service providers. Projects are subject to three sorts of requirements:
Business requirements describe in business terms what must be delivered or
accomplished to provide value.
Product requirements describe properties of a system or product (which could be one of
several ways to accomplish a set of business requirements.)
Process requirements describe activities performed by the developing organization. For
instance, process requirements could specify specific methodologies that must be
followed, and constraints that the organization must obey.
Product and process requirements are closely linked. Process requirements often specify the activities that will be performed to satisfy a product requirement. For example, a maximum development cost requirement (a process requirement) may be imposed to help achieve a maximum sales price requirement (a product requirement); a requirement that the product be maintainable (a product requirement) is often addressed by imposing requirements to follow particular development styles.
5.2 PURPOSE
A collection of requirements define the characteristics or features of the desired system. A 'good'
list of requirements as far as possible avoids saying how the system should implement the
requirements, leaving such decisions to the system designer. Specifying how the system should
be implemented is called "implementation bias" or "solution engineering". However,
implementation constraints on the solution may validly be expressed by the future owner, for
example for required interfaces to external systems; for interoperability with other systems; and
for commonality (e.g. of user interfaces) with other owned products.
In software engineering, the same meanings of requirements apply, except that the focus of
interest is the software itself.
Modules: Post Message, Apply Stemming, Receiver.
Usability
The system is designed as a completely automated process; hence there is little or no user intervention.
Reliability
The system is more reliable because of the qualities inherited from the chosen platform, Java. Code built using Java is more reliable.
Performance
The system is developed in a high-level language using advanced front-end and back-end technologies, so it responds to the end user on the client system within very little time.
Supportability
The system is designed to be cross-platform. It is supported on a wide range of hardware and on any software platform that has a JVM built into it.
Implementation
The system is implemented in a web environment using the Struts framework. Apache Tomcat is used as the web server and Windows XP Professional is used as the platform.
Interface: the user interface is based on the HTML tags provided by Struts.
Software Requirements:
Language : JDK 1.7.0
Frontend : JSP, Servlets
Backend : Oracle 10g
IDE : MyEclipse 8.6
Operating System : Windows XP
Server : Apache Tomcat
Hardware Requirements:
Processor : Pentium IV
Hard Disk : 80GB
RAM : 2GB
6. System Design
6.1 Introduction
The purpose of the design phase is to plan a solution of the problem specified by the
requirement document. This phase is the first step in moving from the problem domain to the
solution domain. In other words, starting with what is needed, design takes us toward how to
satisfy the needs. The design of a system is perhaps the most critical factor affecting the quality of the software; it has a major impact on the later phases, particularly testing and maintenance. The
output of this phase is the design document. This document is similar to a blueprint for the
solution and is used later during implementation, testing and maintenance. The design activity is
often divided into two separate phases System Design and Detailed Design.
System Design also called top-level design aims to identify the modules that should be
in the system, the specifications of these modules, and how they interact with each other to
produce the desired results. At the end of the system design all the major data structures, file
formats, output formats, and the major modules in the system and their specifications are
decided.
During detailed design, the internal logic of each of the modules specified in system design is decided. During this phase, the details of the data of a module are usually specified in a high-level design description language, which is independent of the target language in which the software will eventually be implemented.
In system design the focus is on identifying the modules, whereas during detailed design the focus is on designing the logic for each of the modules. In other words, in system design the attention is on what components are needed, while in detailed design the issue is how the components can be implemented in software.
The object model in UML is represented with class diagrams, describing the structure of the system in terms of objects, attributes, associations, and operations.
The dynamic model in UML is represented with sequence diagrams, statechart diagrams, and activity diagrams describing the internal behaviour of the system.
6.3 Scenarios
A use case is an abstraction that describes all possible scenarios involving the described functionality. A scenario is an instance of a use case describing a concrete set of actions.
The name of the scenario enables us to refer to it unambiguously. The name of a scenario is underlined to indicate that it is an instance.
The participating actor instances field indicates which actor instances are involved in this scenario. Actor instances also have underlined names.
The flow of events of a scenario describes the sequence of events step by step.
Actors
Actors represent external entities that interact with the system. An actor can be a human or an external system.
Actors are not part of the system. They represent anyone or anything that interacts with the system.
An actor may:
Only input information to the system.
Only receive information from the system.
Both input information to and receive information from the system.
During this activity, developers identify the actors involved in the system:
User:
The user is an actor who uses the system and performs the operations, such as data classification and execution, that are required of him.
Use Cases:
Use cases are used during requirements elicitation and analysis to represent the functionality of the system. Use cases focus on the behaviour of the system from an external point of view. The identification of actors and use cases results in the definition of the boundary of the system, that is, in differentiating the tasks accomplished by the system from the tasks accomplished by its environment. The actors are outside the boundary of the system, whereas the use cases are inside the boundary of the system.
User:
The use cases for the user are: registration, login, apply stemming, display n-gram data, clustering formation, enter value, and logout.
Fig: User use case diagram
The class diagram is used to refine the use case diagram and define a detailed design of the system. The class diagram classifies the actors defined in the use case diagram into a set of interrelated classes. The relationship or association between the classes can be either an "is-a" or a "has-a" relationship. Each class in the class diagram may be capable of providing certain functionalities; these functionalities provided by the class are termed "methods" of the class. Apart from this, each class may have certain "attributes" that uniquely identify the class. In the class diagram these classes are represented with boxes that contain three parts.
Class Diagram
Fig: Class diagram — the main classes are UserDAO, LoginDAO, AdminBeanDAO, MailDTO, and MailDelete, with attributes such as adminid, adminpwd, admintype, city, firstName, lastName, attachcount, and attachmentfile, and operations such as addFriend(), addServices(), changePwd(), checkUserId(), rejectFriend(), logout(), loginAction(), doGet(), doPost(), deleteMails(), insertAttachment(), mailContacts(), sendMail(), viewMail(), and viewComment().
Fig: Sequence diagram — the sender forwards a friend request, adds a new friend and accepts the forwarded request, and posts a message; the system applies NLP classification, displays redundant data, applies stemming, applies n-grams and displays the n-gram data, forms clusters, selects a threshold, and merges clusters; the receiver selects a value to view the top-k clusters.
Fig: Collaboration diagram — user, login, send, apply stemming, n-gram, clustering, receiver.
Activity Diagram:
Fig: Activity diagram — the user logs in (with a login-fail path back to login), reaches the user home, the server applies NLP classification, applies n-grams, merges clusters, and the user logs out.
Deployment Diagram:
Fig: Deployment diagram — system nodes for forwarding or posting a message, applying stemming, displaying redundant data, and merging clusters.
A data flow diagram (DFD) shows the flow of data through a system, whether manual or automated, including the processes, stores of data, and delays in the system. Data flow diagrams are the central tool and the basis from which other components are developed. The transformation of data from input to output, through processes, may be described logically and independently of the physical components associated with the system. The DFD is also known as a bubble chart.
DFDs are the model of the proposed system. They clearly show the requirements on which the new system should be built. Later, during the design activity, this is taken as the basis for drawing the system's structure charts. The basic notation used to create a DFD is as follows:
Process: people, procedures, or devices that use or produce (transform) data; the physical component is not identified.
Login DFD
Context level 2:
Context level 3 Diagram:
ADMIN_CONTACTS
ADMIN_PERSONAL
HOME_POTOS
HTMLDB_PLAN_TABLE
INBOX_MAILS
INBOX_MAIL_ATTACHMENT
LOGIN_DETAILS
OUTBOX_MAILS
OUTBOX_MAIL_ATTACHMENT
USER_COMMENTS
USER_CONTACTS
7. Implementation
7.1 Introduction
Implementation is the stage where the theoretical design is turned into a working system. It is the most crucial stage in achieving a successful new system and in giving the users confidence that the new system will work efficiently and effectively.
The system can be implemented only after thorough testing is done and it is found to work according to the specification. Implementation involves careful planning, investigation of the current system and its constraints on implementation, design of methods to achieve the changeover, and an evaluation of changeover methods, apart from planning. Two major tasks of preparing the implementation are education and training of the users and testing of the system.
The more complex the system being implemented, the more involved will be the systems analysis and design effort required just for implementation. The implementation phase comprises several activities. The required hardware and software acquisition is carried out. The system may require some software to be developed; for this, programs are written and tested. The user then changes over to the new, fully tested system and the old system is discontinued.
Implementation is the process of having systems personnel check out and put new equipment into use, train users, install the new application, and construct any files of data needed for it.
Depending on the size of the organization that will be involved in using the application
and the risk associated with its use, system developers may choose to test the operation in only
one area of the firm, say in one department or with only one or two persons. Sometimes they
will run the old and new systems together to compare the results. In still other situations,
developers will stop using the old system one-day and begin using the new one the next. As we
will see, each implementation strategy has its merits, depending on the business situation in
which it is considered. Regardless of the implementation strategy used, developers strive to
ensure that the system’s initial use in trouble-free.
7.2 Technology Description
The following Java technology lets developers, designers, and business partners develop and
deliver a consistent user experience, with one environment for applications on mobile and
embedded devices. Java meshes the power of a rich stack with the ability to deliver customized
experiences across such devices.
Java APIs are libraries of compiled code that you can use in your programs. They let you add
ready-made and customizable functionality to save you programming time.
Java programs are run (or interpreted) by another program called the Java Virtual Machine.
Rather than running directly on the native operating system, the program is interpreted by the
Java VM for the native operating system. This means that any computer system with the Java
VM installed can run Java programs regardless of the computer system on which the applications
were originally developed.
In the Java programming language, all source code is first written in plain text files ending with
the .java extension. Those source files are then compiled into .class files by the javac compiler. A
.class file does not contain code that is native to your processor; it instead contains bytecodes —
the machine language of the Java Virtual Machine (Java VM). The java launcher tool then runs
your application with an instance of the Java Virtual Machine.
Because the Java VM is available on many different operating systems, the same .class files are
capable of running on Microsoft Windows, the Solaris TM Operating System (Solaris OS),
Linux, or Mac OS.
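As a concrete illustration of this flow, a minimal program saved as HelloWorld.java is compiled once with javac, and the resulting bytecode runs unchanged on any of these systems:

// HelloWorld.java — compile with "javac HelloWorld.java", run with "java HelloWorld"
public class HelloWorld {
    public static void main(String[] args) {
        // The compiler produces HelloWorld.class, which contains bytecodes interpreted by the Java VM,
        // so the same file runs on Windows, Solaris, Linux, or Mac OS.
        System.out.println("Hello, World");
    }
}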
The Java programming language is a high-level language that can be characterized by all of the
following buzzwords:
Multithreaded
Robust
Dynamic
Secure
Each of the preceding buzzwords is explained in The Java Language Environment , a white
paper written by James Gosling and Henry McGilton.
Fig: An overview of the software development process.
Some virtual machines, such as the Java HotSpot virtual machine, perform additional steps at runtime to give your application a performance boost. These include tasks such as finding performance bottlenecks and recompiling (to native code) frequently used sections of code.
Through the Java VM, the same application is capable of running on multiple platforms.
Portable
Servlets are written in the Java programming language and follow a standard API. Servlets are supported directly or by a plug-in on virtually every major Web server. Consequently, servlets written for, say, Macromedia JRun can run virtually unchanged on Apache Tomcat, Microsoft Internet Information Server (with a separate plug-in), IBM WebSphere, iPlanet Enterprise Server, Oracle9i AS, or StarNine WebSTAR. They are part of the Java 2 Platform, Enterprise Edition (J2EE; see http://java.sun.com/j2ee/), so industry support for servlets is becoming even more pervasive.
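As a minimal illustration of that standard API (a generic sketch, not code taken from this project), a servlet that answers an HTTP GET request looks like this:

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal servlet sketch: the same class runs unchanged on any container that implements the Servlet API.
public class HelloServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body><h1>Hello from a servlet</h1></body></html>");
    }
}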
Inexpensive
A number of free or very inexpensive Web servers are good for development use or deployment
of low- or medium-volume Web sites. Thus, with servlets and JSP you can start with a free or
inexpensive server and migrate to more expensive servers with high-performance capabilities or
advanced administration utilities only after your project meets initial success. This is in contrast
to many of the other CGI alternatives, which require a significant initial investment for the
purchase of a proprietary package. Price and portability are somewhat connected. For example,
Marty tries to keep track of the countries of readers that send him questions by email. India was
near the top of the list, probably #2 behind the U.S. Marty also taught one of his JSP and servlet
training courses (see http://courses.coreservlets.com/) in Manila, and there was great interest in
servlet and JSP technology there. Now, why are India and the Philippines both so interested? We
surmise that the answer is twofold. First, both countries have large pools of well-educated
software developers.
Secure
One of the main sources of vulnerabilities in traditional CGI stems from the fact that the
programs are often executed by general-purpose operating system shells. So, the CGI
programmer must be careful to filter out characters such as backquotes and semicolons that are
treated specially by the shell. Implementing this precaution is harder than one might think, and
weaknesses stemming from this problem are constantly being uncovered in widely used CGI
libraries. A second source of problems is the fact that some CGI programs are processed by
languages that do not automatically check array or string bounds. For example, in C and C++ it
is perfectly legal to allocate a 100-element array and then write into the 999th “element,” which
is really some random part of program memory. So, programmers who forget to perform this
check open up their system to deliberate or accidental buffer overflow attacks. Servlets suffer
from neither of these problems. Even if a servlet executes a system call (e.g., with Runtime.exec or JNI) to invoke a program on the local operating system, it does not use a shell to do so. And,
of course, array bounds checking and other memory protection features are a central part of the
Java programming language.
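To make that contrast concrete, the small (illustrative) program below shows that an out-of-bounds write in Java raises an exception at runtime instead of silently overwriting memory as it could in C or C++:

// Demonstrates the runtime array bounds checking mentioned above.
public class BoundsCheckDemo {
    public static void main(String[] args) {
        int[] data = new int[100];
        try {
            data[999] = 42; // in C/C++ this could corrupt memory; in Java it throws an exception
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Out-of-bounds write rejected: " + e.getMessage());
        }
    }
}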
Mainstream
There are a lot of good technologies out there. But if vendors don’t support them and developers
don’t know how to use them, what good are they? Servlet and JSP technology is supported by
servers from Apache, Oracle, IBM, Sybase, BEA, Macromedia, Caucho, Sun/iPlanet, New Atlanta, ATG, Fujitsu, Lutris, SilverStream, the World Wide Web Consortium (W3C), and many others. Several low-cost plug-ins add support to Microsoft IIS and Zeus as well. They run on Windows, Unix/Linux, Mac OS, VMS, and IBM mainframe operating systems. They are the single most popular application of the Java programming language. They are arguably the most popular
most popular application of the Java programming language. They are arguably the most popular
choice for developing medium to large Web applications. They are used by the airline
industry (most United Airlines and Delta Airlines Web sites), e-commerce (ofoto.com), online
banking (First USA Bank, Banco Popular de Puerto Rico), Web search engines/portals
(excite.com), large financial sites (American Century Investments), and hundreds of other sites
that you visit every day. Of course, popularity alone is no proof of good technology. Numerous
counter-examples abound. But our point is that you are not experimenting with a
new and unproven technology when you work with server-side Java.
Fig: Testing levels — unit testing at the module/component level, integration testing at the sub-system level, system testing, and acceptance (user) testing.
Testing is the process of finding differences between the expected behavior specified by the system models and the observed behavior of the implemented system.
Unit Testing
Unit testing focuses on the building blocks of the software system, that is, objects and subsystems. There are three motivations behind focusing on components. First, unit testing reduces the complexity of the overall testing activities, allowing us to focus on smaller units of the system. Second, unit testing makes it easier to pinpoint and correct faults, given that few components are involved in a test. Third, unit testing allows parallelism in the testing activities; that is, each component can be tested independently of the others. Hence the goal is to test the internal logic of the module.
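For example, a unit test for the redundant-character-removal step could look like the following JUnit 4 sketch. It assumes the hypothetical TextPreprocessor.collapseRepeats helper sketched earlier in the system model section and is not the project's actual test code.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Unit test sketch: exercises one small building block in isolation.
public class TextPreprocessorTest {

    @Test
    public void collapsesCharactersRepeatedMoreThanThreeTimes() {
        assertEquals("so", TextPreprocessor.collapseRepeats("soooooo"));
    }

    @Test
    public void leavesShortRepetitionsUntouched() {
        assertEquals("cool", TextPreprocessor.collapseRepeats("cool"));
    }
}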
Integration Testing
In integration testing, the tested modules are combined into subsystems, which are then tested. The goal here is to see if the modules can be integrated properly, the emphasis being on testing module interaction.
After structural testing and functional testing we get error-free modules. These modules are to be integrated to get the required results of the system. After checking a module, another module is tested and integrated with the previous module. After the integration, the test cases are generated and the results are tested.
System Testing
In system testing the entire software is tested. The reference document for this process is the requirements document, and the goal is to see whether the software meets its requirements. The system was tested for various test cases with various inputs.
Acceptance Testing
Acceptance testing is sometimes performed with realistic data from the client to demonstrate that the software is working satisfactorily. Testing here focuses on the external behavior of the system; the internal logic of the program is not emphasized. In acceptance testing the system is tested with various inputs.
8.3 Types of Testing
1. Black box or functional testing
2. White box testing or structural testing
As shown in the figure for black box testing, we are not thinking about the internal workings; we only consider:
What is the input to our system?
What is the output for a given input to our system?
The goal of structural testing is not to exercise all the inputs or outputs but to exercise the different programming and data structures used in the program. Thus structural testing aims to achieve test cases that will force the desired coverage of different structures. Two types of path testing are statement coverage testing and branch coverage testing.
Fig: White box testing — input, internal working, output.
In the white box testing strategy, the internal workings of the program are exercised.
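As a small illustration of the difference between the two coverage types (using an assumed helper, not project code): for the method below, a single call with a non-zero divisor already executes every statement, but branch coverage additionally requires a call with a zero divisor so that the false outcome of the if is also exercised.

// Coverage illustration (assumed helper, not project code).
public class CoverageDemo {
    static int safeDivide(int a, int b) {
        int result = 0;
        if (b != 0) {          // branch coverage needs both the true and the false outcome here
            result = a / b;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(safeDivide(10, 2)); // this single call already executes every statement
        System.out.println(safeDivide(10, 0)); // this extra call is needed only for branch coverage
    }
}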
Test Data:
Here all test cases that are used for the system testing are specified. The goal is to test the
different functional requirements specified in Software Requirements Specifications (SRS)
document.
Unit Testing:
Each individual module has been tested against the requirement with some test data.
Test Report:
The module works properly provided the user enters the required information. All data entry forms have been tested with the specified test cases and all data entry forms are working properly.
Error Report:
If the user does not enter data in specified order then the user will be prompted with error
messages. Error handling was done to handle the expected and unexpected errors.
A test case is a set of input data and expected results that exercises a component with the purpose of causing failures and detecting faults. A test case is an explicit set of instructions designed to detect a particular class of defect in a software system by bringing about a failure. A test case can give rise to many tests.
TEST CASES:
Test cases can be divided into two types: positive test cases and negative test cases. Positive test cases are conducted by the developer with the intention of obtaining the expected output. Negative test cases are conducted by the developer with the intention of verifying that invalid input does not produce output.
Test case 1: Create the new user through the registration process.
Input: Enter the personal info and address info.
Observed result: Personal info and address info are not updated into the database successfully.
Result: False
9.1 Conclusion
In this project, we addressed the problem of short text summarization on the comment stream of a specific message from social network services. We modeled the task as an incremental clustering problem and used the IncreSTS approach, which incrementally updates clustering results as the latest comments arrive, so that a concise opinion summary can be generated at any moment. An at-a-glance visualization interface helps users easily and rapidly obtain an overview summary. Experimental results and a real case demonstration show that the approach offers high efficiency, high scalability, and better handling of outliers, which justifies its practicability on the target problem.
REFERENCES
[1] P. Mika and G. Tummarello, "Web semantics in the clouds," IEEE Intell. Syst., vol. 23, no. 5.
[2] T. Preis, H. S. Moat, and E. H. Stanley, "Quantifying trading behavior in financial markets using Google Trends," Sci. Rep., vol. 3, p. 1684, 2013.
[3] H. Choi and H. Varian, “Predicting the present with Google trends,” Econ. Rec., vol. 88,
no. s1, pp. 2–9, 2012.
[4] C.-T. Ho, R. Agrawal, N. Megiddo, and R. Srikant, "Range queries in OLAP data cubes," ACM SIGMOD Rec., vol. 26, no. 2, pp. 73–88, 1997.
[5] G. Mishne, J. Dalton, Z. Li, A. Sharma, and J. Lin, “Fast data in the era of big data:
Twitter’s real-time related query suggestion architecture,” in Proc. ACM SIGMOD Int.
Conf. Manage. Data, 2013, pp. 1147–1158.
[6] W. Liang, H. Wang, and M. E. Orlowska, “Range queries in dynamic OLAP data cubes,”
Data Knowl. Eng., vol. 34, no. 1, pp. 21–38, Jul. 2000.
[7] J. M. Hellerstein, P. J. Haas, and H. J. Wang, “Online aggregation,” ACM SIGMOD
Rec., vol. 26, no. 2, 1997, pp. 171–182.
[9] E. Zeitler and T. Risch, "Massive scale-out of expensive continuous queries," Proc. VLDB Endowment, vol. 4, no. 11, pp. 1181–1188, 2011.
[10] N. Pansare, V. Borkar, C. Jermaine, and T. Condie, "Online aggregation for large MapReduce jobs," Proc. VLDB Endowment, vol. 4, no. 11, pp. 1135–1145, 2011.
[12] Y. Shi, X. Meng, F. Wang, and Y. Gan, "You can stop early with COLA: Online processing of aggregate queries in the cloud," in Proc. 21st ACM Int. Conf. Inf. Knowl. Manage., 2012, pp. 1223–1232.
[13] K. Bilal, M. Manzano, S. Khan, E. Calle, K. Li, and A. Zomaya, "On the characterization of the structural robustness of data center
Appendix
Sample code:
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

import com.socialnet.bean.UserBean;
import com.socialnet.conections.AbstractDataAccessObject;
import com.socialnet.util.DateWrapper;

// Data access object for user records. The class header and field declarations below are
// reconstructed from the excerpt; getConnection() is assumed to come from AbstractDataAccessObject.
public class UserDAO extends AbstractDataAccessObject {

    private Connection con;
    private CallableStatement cstmt;

    public UserDAO() {
        con = getConnection();
        try {
            // Use explicit transactions so updates are committed only on success.
            con.setAutoCommit(false);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
    // Method header reconstructed: fetches a user's contact details via the GetUserContacts stored procedure.
    public UserBean getUserContacts(String uname) {
        UserBean ubean = new UserBean();
        try {
            cstmt = con.prepareCall("{call GetUserContacts(?,?,?,?,?,?,?,?,?,?,?,?,?)}");
            cstmt.setString(1, uname);
            // Register the remaining parameters as outputs returned by the procedure.
            cstmt.registerOutParameter(2, Types.VARCHAR);
            cstmt.registerOutParameter(3, Types.VARCHAR);
            cstmt.registerOutParameter(4, Types.VARCHAR);
            cstmt.registerOutParameter(5, Types.VARCHAR);
            cstmt.registerOutParameter(6, Types.VARCHAR);
            cstmt.registerOutParameter(7, Types.VARCHAR);
            cstmt.registerOutParameter(8, Types.VARCHAR);
            cstmt.registerOutParameter(9, Types.VARCHAR);
            cstmt.registerOutParameter(10, Types.VARCHAR);
            cstmt.registerOutParameter(11, Types.VARCHAR);
            cstmt.registerOutParameter(12, Types.VARCHAR);
            cstmt.registerOutParameter(13, Types.DATE);
            cstmt.execute();
            // Copy the output parameters into the bean.
            ubean.setFirstname(cstmt.getString(2));
            ubean.setMidlename(cstmt.getString(3));
            ubean.setLastname(cstmt.getString(4));
            ubean.setMobile(cstmt.getString(5));
            ubean.setCity(cstmt.getString(6));
            ubean.setState(cstmt.getString(7));
            ubean.setCountry(cstmt.getString(8));
            ubean.setPin(cstmt.getString(9));
            ubean.setMail(cstmt.getString(10));
            ubean.setSex(cstmt.getString(11));
            ubean.setVillage(cstmt.getString(12));
            ubean.setDob(cstmt.getString(13)); // DATE out parameter read as its string form
        } catch (Exception e) {
            System.out.println("............exception rised in getusecontacts fun:");
            e.printStackTrace();
        } finally {
            try {
                // Close the statement before the connection.
                cstmt.close();
                con.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        System.out.println(ubean);
        return ubean;
    }
    // Method header reconstructed: updates a user's contact details via the UpdateUserContacts stored procedure.
    public boolean updateUserContacts(UserBean ubean) {
        boolean flag = false;
        try {
            cstmt = con.prepareCall("{call UpdateUserContacts(?,?,?,?,?,?,?,?,?,?,?,?,?,?)}");
            cstmt.setString(1, ubean.getUserid());
            cstmt.setString(2, ubean.getFirstname());
            cstmt.setString(3, ubean.getMidlename());
            cstmt.setString(4, ubean.getLastname());
            cstmt.setString(5, ubean.getMobile());
            cstmt.setString(6, ubean.getCity());
            cstmt.setString(7, ubean.getState());
            cstmt.setString(8, ubean.getCountry());
            cstmt.setString(9, ubean.getPin());
            cstmt.setString(10, ubean.getMail());
            cstmt.setString(11, ubean.getSex());
            cstmt.setString(12, ubean.getVillage());
            cstmt.setString(13, DateWrapper.parseDate(ubean.getDob()));
            // The 14th parameter returns "true" when the update succeeds.
            cstmt.registerOutParameter(14, Types.VARCHAR);
            cstmt.execute();
            String f = cstmt.getString(14);
            if (f.equalsIgnoreCase("true")) {
                flag = true;
                con.commit(); // commit only when the procedure reports success
            }
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            try {
                cstmt.close();
                con.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return flag;
    }
}
2. Screenshots:
LoginHome:
Login Page:
Fig: Registration
Fig : Add Friend
Fig : write welcome message
Login New User