
Conf Fair


Various Fractal Mechanisms in Image Processing: A Survey

R. Siva, Assoc. Prof., Dept. of CSE, KCG College of Technology, Chennai, sivavb6@yahoo.co.in
R. Vidhya, AP, Dept. of MCA, KCG College of Technology, Chennai, vidhyars@yahoo.com
S. Venkatesan, AP, Dept. of CSE, Gojan School of Business & Tech, Chennai, Selvamvenkatesan@gmail.com

Abstract - Fractal techniques are old but very
powerful tools in image processing. Image
processing spans many fields, such as image
signal processing, image compression, image
security, image segmentation, image
extraction, and image motion. In this paper, I
apply fractal techniques to image
segmentation, using the key concepts of
these techniques to segment the image.
I. INTRODUCTION TO FRACTALS
In the most generalized terms, a fractal
demonstrates a limit. Fractals model
complex physical processes and dynamical
systems. The underlying principle of fractals
is that a simple process that goes through
infinitely many iterations becomes a very
complex process. Fractals attempt to model
the complex process by searching for the
simple process underneath. Most fractals
operate on the principle of a feedback loop:
a simple operation is carried out on a piece
of data, and the result is fed back in again. This
process is repeated infinitely many times,
and the limit of the process
is the fractal. Almost all fractals
are at least partially self-similar [2]. This
means that a part of the fractal is identical to
the entire fractal, except smaller.
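To make the feedback loop concrete, here is a minimal Python sketch (our own illustration, not taken from the paper) using the middle-thirds Cantor set. The simple operation "keep only the outer thirds of every interval" is applied and its output is fed back in; after n passes there are 2^n intervals of total length (2/3)^n, and the fractal is the limit as n grows. The function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

Interval = tuple[Fraction, Fraction]

def remove_middle_thirds(intervals: list[Interval]) -> list[Interval]:
    """One pass of the feedback loop: replace each interval by its outer thirds."""
    result: list[Interval] = []
    for lo, hi in intervals:
        third = (hi - lo) / 3
        result.append((lo, lo + third))   # keep the left third
        result.append((hi - third, hi))   # keep the right third
    return result

def cantor_stage(n: int) -> list[Interval]:
    """Feed the output back in n times, starting from the unit interval [0, 1]."""
    intervals: list[Interval] = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = remove_middle_thirds(intervals)
    return intervals

if __name__ == "__main__":
    for n in range(4):
        stage = cantor_stage(n)
        total = sum(hi - lo for lo, hi in stage)
        print(n, len(stage), total)  # last line printed: 3 8 8/27
```

Each pass is trivial, yet the limit set is uncountable with total length zero, which is the "simple process, complex result" idea of this section.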
Fractals can look very complicated. Yet,
usually they are very simple processes that
produce complicated results. This
property carries over to chaos theory: if
something has complicated results, it does
not necessarily mean that it had a
complicated input. Chaos may have crept in,
producing complicated results. Fractal
dimensions are used to measure the
complexity of objects. We now have ways
of measuring things that were traditionally
meaningless or impossible to measure.
Finally, fractal research is a fairly new field
of interest. We can now generate and decode
fractals with graphical representations. One
of the hot areas of research today is
fractal image compression, and many web
sites are devoted to discussions of it.
The main disadvantage of fractal image
compression, and of fractals in general, is the
computational power needed to encode and
at times decode them. As personal
computers become faster, we may begin to
see mainstream programs that
compress images fractally.
II. SELF-SIMILARITY
In mathematics, a self-similar object is
exactly or approximately similar to a part of
itself. Many objects in the real world, such
as coastlines, are statistically self-similar:
parts of them show the same statistical
properties at many scales [1]. Self-similarity
is a typical property of fractals. Scale
invariance is an exact form of self-similarity
where at any magnification there is a smaller
piece of the object that is similar to the
whole. For instance, a side of the Koch
snowflake is both symmetrical and scale-
invariant; it can be continually magnified 3x
without changing shape.
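The 3x magnification property can be checked with a short sketch (illustrative only; the function name is hypothetical). Each refinement replaces every segment of one Koch snowflake side by four segments one third as long, so after n passes there are 4^n segments of length 3^-n, and any single segment magnified 3^n times reproduces the original side. Points are complex numbers, so a 60-degree rotation is multiplication by e^(i*pi/3).

```python
import cmath

def koch_side(n: int) -> list[complex]:
    """Vertices of one side of the Koch snowflake after n refinements."""
    points = [0j, 1 + 0j]                      # start from the unit segment
    rotate60 = cmath.exp(1j * cmath.pi / 3)    # rotation by +60 degrees
    for _ in range(n):
        refined = [points[0]]
        for p, q in zip(points, points[1:]):
            step = (q - p) / 3
            a = p + step                       # end of the first third
            b = a + step * rotate60            # tip of the new equilateral bump
            c = p + 2 * step                   # start of the last third
            refined.extend([a, b, c, q])
        points = refined
    return points

if __name__ == "__main__":
    for n in range(5):
        print(n, len(koch_side(n)) - 1)  # segment counts: 1, 4, 16, 64, 256
```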
Page 1 of 165
Definition
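A compact topological space $X$ is self-similar if there exists a finite set $S$ indexing
a set of non-surjective homeomorphisms $\{f_s : s \in S\}$
for which
$$X = \bigcup_{s \in S} f_s(X) \qquad (1)$$
If $X \subset Y$, we call $X$ self-similar if it is
the only non-empty subset of $Y$ such that the
equation above holds for $\{f_s : s \in S\}$. We call
$$\mathfrak{L} = \left(X, S, \{f_s : s \in S\}\right) \qquad (2)$$
a self-similar structure.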
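As a worked instance of the definition (our example, not the authors'), the middle-thirds Cantor set $C \subset [0,1]$ is self-similar with $S = \{1, 2\}$ and the two maps below. Each map is a homeomorphism onto its image, and neither maps $C$ onto all of $C$:

$$f_1(x) = \frac{x}{3}, \qquad f_2(x) = \frac{x}{3} + \frac{2}{3}, \qquad C = f_1(C) \cup f_2(C).$$

Because $S$ has two elements here, compositions of $f_1$ and $f_2$ generate the dyadic monoid described under "On Networks".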
On Networks
The homeomorphisms may be iterated,
resulting in an iterated function system. The
composition of functions creates the
algebraic structure of a monoid. When the
set S has only two elements, the monoid is
known as the dyadic monoid. The dyadic
monoid can be visualized as an infinite
binary tree; more generally, if the set S has p
elements, then the monoid may be
represented as a p-adic tree [3]. The
automorphism group of the dyadic monoid is the
modular group; the automorphisms can be
pictured as hyperbolic rotations of the binary
tree. Self-similarity has important
consequences for the design of computer
networks, as typical network traffic has self-
similar properties. For example, in teletraffic
engineering, packet-switched data traffic
patterns seem to be statistically self-similar.
This property means that simple models
using a Poisson distribution are inaccurate,
and networks designed without taking self-
similarity into account are likely to function
in unexpected ways. Similarly, stock market
movements are described as displaying self-
affinity, i.e. they appear self-similar when
transformed via an appropriate affine
transformation for the level of detail being
shown.
On Image Processing
Some very natural self-similar objects are
plants. The image on the right is self-similar,
albeit mathematically generated.
True ferns, however, come extremely close
to true self-similarity. Other plants, such as
Romanesco broccoli, are extremely self-
similar.
III. FRACTAL DIMENSION
We know the dimension of a line, a square,
and a cube. They are one, two, and three
respectively. And, we can measure the
distance, area, and volume of those objects
as well. However, what is the dimension of
the inside of a kidney or the brain, and how
do we measure their surface area? How
about a piece of broccoli or cauliflower?
This is where fractal dimension can help us
out. Fractal Dimension allows us to measure
the complexity of an object. The classic
example of this is trying to measure the
coastline of Great Britain. In actuality, it is
impossible to precisely measure the length
of the coastline. The tide is always coming
in or going out which means that the
coastline itself is constantly changing [4].
Therefore, any ordinary measurement is
meaningless. Fractal Dimension allows us to
measure the degree of complexity by
evaluating how fast our measurements
increase or decrease as our scale becomes
larger or smaller. We will discuss two types
of fractal dimension: self-similarity
dimension and box-counting dimension.
There are many different kinds of
dimension. It is important to note that not all
types of dimension measurement will give
the same answer to a single problem.
However, for an exactly self-similar figure such as
the Koch curve, the two measures discussed here give the same answer. Before explaining
dimension-measuring algorithms, we must
explain how a power law works. Essentially,
data follow a power law relationship if
they fit the following equation:
$$y = c\,x^{d} \qquad (3)$$
where c is a constant and d is the exponent.
One way to determine if data fit a power law
relationship is to plot log(y) against
log(x). Taking logarithms of (3) gives log(y) = log(c) + d*log(x),
so if the plot is a straight line, the data follow
a power law relationship with slope d. It
turns out that the methods we are going to
discuss for measuring fractal dimension rely
heavily on the power law.
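The log-log test can be sketched in Python as an ordinary least-squares line through the points (log x, log y): the slope estimates d and the intercept estimates log(c). The function name and the synthetic data below are illustrative assumptions, not part of the paper.

```python
import math

def fit_power_law(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = c * x**d by least squares on (log x, log y); return (c, d)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    d = sxy / sxx                      # slope of the log-log line
    c = math.exp(mean_y - d * mean_x)  # the intercept is log(c)
    return c, d

if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 8.0, 16.0]
    ys = [3.0 * x ** 1.5 for x in xs]  # an exact power law with c = 3, d = 1.5
    print(fit_power_law(xs, ys))       # approximately (3.0, 1.5)
```

The fit always returns numbers, so in practice one should also check that the residuals are small before trusting the slope as a dimension.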
Self-Similarity Dimension
To measure the self-similarity
dimension, the picture must be self-similar.
The power law then holds, and in
this case it is
$$a = \frac{1}{s^{D}}$$
where a is the number of pieces, s is
the reduction factor, and D is the
self-similarity dimension.
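Taking logarithms of a = 1/s^D gives D = log(a) / log(1/s). The short sketch below (function name hypothetical) evaluates this for a few standard self-similar figures; for example, the Koch curve is made of a = 4 pieces, each scaled by s = 1/3, so D = log 4 / log 3, roughly 1.26.

```python
import math

def similarity_dimension(pieces: int, reduction: float) -> float:
    """Solve a = 1 / s**D for D, i.e. D = log(a) / log(1/s)."""
    if not 0 < reduction < 1:
        raise ValueError("reduction factor s must lie strictly between 0 and 1")
    return math.log(pieces) / math.log(1 / reduction)

if __name__ == "__main__":
    print(similarity_dimension(4, 1 / 3))  # Koch curve, about 1.2619
    print(similarity_dimension(3, 1 / 2))  # Sierpinski triangle, about 1.585
    print(similarity_dimension(4, 1 / 2))  # filled square, 2 up to rounding
```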
Box-Counting Dimension
To calculate the box-counting
dimension, we need to place the
picture on a grid. Let s = 1/(number of blocks
across the width of the grid). For example, if the grid is 240
blocks tall by 120 blocks wide,
s = 1/120. Then, count the number of
blocks that the picture touches.
Label this number N(s). Now, resize
the grid and repeat the process. Plot
the values found on a graph where
the x-axis is log(1/s) and the y-axis
is log(N(s)). Draw in the
line of best fit and find the slope.
Since N(s) grows roughly like (1/s)^D, the box-counting dimension
D is equal to the slope of that
line.
The Box-counting dimension is
much more widely used than the
self-similarity dimension since the
box-counting dimension can
measure pictures that are not self-
similar.
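The box-counting procedure can be sketched as follows, under simplifying assumptions of our own: the picture is a set of (row, column) pixels on a size x size grid, size is a power of two, and box sides are powers of two, so s = box/size and the dimension is the least-squares slope of log N(s) against log(1/s). The function names are hypothetical.

```python
import math

def box_count(pixels: set[tuple[int, int]], box: int) -> int:
    """N(s): number of box-by-box grid cells touched by at least one pixel."""
    return len({(r // box, c // box) for r, c in pixels})

def box_counting_dimension(pixels: set[tuple[int, int]], size: int) -> float:
    """Least-squares slope of log N(s) against log(1/s), with s = box / size."""
    xs, ys = [], []
    box = 1
    while box < size:
        xs.append(math.log(size / box))              # log(1/s)
        ys.append(math.log(box_count(pixels, box)))  # log N(s)
        box *= 2
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

if __name__ == "__main__":
    n = 256
    diagonal = {(i, i) for i in range(n)}                  # a straight line
    filled = {(i, j) for i in range(n) for j in range(n)}  # a filled square
    print(box_counting_dimension(diagonal, n))  # 1.0
    print(box_counting_dimension(filled, n))    # about 2.0
```

A line and a filled square recover dimensions 1 and 2; for real images the estimate depends on the range of box sizes used, so the smallest and largest scales are often discarded.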
IV. ITERATED FUNCTION
SYSTEMS
Iterated Function Systems (IFS) are the most
popular way to build fractals. In this section, Iterated
Function Systems are discussed and
illustrated in detail.
The feedback loop described in Section I forms the basis of how
an IFS works.
The basics of IFS
An Iterated Function System consists of
a set of transformations, w1, w2, ..., wn. Each
of these transformations can be just about
any affine transformation. The only
restriction is that each must be a contracting
transformation, meaning that applying
it moves any two points
closer together. The transformations can be
written in matrix notation as:
$$W\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}a & b\\ c & d\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} + \begin{pmatrix}e\\ f\end{pmatrix}$$
Each transformation w1, w2, ..., wn
has a respective value p1, p2, ..., pn that
represents the probability that the
transformation is chosen. The sum of
p1, p2, ..., pn must be equal to 1.
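Both conditions can be checked mechanically. The sketch below rests on our own assumptions (the names Affine and check_ifs are hypothetical): an affine map is contracting with respect to ordinary Euclidean distance exactly when the largest singular value of its matrix [[a, b], [c, d]] is below 1, and for a 2x2 matrix that value has a closed form in terms of the entries and the determinant.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Affine:
    """W(x, y) = (a*x + b*y + e, c*x + d*y + f), as in the matrix form above."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def __call__(self, x: float, y: float) -> tuple[float, float]:
        return (self.a * x + self.b * y + self.e,
                self.c * x + self.d * y + self.f)

    def lipschitz(self) -> float:
        """Largest singular value of [[a, b], [c, d]]: the most a distance can grow."""
        t = self.a**2 + self.b**2 + self.c**2 + self.d**2  # trace of A^T A
        det = self.a * self.d - self.b * self.c            # det(A)
        disc = max(t * t - 4 * det * det, 0.0)             # guard tiny negative rounding
        return math.sqrt((t + math.sqrt(disc)) / 2)

def check_ifs(maps: list[Affine], probs: list[float]) -> None:
    """Raise ValueError unless every map is contracting and the p_i sum to 1."""
    if len(maps) != len(probs):
        raise ValueError("need exactly one probability per transformation")
    for i, w in enumerate(maps, start=1):
        if w.lipschitz() >= 1:
            raise ValueError(f"w{i} is not contracting")
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must be non-negative and sum to 1")

if __name__ == "__main__":
    # Three half-size copies (a Sierpinski triangle IFS), chosen with equal probability.
    sierpinski = [Affine(0.5, 0, 0, 0.5, 0, 0),
                  Affine(0.5, 0, 0, 0.5, 0.5, 0),
                  Affine(0.5, 0, 0, 0.5, 0.25, 0.5)]
    check_ifs(sierpinski, [1 / 3, 1 / 3, 1 / 3])  # passes silently
```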
Now it is a simple matter to plug and
chug: pick any initial point as the starting
value to begin the feedback loop. Using
a random number, pick the next
transformation according to the probabilities p1, p2, ..., pn,
calculate the next value, and
repeat as long as you want [4]. For each new
point calculated, place a pixel on the
display at that location. Instead of just using
a single pixel, you could draw a
polygon that has undergone the same
transformation. This process can also very
easily be extended to three dimensions by using
a 3D transformation instead of a 2D one.
Generating a particular picture with IFS
The Fern IFS, originally found by Michael
Barnsley, is a well-known and simple IFS
that generates the picture of a Black
Spleenwort fern.
The fern is created using four affine
transformations, each of the following form, where r and s are
scale factors, a and b are rotation angles, and (h, k) is a translation:
$$w_i\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}r\cos a & -s\sin b\\ r\sin a & s\cos b\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} + \begin{pmatrix}h\\ k\end{pmatrix}$$
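The coefficient values for the four fern transformations are not given in this copy. The sketch below instead uses the coefficients commonly published for Barnsley's fern, written in the (a, b, c, d, e, f) matrix form of "The basics of IFS" together with a selection probability p; we assume, but cannot confirm, that they correspond to the authors' table. It runs the random iteration described above, choosing each transformation with its probability.

```python
import random

# Commonly published Barnsley fern coefficients: (a, b, c, d, e, f, p).
# Assumed to correspond to the paper's four transformations.
FERN = [
    (0.00,  0.00,  0.00, 0.16, 0.00, 0.00, 0.01),  # stem
    (0.85,  0.04, -0.04, 0.85, 0.00, 1.60, 0.85),  # successively smaller leaflets
    (0.20, -0.26,  0.23, 0.22, 0.00, 1.60, 0.07),  # largest left-hand leaflet
    (-0.15, 0.28,  0.26, 0.24, 0.00, 0.44, 0.07),  # largest right-hand leaflet
]

def fern_points(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Random iteration: start at (0, 0) and apply a randomly chosen map n times."""
    rng = random.Random(seed)
    weights = [m[6] for m in FERN]
    x, y = 0.0, 0.0
    points = []
    for _ in range(n):
        a, b, c, d, e, f, _p = rng.choices(FERN, weights=weights)[0]
        x, y = a * x + b * y + e, c * x + d * y + f  # both use the old (x, y)
        points.append((x, y))                         # one pixel per iterate
    return points

if __name__ == "__main__":
    pts = fern_points(50_000)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    print(min(xs), max(xs), min(ys), max(ys))  # bounding box of the plotted fern
```

Starting at (0, 0) is convenient because it is the fixed point of the stem map, so every iterate already lies on the fern.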
Finding the IFS to fit your image
So far we have looked at some interesting
ways to generate fractal images, but
what do we do if we have an image of a
particular plant or cloud in mind and want
to find the IFS that generates an approximation
of it? This is actually not as difficult as it
might at first seem. Using the following
algorithm, one can generate an IFS that
produces the desired image.
The idea is to take the image you are
looking to create an IFS for, say a leaf. Take
the image of the leaf, and scale it down,
rotate and translate it so that this smaller
version of the image fits in its entirety
within the larger version [5]. The
transformation you just performed becomes
the first of your set of transformations, w1.
Now again take the original image, and scale,
rotate, and translate it so that it fits in a part
of the original that was not covered by your
previous transform. It is acceptable if the copies overlap;
however, minimizing the overlap is better.
Repeat this process until the entire original
image is made up of smaller copies of itself.
The set of transformations that you used
becomes the set of transforms for your IFS.
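The covering procedure can be scored mechanically, which helps when adjusting the transformations by hand. The sketch below is our own illustration, not the authors' method (the names collage_score and half_scale are hypothetical): the target image is a set of pixels, each candidate map sends pixels to pixels, and we report the fraction of the target covered by the copies, the number of pixels covered more than once, and the number that fall outside the target.

```python
from typing import Callable

Pixel = tuple[int, int]
PixelMap = Callable[[Pixel], Pixel]

def collage_score(target: set[Pixel], maps: list[PixelMap]) -> tuple[float, int, int]:
    """Return (fraction of target covered, overlapping pixels, pixels outside target)."""
    hits: dict[Pixel, int] = {}
    for w in maps:
        copy = {w(p) for p in target}   # one shrunken copy of the whole image
        for q in copy:
            hits[q] = hits.get(q, 0) + 1
    covered = sum(1 for q in hits if q in target)
    overlap = sum(1 for count in hits.values() if count > 1)
    outside = sum(1 for q in hits if q not in target)
    return covered / len(target), overlap, outside

def half_scale(dx: int, dy: int) -> PixelMap:
    """Scale by 1/2 (rounding down to whole pixels), then translate by (dx, dy)."""
    return lambda p: (p[0] // 2 + dx, p[1] // 2 + dy)

if __name__ == "__main__":
    n = 64
    square = {(i, j) for i in range(n) for j in range(n)}
    quadrants = [half_scale(dx, dy) for dx in (0, n // 2) for dy in (0, n // 2)]
    print(collage_score(square, quadrants))  # (1.0, 0, 0): exact cover, no overlap
```

A good set of transformations drives the first number toward 1 and the other two toward 0, matching the advice above to cover the whole image while minimizing overlap.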
V. CHAOS AND ITERATED
FUNCTION SYSTEMS
Combining chaos with IFS can produce fractals
in much less computing time and with better
quality. As we have seen in the section on
Iterated Function Systems, the Barnsley
fern is practically impossible to generate by drawing
triangles: there are too many triangles to
draw and too many transformations to be
applied [6]. There must be another way.
Instead of using and transforming triangles,
let us transform points. It is computationally
much easier to draw and transform a point
than a triangle.
A more algorithmic statement of this approach
is:
1. Pick the total number of iterations to
perform and call it i.
2. Set the total number of
transformations possible to n.
3. Label each transformation an integer
from 1 to n.
4. Pick a random point and call it a.
5. Generate a random integer from 1 to
n.
6. Apply the transformation labeled by that
random number to a, generating a
new point a.
7. Plot a.
8. Go to step #5, repeating i times.
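Steps 1 to 8 translate almost line by line into Python. As an assumed example (the paper does not fix the transformations), we use n = 3 maps that each move the point halfway toward one corner of a triangle; with uniform random choice the plotted points fill in a Sierpinski triangle. "Plotting" is replaced by collecting the points, and the names are hypothetical.

```python
import random

# Hypothetical example: transformation k moves a point halfway toward corner k.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]  # step 3: labels 1..n index this list

def transformation(label: int, point: tuple[float, float]) -> tuple[float, float]:
    """Apply transformation number `label` (1..n) to `point`."""
    cx, cy = CORNERS[label - 1]
    x, y = point
    return ((x + cx) / 2, (y + cy) / 2)

def chaos_game(i: int, seed: int = 0) -> list[tuple[float, float]]:
    """Run steps 1 to 8 with i iterations; return the plotted points."""
    rng = random.Random(seed)
    n = len(CORNERS)                   # step 2: number of transformations
    a = (rng.random(), rng.random())   # step 4: a random starting point
    plotted = []
    for _ in range(i):                 # step 8: repeat i times
        k = rng.randint(1, n)          # step 5: random integer from 1 to n
        a = transformation(k, a)       # step 6: new point a
        plotted.append(a)              # step 7: plot a
    return plotted

if __name__ == "__main__":
    points = chaos_game(10_000)
    print(len(points), points[-1])
```

The first few iterates may lie off the fractal because the starting point is arbitrary; implementations commonly skip plotting a handful of initial points.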
VI. CONCLUSION
This paper analyzes various fractal
techniques for image segmentation. From
the experimental results, we identify a useful
combination of techniques: chaos and IFS.
Together, chaos and IFS can produce fractals
in much less computing time and with better quality.
REFERENCES
[1] Mandelbrot B B. The Fractal Geometry of Nature. Freeman, San Francisco, 1982.
[2] Barnsley M F, Sloan A D. A better way to compress images. BYTE Magazine, 1988, 13(1): 215-223.
[3] Jacquin A E. A fractal theory of iterated Markov operators with applications to digital image coding. Ph.D. Dissertation, Georgia Institute of Technology, Aug. 1989.
[4] Jacquin A E. Image coding based on a fractal theory of iterated contractive image transformations. IEEE Trans. on Image Processing, 1992, 1(1): 18-30.
[5] Jacobs E W, Fisher Y, Boss R D. Image compression: A study of the iterated transform method. Signal Processing, 1992, 29(3): 251-263.
[6] Fang Yudong, Yu Yinglin. A quick fractal image compression coding method. (in Chinese) Acta Electronica Sinica, 1996, 24(1): 29-34.
[7] Wang Zhou, Yu Yinglin. A novel fractal image coding approach. (in Chinese) Journal of China Institute of Communications, 1996, 17(3): 84-90.
Design of a Highly Developed Communication Architecture for Hybrid Peer-to-Peer Botnet

S. Aravindh, M.Tech, II Year, Dept. of Computer Science & Engg., Bharath University, Chennai
S. Michael, Asst. Professor, Dept. of Computer Science & Engg., Bharath University, Chennai
Abstract: A botnet consists of a network of
compromised computers controlled by an
attacker (botmaster). Recently botnets have
become the root cause of many Internet attacks.
To be well prepared for future attacks, it is not
enough to study how to detect and defend
against the botnets that have appeared in the past.
More importantly, we should study advanced
botnet designs that could be developed by
botmasters in the near future. In this paper, we
present the design of an advanced hybrid peer-
to-peer botnet. Compared with current botnets,
the proposed botnet is harder to shut down,
monitor, and hijack. It provides robust
network connectivity, individualized encryption
and control traffic dispersion, limited botnet
exposure by each bot, and easy monitoring and
recovery by its botmaster. Possible defenses
against this advanced botnet are suggested.
I. INTRODUCTION
In the last several years, Internet malware
attacks have evolved into better organized and
more profit-centered endeavors. Email spam,
extortion through denial-of-service attacks [1],
and click fraud [2] represent a few examples of
this emerging trend. Botnets are a root cause of
these problems [3], [4], [5]. A botnet consists
of a network of compromised computers (bots)
connected to the Internet that is controlled by a
remote attacker (botmaster) [6], [5]. Since a
botmaster could scatter attack tasks over
hundreds or even tens of thousands of computers
distributed across the Internet, the enormous
cumulative bandwidth and large number of attack
sources make botnet-based attacks extremely
dangerous and hard to defend against. Compared
to other Internet malware, the unique feature of a
botnet lies in its control communication network.
Most botnets that have appeared until now have
had a common centralized architecture. That is,
bots in the botnet connect directly to some
special hosts (called command-and-control
servers, or C&C servers). These C&C servers
receive commands from their botmaster and
forward them to the other bots in the network.
From now on we will call a botnet with such
control communication architecture a C&C
botnet. Fig. 1 shows the basic control
communication architecture for a typical C&C
botnet (in reality, a C&C botnet usually has more
than two C&C servers). Arrows represent the
directions of network connections. As botnet-based
attacks become popular and dangerous,
security researchers have studied how to detect,
monitor, and defend against them [3], [6], [1],
[4], [7], [5]. Most of the current research has
focused upon the C&C botnets that have
appeared in the past, especially Internet Relay
Chat (IRC) based botnets. It is necessary to
conduct such research in order to deal with the
threat we are facing today. However, it is
equally important to conduct research on
advanced botnet designs that could be developed
by attackers in the near future. Otherwise, we
will remain susceptible to the next generation of
Internet malware attacks. From a botmaster's
perspective, the C&C servers are the fundamental
weak points in current botnet architectures. First,
a botmaster will lose control of his or her botnet
once the limited number of C&C servers are shut
down by defenders. Second, defenders could
easily obtain the identities (e.g., IP addresses) of
all C&C servers based on their service traffic to
a large number of bots [7], or simply from one
single captured bot (which contains the list of
C&C servers). Third, an entire botnet may be
exposed once a C&C server in the botnet is
hijacked or captured by defenders [4]. As
network security practitioners put more resources
and effort into defending against botnet attacks,
hackers will develop and deploy the next
generation of botnets with different control
architecture.
A. Current P2P Botnets and Their
Weaknesses
Considering the above weaknesses inherent to
the centralized architecture of current C&C
botnets, it is a natural strategy for botmasters to
design a peer-to-peer (P2P) control mechanism
into their botnets. In the last several years,
botnets such as Slapper [8], Sinit [9], Phatbot
[10] and Nugache [11] have implemented
different kinds of P2P control architectures. They
have shown several advanced designs. For
example, in order to remove the bootstrap
process which is easily exploited by defenders to
shut down a botnet, the Slapper worm builds a
list of known bots for each infected computer
during propagation [8]. Sinit likewise lacks a
bootstrap process and uses public key
cryptography for update authentication [9].
Nugache attempts to thwart detection by
implementing an encrypted/obfuscated control
channel [11]. Nevertheless, simply migrating
available P2P protocols will not generate a sound
botnet, and the P2P designs in those earlier
botnets are immature and have many
weaknesses. A Sinit bot uses random probing
to find other Sinit bots to communicate with.
This results in poor connectivity for the
constructed botnet and easy detection due to the
extensive probing traffic [9]. Phatbot utilizes
Gnutella cache servers for its bootstrap process.
This also makes the botnet easy to shut down. In
addition, its underlying WASTE peer-to-peer
protocol is not scalable across a large network
[10]. Nugache's weakness lies in its reliance on
a seed list of 22 IP addresses during its bootstrap
process [11]. Slapper fails to implement
encryption and command authentication, enabling
it to be easily hijacked by others. In addition, its
list of known bots contains all (or almost all)
members of the botnet. Thus, one single captured
bot would expose the entire botnet to defenders [8].
Furthermore, its complicated communication
mechanism generates a lot of traffic, rendering it
susceptible to monitoring via network flow analysis.
Fig. 1. Command and control architecture of a C&C botnet
Fig. 2. Command and control architecture of the proposed hybrid P2P botnet
Some other available robust distributed systems
include censorship-resistant systems and
anonymous P2P systems. However, their design
goal of robustness is different from that of a botnet. For
example, these robust distributed systems try to
hide the source node of a message within a crowd
of nodes. However, they do not bother to hide the
identities of this crowd. On the other hand, a
botnet needs to try its best to hide the IP addresses
of all bots in it.
B. Proposed Hybrid P2P Botnet
Considering the problems encountered by C&C
botnets and previous P2P botnets, the design of
an advanced botnet, from our understanding,
should consider the following practical
challenges faced by botmasters: (1). How to
generate a robust botnet capable of maintaining
control of its remaining bots even after a
substantial portion of the botnet population has
been removed by defenders? (2). How to prevent
significant exposure of the network topology
when some bots are captured by defenders? (3).
How can the botmaster easily monitor and obtain
complete information about the botnet? (4).
How to prevent defenders from detecting bots via
their communication traffic patterns, or at least
make it harder? In addition, the design should
also consider many network-related issues such
as dynamic or private IP addresses and the
diurnal online/offline property of bots [4]. By
considering all the challenges listed above, in this
paper, we present our research on the possible
design of an advanced hybrid P2P botnet. The
proposed hybrid P2P botnet has the following
features:
• The botnet requires no bootstrap procedure.
• The botnet communicates via the peer list
contained in each bot. However, unlike Slapper
[8], each bot has a fixed and limited-size peer
list and does not reveal its peer list to other bots.
In this way, when a bot is captured by defenders,
only the limited number of bots in its peer list are
exposed.
• A botmaster could easily monitor the entire
botnet by issuing a report command. This
command instructs all (or some) bots to report to
a specific compromised machine (called
a sensor host) that is controlled by the botmaster.
The IP address of the sensor host, which is
specified in the report command, changes
every time a report command is issued, to prevent
defenders from capturing or blocking the sensor
host beforehand.
• After collecting information about the
botnet through the above report command, the
botmaster could, if necessary, issue
an update command to make all bots actively
contact a sensor host to update their peer lists.
This effectively reorganizes the botnet so
that it has balanced and robust connectivity,
and/or reconnects a broken botnet.
• Only bots with static global IP addresses
that are accessible from the Internet are
candidates for being in peer lists (they are
called servent bots according to P2P
terminology [12] since they behave as
both clients and servers). This design
ensures that the peer list in each bot has a
long lifetime.
• Each servent bot listens on a self-determined
service port for incoming
connections from other bots and uses a
self-generated symmetric encryption key
for incoming traffic. This individualized
encryption and individualized service port
design makes it very hard for the botnet to
be detected through network flow analysis of
its communication traffic.
C. Paper Organization
The rest of the paper is organized as follows.
Section II introduces related studies. Section III
introduces the control communication
architecture of the proposed botnet. Section IV
discusses the designs to ensure the authentication
and security of command communication. In
Section V, we present how a botmaster is able to
monitor his or her botnet easily. We present how
to construct the proposed botnet in Section VI
and study its robustness against defense in
Section VII. We present possible defenses
against the botnet in Section VIII. We discuss
several further issues in Section IX and finally
conclude the paper in Section X.
II. RELATED WORK
Botnets have been an active research topic in recent years.
In 2003, Puri [13] presented an overview of bots
and botnets, and McCarty [14] discussed how to
use a honeynet to monitor botnets. Arce and Levy
presented a good analysis of how the Slapper
worm built its P2P botnet. Barford and
Yegneswaran [15] gave a detailed and systematic
dissection of many well-known botnets that have
appeared in the past. Current research on botnets is
mainly focused on monitoring and detection. [6],
[3], [16], [17] presented comprehensive studies on
using honeypots to join botnets in order to
monitor botnet activities in the Internet. With the
help of Dynamic DNS service providers, [4]
presented a botnet monitoring system that works by
redirecting the DNS mapping of a C&C server to
a botnet monitor. Ramachandran et al. [5]
presented how to passively detect botnets by
finding botmasters' queries to spam DNS-based
blackhole list servers (DNSBL). Since most
botnets nowadays use Internet Relay Chat (IRC)
for their C&C servers, many people have studied
how to detect them by detecting their IRC
channels or traffic. Binkley and Singh [7]
attempted to detect them through abnormal IRC
channels. Strayer [18] used machine-learning
techniques to detect botnet IRC-based control
traffic and tested the system on trace-driven
network data. Chen [19] presented a system to
detect botnet IRC traffic on high-speed network
routers. Nevertheless, few people have studied
how botmasters might improve their attack
techniques. [8], [9], [10], [11], [15] only
introduced the attack techniques already
implemented in several botnets appearing in the
past. Zou and Cunningham [20] studied how
botmasters might improve their botnets to avoid
being monitored by a honeypot. Our research
presented in this paper belongs to this
category. Our research was conducted at the same
time as, and independently of, the work done by Vogt
et al. [21]. In [21], the authors presented a
super-botnet, which is a super-size botnet built by
interconnecting many small botnets together in a
peer-to-peer fashion. However, [21] largely ignored
two important practical issues that have been
addressed in our work: (1). The majority of
compromised computers cannot be used as C&C
servers since they are either behind firewalls or
have dynamic IP addresses; (2). The robust
botnet control topology cannot be set up through
a reinfection mechanism if a botnet does not have
substantial reinfections during its build-up, which
is the case for most botnets in reality.
III. PROPOSED HYBRID P2P BOTNET
ARCHITECTURE
A. Two Classes of Bots
The bots in the proposed P2P botnet are
classified into two groups. The first group
contains bots that have static, non-private IP
addresses and are accessible from the global
Internet. Bots in the first group are called servent
bots since they behave as both clients and
servers.¹ The second group contains the remaining bots,
including: (1). Bots with dynamically
allocated IP addresses; (2). Bots with private
IP addresses; (3). Bots behind firewalls such
that they cannot be connected to from the global
Internet. The second group of bots is called client
bots since they will not accept incoming
connections. Only servent bots are candidates in
peer lists. All bots, including both client bots
and servent bots, actively contact the servent bots
in their peer lists to retrieve commands. Because
servent bots normally do not change their IP
addresses, this design increases the network
stability of a botnet. This bot classification will
become more important in the future as a larger
proportion of computers will sit behind firewalls,
or use DHCP or private IP addresses due to the
shortage of IP space. A bot could easily
determine the type of IP address used by its
host machine. For example, on a Windows
machine, a bot could run the command
ipconfig /all. Not all bots with static global IP
addresses are qualified to be servent bots; some
of them may stay behind firewalls, inaccessible
from the global Internet. A botmaster could rely
on collaboration between bots to determine
such bots. For example, a bot runs its server
program and requests the servent bots in its peer
list to initiate connections to its service port. If
the bot can receive such test connections, it
labels itself as a servent bot. Otherwise, it labels
itself as a client bot.
¹ In a traditional peer-to-peer file sharing system, all hosts behave both as clients and servers and are called servents [22].
B. Botnet Command and Control
Architecture
Fig. 2 illustrates the command and control
architecture of the proposed botnet. The
illustrative botnet shown in this figure has 5
servent bots and 3 client bots. The peer list
size is 2 (i.e., each bot's peer list contains the IP
addresses of 2 servent bots). An arrow from bot A
to bot B represents bot A initiating a connection
to bot B. A botmaster injects his or her
commands through any bot(s) in the botnet.
Both client and servent bots actively and
periodically connect to the servent bots in their
peer lists in order to retrieve commands issued by
their botmaster. When a bot receives a new
command that it has never seen before (e.g., each
command has a unique ID), it immediately
forwards the command to all servent bots in its
peer list. This description of command
communication means that, in terms of
command forwarding, the proposed botnet has an
undirected graph topology. A botmaster's
command could pass via the links shown in Fig.
2 in both directions. If the size of the botnet peer
list is denoted by M, then this design makes sure
that each bot has at least M venues to receive
commands.
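The forwarding rule above (forward a command only the first time its ID is seen, with every peer-list link usable in both directions) is a flooding broadcast over an undirected graph. The Python sketch below is an abstract graph simulation for analysing that claim, not bot code, and the topology is hypothetical: it has 5 servent and 3 client nodes with peer lists of size M = 2, matching the counts given for Fig. 2 but not necessarily its links, which are not reproduced here.

```python
from collections import deque

def flood(peer_lists: dict[str, list[str]], origin: str, command_id: int) -> set[str]:
    """Return the nodes that receive `command_id` when it is injected at `origin`.

    Each peer-list entry is treated as an undirected link, and a node forwards
    a command only the first time it sees that command's ID.
    """
    neighbours: dict[str, set[str]] = {node: set() for node in peer_lists}
    for node, peers in peer_lists.items():
        for peer in peers:
            neighbours[node].add(peer)
            neighbours.setdefault(peer, set()).add(node)
    seen: dict[str, set[int]] = {node: set() for node in neighbours}
    seen[origin].add(command_id)
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if command_id not in seen[nxt]:   # only never-seen IDs are forwarded
                seen[nxt].add(command_id)
                queue.append(nxt)
    return {node for node, ids in seen.items() if command_id in ids}

# Hypothetical topology: S* are servent nodes, C* are client nodes, M = 2.
EXAMPLE_PEERS = {
    "S1": ["S2", "S3"], "S2": ["S3", "S4"], "S3": ["S4", "S5"],
    "S4": ["S5", "S1"], "S5": ["S1", "S2"],
    "C1": ["S1", "S4"], "C2": ["S2", "S5"], "C3": ["S3", "S1"],
}

if __name__ == "__main__":
    print(sorted(flood(EXAMPLE_PEERS, "C2", 1)))  # all eight nodes are reached
```

Because links are undirected, a command injected at a client node still reaches every node in the same connected component; a node whose peer list is empty and that appears in no other list is never reached.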
C. Relationship between Traditional C&C
Botnets and the Proposed Botnet
Compared to a C&C botnet (see Fig. 1), it is
easy to see that the proposed hybrid P2P botnet
shown in Fig. 2 is actually an extension of a
C&C botnet. The hybrid P2P botnet is equivalent
to a C&C botnet where servent bots take the
role of C&C servers: the number of C&C
servers (servent bots) is greatly enlarged, and
they interconnect with each other. Indeed, the
large number of servent bots is the primary
reason why the proposed hybrid P2P botnet is
very hard to shut down. We will explain these
properties in detail later in Section VI and Section
VII.
IV. BOTNET COMMAND AND
CONTROL
The essential component of a botnet is its
command and control communication. Compared
to a C&C botnet, the proposed botnet has a
more robust and complex communication
architecture. The major design challenge is to
generate a botnet that is difficult for defenders
or other attackers to shut down or monitor.
A. Command Authentication
Compared with a C&C botnet, because bots in
the proposed botnet do not receive commands
from predefined places, it is especially
important to implement strong command
authentication. A standard public-key
authentication scheme would be sufficient. A botmaster
generates a pair of public/private keys, $K^{+}$ and $K^{-}$,
and hard-codes the public key $K^{+}$ into the
bot program before releasing and building the
botnet. There is no need for key distribution
because the public key is hard-coded in the bot
program. Later, the command messages sent from
the botmaster could be digitally signed with the
private key $K^{-}$ to ensure their authenticity
and integrity. This public-key based
authentication could also be readily deployed by
current C&C botnets, so botnet hijacking is not
a major issue.
B. Individualized Encryption Key
In the proposed botnet, each servent bot $i$
randomly generates its symmetric encryption
key $K_i$. Suppose the peer list on bot A is
denoted by $L_A$. It will contain not only the IP
addresses of M servent bots, but also the
symmetric keys used by these servent bots. Thus,
the peer list on bot A is:
$$L_A = \{(IP_{i_1}, K_{i_1}), (IP_{i_2}, K_{i_2}), \ldots, (IP_{i_M}, K_{i_M})\} \qquad (1)$$
where $(IP_{i_j}, K_{i_j})$ are the IP address and
symmetric key used by servent bot $i_j$. With
such a peer list design, each servent bot uses its
own symmetric key for incoming connections
from any other bot. This works because if
bot B connects to a servent bot A, bot B must
have $(IP_A, K_A)$ in its peer list. This
individualized encryption guarantees that if
defenders capture one bot, they only obtain the keys
used by the M servent bots in the captured bot's peer
list. Thus the encryption among the remaining
botnet will not be compromised.
C. Individualized Service Port
The peer-list based architecture also enables the
proposed botnet to disperse its communication
traffic in terms of service port. Since a servent
bot needs to accept connections from other
bots, it must run a server process listening on a
service port. The service port number on servent
bot $i$, denoted by $P_i$, could be randomly picked
by the bot. Considering this, a peer list needs to
contain the service port information as well. For
example, the peer list on bot A is:
$$L_A = \{(IP_{i_1}, K_{i_1}, P_{i_1}), \ldots, (IP_{i_M}, K_{i_M}, P_{i_M})\} \qquad (2)$$
The individualized service port design
has two benefits for botmasters:
Dispersed network traffic: Since the service port is a critical parameter in classifying network traffic, this individualized port design makes it extremely hard for defenders to detect a botnet based on monitored network traffic. When combined with the individualized encryption design, a P2P botnet has strong resistance against most (if not all) network-traffic-flow-based detection systems, such as the ones introduced in [19], [18].

Secret backdoor: The individualized port design also ensures that servent bots in a P2P botnet keep their backdoors secret. Otherwise, defenders could scan the specific port used by a botnet to detect potential servent bots, or monitor network traffic targeting this service port to facilitate their botnet detection.
A randomly generated service port may not always be good for botnets, since network traffic going to a rarely used port is abnormal. To overcome this, a botmaster can specify a set of service ports for each bot to choose from, preferably standard encrypted ports such as port 22 (SSH), port 443 (HTTPS), or port 993 (IMAPS). Furthermore, a sophisticated botmaster could even program the bot code to mimic the protocol format of the service port, as honeyd [23] does.
V. BOTNET MONITORING BY ITS BOTMASTER
Another major challenge in botnet design is making sure that a botnet is difficult for defenders to monitor, but at the same time, easy for its botmaster to monitor. With detailed botnet information, a botmaster could (1) conduct attacks more effectively according to the bot population, distribution, bandwidth, on/off status, IP address types, etc.; and (2) keep tighter control over the botnet when facing various counterattacks from defenders. In this section, we present a simple but effective way for botmasters to monitor their botnets whenever they want, and at the same time, resist being monitored by others.
A. Monitoring Via a Dynamically Changeable Sensor
To monitor the proposed hybrid P2P botnet, a botmaster issues a special command, called a report command, to the botnet, instructing every bot to send its information to a specified machine that is compromised and controlled by the botmaster. This data collection machine is called a sensor.

The IP address (or domain name) of the centralized sensor host is specified in the report command. Every round of report command issued by a botmaster could potentially use a different sensor host. This would prevent defenders from knowing the identity of the sensor host before seeing the actual report command. After a report command has been sent out by a botmaster, it is possible that defenders could quickly learn the identity of the sensor host (e.g., through honeypots joining the botnet [3], [6]), and then either shut it down or monitor it. To deal with this threat, a botmaster may implement any of the following procedures:
Use a popular Internet service, such as HTTP or email, for reporting to a sensor. The sensor is chosen such that it normally provides such a service, to avoid exhibiting abnormal network traffic.
Use several sensor machines instead of a single sensor.
Select sensor hosts that are harder to shut down or monitor, for example, compromised machines in other countries with minimal Internet security and international collaboration.
Manually verify that the selected sensor machines are not honeypots (see further discussion in Section IX).
Wipe out the hard drive on a sensor host immediately after retrieving the report data.
Specify an expiration time in the report command to prevent any bot from exposing itself after that time.
Issue another command to the botnet to cancel the previous report command once the botmaster knows that the sensor host has been captured by defenders.

If a botmaster simply wants to know the current size of a botnet, a probabilistic report would be preferred: each bot uses a small probability p specified in a report command to decide whether to report. Then the botnet has roughly X/p bots if X bots report. Such a probabilistic report could minimize the telltale traffic to the report sensor. Each bot could use the hard-coded public key $K^{+}$ to ensure the confidentiality of its report data. In addition, a botmaster could use several compromised machines as stepping stones in retrieving the data on sensors. These are standard practices, so we will not explain them further.
B. Additional Monitoring Information
From our understanding, there are three additional measurements directly affecting the efficiency of botnet attacks: attack bandwidth, IP address type, and diurnal dynamics. First, to conduct an effective denial-of-service attack, a botmaster may want to measure the actual bandwidth from each bot to a target machine. This could be done by letting each bot make a couple of normal connections with the target machine, based on any available bandwidth measurement technique. Second, each bot could have its own randomly generated unique ID² and report its ID with other information to its botmaster's sensor. In this way, a botmaster could obtain an accurate report, and also know the properties of bots with DHCP or NAT addresses. With this information, a botmaster could conduct fine-tuned attacks, e.g., only letting bots that frequently change their IP addresses send out email spam in order to avoid being blocked by DNSBL-based spam filters [5]. Third, as pointed out by [4], the online population of a botnet exhibits clear diurnal dynamics because many users shut down their computers at night. In a time zone, the peak online population of a botnet could be as much as four times the bottom-level online population. To maximize botnet attack strength, a botmaster may launch a denial-of-service attack at the right time, when the botnet online population reaches its peak level, or spread new malware at an optimal release time to maximize its propagation speed [4]. Since bots could function as spyware, it is not hard for a bot to obtain its host machine's diurnal dynamics.

² To make sure each bot has a unique ID, a simple way is to let every bot randomly generate a very large number for its ID; one or two collisions do not matter much.
VI. BOTNET CONSTRUCTION
A. Basic construction procedures
Botnet connectivity is solely determined by the peer list in each bot. A natural way to build peer lists is to construct them during propagation. To make sure that a constructed botnet is connected, the initial set of bots should contain some servent bots whose IP addresses are in the peer list on every initial bot. Suppose the size of the peer list in each bot is configured to be M. As a bot program propagates, the peer list in each bot is constructed according to the following procedures:

New infection: Bot A passes its peer list to a vulnerable host B when compromising it. If B is a servent bot, A adds B into its peer list (by randomly replacing one bot if its peer list is full). Similarly, if A is a servent bot, B adds A into its peer list in the same way.

Reinfection: If reinfection is possible and bot A reinfects bot B, bot B will then replace R ($R \le M - 1$) randomly selected bots in its peer list with R bots from the peer list provided by A. Again, bots A and B will add each other into their respective peer lists if the other one is a servent bot.

The reinfection procedure makes it harder for defenders to infer the infection time order (traceback) among bots based on captured peer lists. In this process, a bot does not provide its peer list to those who reinfect it. This is important because, otherwise, defenders could recursively infect (and monitor) all servent bots in a botnet based on a captured bot in their honeypot in the following way: defenders use a firewall to redirect the outgoing infection attempts from captured bot A to reinfect the servent bots in A's peer list, then get the peer lists from these servent bots and reinfect the servent bots in these peer lists in turn.

In order to study a constructed botnet topology and its robustness via simulations, we first need to
determine simulation settings. First, Bhagwan et al. [24] studied P2P file-sharing systems and observed that around 50% of computers change their IP addresses within four to five days. So we expect the fraction of bots with dynamic addresses to be in a similar range. In addition, some other bots are behind firewalls or NAT boxes, so they cannot accept Internet connections. We could not find a good source for this statistic, so in this paper we assume that 25% of bots are servent bots. Second, as pointed out in [25], [26], botnets in recent years have dropped their sizes to an average of 20,000, even though the potential vulnerable population is much larger. Thus we assume a botnet has a potential vulnerable population of 500,000, but stops growing after it reaches the size of 20,000. In addition, we assume that the peer list has a size of M = 20 and that there are 21 initial servent hosts to start the spread of the botnet. In this way, the peer list on every bot is always full.

From our simulation experiments, we find that a botnet constructed only with the above two procedures is not robust enough. Because a botnet stops growing after reaching the size of 20,000, reinfection events rarely happen (around 600). Due to this phenomenon, connections to servent bots are extremely unbalanced: more than 80% (4,000) of servent bots have degrees less than 30, while each of the 21 initial servent bots has a degree between 14,000 and 17,500. This is not an ideal botnet. The constructed hybrid P2P botnet is approximately degraded to a C&C botnet in which the initial set of servent bots behave as C&C servers.

Vogt et al. [21] constructed a super-botnet only with algorithms that are similar to the new infection and reinfection procedures presented above. Although the authors of [21] showed that their constructed super-botnet is robust, they make an implicit assumption that the super-botnet will have abundant reinfections during its construction period. We believe this assumption is incorrect in a real-world scenario: botmasters would want their botnets to generate as few reinfections as possible to avoid wasting infection power and being detected by defenders. To illustrate this argument, we have simulated another botnet scenario where the potential vulnerable population is 20,000 instead of the 500,000 used in the previous simulation. The botnet stops propagation after all vulnerable hosts have been infected. In this simulation, 210,000 reinfection events happened. This time, because there are plenty of reinfections, the constructed botnet has well-balanced connectivity: the degree distribution of all servent bots roughly follows a normal distribution, and 80% of servent bots have degrees between 30 and 150.
C. Advanced construction procedure
One intuitive way to improve the network connectivity would be to let bots keep exchanging and updating their peer lists frequently. However, such a design makes it very easy for defenders to obtain the identities of all servent bots if one or several bots are captured by defenders. As introduced in Section V, a botmaster could monitor his botnet easily whenever he wants by issuing a report command. With the detailed botnet information, a botmaster could easily update the peer list in each bot to have strong and balanced connectivity. The added new procedure is:

Peer-list updating: After a botnet spreads out for a while, a botmaster issues a report command to obtain the information of all currently available servent bots. These servent bots are called peer-list updating servent bots. Then, the botmaster issues another command, called an update command, enabling all bots to obtain an updated peer list from a specified sensor host. Entries in the updated peer list in each bot are randomly chosen from those peer-list updating servent bots. A botmaster could run this procedure once or a few times during or after the botnet propagation stage. After each run of this procedure, all current bots will have uniform and balanced connections to peer-list updating servent
bots.

When and how often should this peer-list updating procedure be run? First, this procedure should be executed once shortly after the release of a botnet to prevent defenders from removing all initial servent bots. Second, as a botnet spreads out, each round of this updating procedure makes the constructed botnet have stronger and more balanced connectivity, but at the same time, it incurs an increasing risk of exposing the botnet to defenders. It is therefore up to a botmaster to strike a comfortable balance. In addition, a botmaster could run this procedure to conveniently reconnect a broken botnet.

Fig. 3. Servent bot degree distribution (x-axis: log(# of degrees); y-axis: # of servent bots)

Fig. 3 shows the degree distribution for servent bots (client bots always have a degree of M) when a botnet uses all three construction procedures. We assume the peer-list updating procedure is executed just once, when 1,000 (25% of) servent bots have been infected. This figure shows that although the 4,000 servent bots infected after the peer-list updating procedure still have small degrees, the first 1,000 servent bots used in peer-list updating have large and balanced connection degrees, ranging from 300 to 500. They form a robust backbone that connects the hybrid P2P botnet tightly together.

VII. BOTNET ROBUSTNESS STUDY
Next, we study the robustness property of a constructed hybrid P2P botnet. Two factors affect the connectivity of a botnet: (1) some bots are removed by defenders; and (2) some bots are offline (for example, due to the diurnal phenomenon [4]). These two factors, even though completely different, have the same impact on botnet connectivity when the botnet is used by its botmaster at a specific time. For this reason, we do not distinguish them in the following study.

Since servent bots, especially the servent bots used in the peer-list updating procedure, are the backbone connecting a botnet together, we study botnet connectivity when a certain fraction of peer-list updating servent bots are removed (that is, either removed by defenders or offline). Let C(p) denote the connected ratio and D(p) denote the degree ratio after removing the top p fraction of mostly-connected bots among those peer-list updating servent bots; this is the most efficient and aggressive defense that could be done when defenders have complete knowledge (topology, bot IP addresses, ...) of the botnet. C(p) and D(p) are defined as:

$$C(p) = \frac{\#\text{ of bots in the largest connected graph}}{\#\text{ of remaining bots}} \qquad (3)$$

$$D(p) = \frac{\text{Average degree of the largest connected graph}}{\text{Average degree of the original botnet}} \qquad (4)$$

The metric C(p) shows how well a botnet survives a defense action; the metric D(p) exhibits how densely the remaining botnet is connected together.

Fig. 4. Botnet robustness study

Fig. 4 shows the robustness study. The botnet is the one shown in Fig. 3, which has a vulnerable population of 500,000 and runs the peer-list updating procedure once when 1,000 servent bots are infected. As shown in this figure, if all 1,000 peer-list updating servent bots are removed, the botnet will be completely broken. This result shows the importance of the peer-list updating procedure. The botnet will largely stay connected (C(p) > 95%) if fewer than 700 of those 1,000 peer-list updating servent bots are removed, although its connectivity gradually decreases with further removal (as exhibited by D(p)). This experiment shows the strong resistance of the proposed botnet against defense, even if defenders know the identities of all bots and the complete botnet topology.

A. Robustness Mathematical Analysis
We provide a simple analytical study of the botnet robustness. Assume that each peer list contains M servent bots. It is hard to provide a formula when removing the top p fraction of mostly-connected nodes. However, we can provide the formula for C(p) when randomly removing a p fraction of peer-list updating servent bots. As we discussed before, the servent bots not used in the peer-list updating procedure have very few extra links besides the M links given by their own peer lists. We simplify the analysis by assuming that each bot in the botnet connects only to peer-list updating servent bots. Then, when we consider removing a fraction of peer-list updating servent bots, more links will be removed compared to the original botnet network. Because of this bias, the analytical formula presented below slightly underestimates C(p) in the case of random removal.

A bot is disconnected from the others when all M servent bots in its peer list have been removed. Because of the random removal, each peer-list updating servent bot has the same probability p of being removed. Thus, the probability that a bot is disconnected is $p^M$. Therefore, any remaining bot has the same probability $1 - p^M$ of staying connected, i.e., the mean value of C(p) is (in case of random removal):

$$C(p) = 1 - p^M \qquad (5)$$

Fig. 5. Comparison of the analytical formula (5) and simulation results (x-axis: size of peer list M, from 0 to 100)
Fig. 5 shows the analytical result from (5), compared with the simulation result C(p) for random removal and the simulation result C(p) for removal of the top p fraction of mostly-connected peer-list updating servent bots. The analytical curve lies between those two simulated robustness metrics. This shows that the analytical formula indeed has a small underestimation bias compared with random removal. Because removing the top p fraction removes more links from the botnet network than a random removal does, the simulation results C(p) from the top-removal scenario are slightly lower than the results derived from (5). In summary, this figure shows that, even though the analytical formula (5) is not very accurate, it provides a good first-hand estimate of the robustness of a botnet. This figure also shows that the proposed botnet does not need a large peer list to achieve strong robustness.

The robustness study presented here is a static study and analysis. We have not considered how a botnet behaves if bots are removed by defenders gradually, or when bots are removed as the botnet spreads. For this reason, the botnet infection rate and spreading speed do not matter to our robustness study; we will study these issues in the future.
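As a rough illustration of Eq. (5), the following Python sketch evaluates $C(p) = 1 - p^M$ and compares it with a small Monte Carlo run of the simplified model used in the analysis above. In that model, every bot's peer list holds M distinct peer-list updating servent bots chosen uniformly at random, and each such servent is removed independently with probability p. A bot counts as connected if at least one of its peers survives. Several things here are assumptions made for the sketch, not the authors' simulator: the function names, the default population sizes, and the per-bot connectivity criterion, which replaces the largest-connected-graph measure of Eq. (3).

```python
import random


def analytical_connected_ratio(p: float, m: int) -> float:
    """Mean connected ratio under random removal, Eq. (5): C(p) = 1 - p**M."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("removal probability p must lie in [0, 1]")
    if m < 1:
        raise ValueError("peer-list size M must be at least 1")
    return 1.0 - p ** m


def simulated_connected_ratio(p: float, m: int, num_servents: int = 1000,
                              num_bots: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of C(p) under the simplified model (assumption).

    Each peer-list updating servent bot is removed independently with
    probability p. Each bot's peer list holds M distinct servents drawn
    uniformly at random; the bot is disconnected only if all M are removed.
    """
    rng = random.Random(seed)
    removed = [rng.random() < p for _ in range(num_servents)]
    connected = 0
    for _ in range(num_bots):
        peers = rng.sample(range(num_servents), m)
        # A bot stays connected if at least one of its peers survived.
        if not all(removed[s] for s in peers):
            connected += 1
    return connected / num_bots


if __name__ == "__main__":
    for m in (5, 10, 20):
        for p in (0.5, 0.9):
            print(f"M={m:2d} p={p:.1f} "
                  f"analytical={analytical_connected_ratio(p, m):.4f} "
                  f"simulated={simulated_connected_ratio(p, m):.4f}")
```

For a fixed removal probability p, $p^M$ shrinks quickly as M grows, which is consistent with the observation that a large peer list is not needed. The Monte Carlo estimate should track the closed form closely under these assumptions. It can differ slightly because servents are drawn without replacement and the realized removal fraction fluctuates around p.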
VIII. DEFENSE AGAINST THE PROPOSED HYBRID P2P BOTNET
A. Annihilating
We introduce possible annihilating defenses in three ways. First, the proposed hybrid P2P botnet relies on servent bots in constructing its communication network. If the botnet is unable to acquire a large number of servent bots, the botnet will be degraded to a traditional C&C botnet (the relationship of these two botnets is discussed in Section III-C), which is much easier to shut down. For this reason, defenders should focus their defense effort on computers with static global IP addresses, preventing them from being compromised, or removing compromised ones quickly.

Second, as shown in Section VI, before a botmaster issues an update command for the first time, a botnet is in its most vulnerable state, since it is mainly connected through the small set of initial servent bots. Therefore, defenders should develop quick detection and response systems, enabling them to quickly shut down the initial set of servent bots in a newly created botnet before its botmaster issues the first update command.

Third, defenders could try to poison the communication channel of a P2P botnet based on honeypot techniques. If they let their infected honeypots join the botnet and claim to have static global IP addresses (these honeypots are configured to accept connections from other bots on their claimed global IP addresses), the honeypots will be treated as servent bots. As a result, they will occupy positions in the peer lists of many bots, decreasing the number of valid communication channels in the hybrid P2P botnet.

As discussed in Section VII, the strong robustness of the proposed botnet relies heavily on the peer-list updating procedure. Servent bots used in the peer-list updating procedure form the backbone of the communication network of a botnet. Therefore, the best strategy to disrupt the communication channel of a botnet, perhaps, is to poison the peer-list updating procedure with the following steps. First, once a honeypot is infected by a bot program, defenders quickly let the bot program infect many other honeypots (for example, by redirecting the bot's outgoing infection traffic to other honeypots). Then, when receiving a report command from the botmaster, all honeypot bots report as servent bots so that they will be used in the peer-list updating procedure. Defenders would achieve a better poisoning defense if they have distributed honeypots and a large number of IP addresses.
B. Monitoring
In this area, defenders hold a better position with the help of honeypots. If they deploy a honeypot on a large IP space, they may be able to trap a large number of botnet infection attempts. If the bot program cannot detect the honeypot and passes its peer list in each infection attempt, the defenders could get many copies of peer lists, obtaining the important information (IP addresses, encryption keys, service ports) of many servent bots in a botnet.

Second, based on honeypot bots, defenders may be able to obtain the plain text of commands issued by a botmaster. Once the meaning of the commands is understood, defenders are able to: (1) quickly find the sensor machines used by a botmaster in report commands; if a sensor machine can be captured by defenders before the collected information on it is erased by its botmaster, they might be able to obtain detailed information on the entire botnet; (2) know the target in an attack command, so that they could implement corresponding countermeasures quickly, right before (or as soon as) the actual attack begins.

Another honeypot-based monitoring opportunity arises during the peer-list updating procedure. First, defenders could let their honeypot bots claim to be servent bots in peer-list updating. By doing this, these honeypots will be connected to many bots in the botnet. Second, during peer-list updating, each honeypot bot could get a fresh peer list, which means the number of bots revealed to each honeypot could be doubled. For the simulated botnet shown in Fig. 3, we conduct another set of simulations in which one of its servent bots is a honeypot. If the honeypot is one of the initial servent bots, the honeypot knows the identity of, on average, 20% of the bots in the botnet after the botnet propagation stops (the botnet has 20,000 bots). If the honeypot joins the botnet before the peer-list updating procedure (when the botnet infects 500 servent bots), it knows, on average, 2.3% of the bots in the botnet after the botnet propagation stops. If the honeypot joins the botnet right after the one and only peer-list updating procedure, the honeypot could only know around 30 bots in the botnet.
A possible weak point of the proposed botnet is its centralized monitoring sensor. If defenders have set up a good traffic logging system, they could possibly capture the traffic to a botnet sensor. We call such a monitoring system a botnet sensor monitor. Even though defenders may not be able to capture a botnet sensor before its botmaster destroys the sensor (after completing the botmaster's monitoring task), they could still use the captured traffic log to figure out the IP addresses of potential bots that contacted the sensor in the past. In this way, defenders could get a relatively complete picture of a botnet.

Unlike the other monitoring methods, the above traffic logging and analysis approach does not rely on honeypot systems. This makes it important to conduct further research on this approach, since we must be prepared in case a future smart botnet can detect and disable honeypots.
IX. DISCUSSIONS
From the defense discussion in the previous section, we see that honeypots play a critical role in most defense methods against the proposed hybrid P2P botnet. Botmasters might design countermeasures against honeypot defense systems. Such countermeasures might include detecting honeypots based on software or hardware fingerprinting [27], [28], [29], or exploiting the legal and ethical constraints held by honeypot owners [20]. Most current botnets do not attempt to avoid honeypots, simply because attackers have not yet felt the threat from honeypot defense. As honeypot-based defense becomes popular and widely deployed, we believe botmasters will eventually add honeypot detection mechanisms to their botnets. The war between honeypot-based defense and honeypot-aware botnet attacks will come soon and intensify in the near future.

For botnet defense, current research shows that it is not very hard to monitor Internet botnets [4], [15], [30]. The hard problem is how to defend against attacks sent from botnets, since it is normally very hard to shut down a botnet's control. Because of legal and ethical reasons, we as security defenders cannot actively attack or compromise a remote bot machine or a botnet C&C server, even if we are sure a remote machine has a bot program installed. For example, the well-known "good worm" approach is not practical in the real world. The current practice of collaborating with the ISPs that contain bot-infected machines is slow and resource-consuming. There are still significant challenges in botnet defense research in this respect.
X. CONCLUSION
To be well prepared for future botnet attacks, we should study advanced botnet attack techniques that could be developed by botmasters in the near future. In this paper, we present the design of an advanced hybrid peer-to-peer botnet. Compared with current botnets, the proposed one is much harder to shut down or monitor. It provides robust network connectivity, individualized encryption and control traffic dispersion, limited botnet exposure by each captured bot, and easy monitoring and recovery by its botmaster. To defend against such an advanced botnet, we point out that honeypots may play an important role. We should, therefore, invest more research into determining how to deploy honeypots efficiently and avoid their exposure to botnets and botmasters.
REFERENCES
[1] S. Kandula, D. Katabi, M. Jacob, and A. Berger, "Botz-4-sale: Surviving organized DDoS attacks that mimic flash crowds," in 2nd Symposium on Networked Systems Design and Implementation (NSDI), May 2005.
[2] C. T. News, "Expert: Botnets no. 1 emerging internet threat," 2006.
[3] F. Freiling, T. Holz, and G. Wicherski, "Botnet tracking: Exploring a root-cause methodology to prevent distributed denial-of-service attacks," CS Dept. of RWTH Aachen University, Tech. Rep. AIB-2005-07, April 2005.
[4] D. Dagon, C. Zou, and W. Lee, "Modeling botnet propagation using time zones," in Proceedings of 13th Annual Network and Distributed System Security Symposium (NDSS), February 2006, pp. 235-249.
[5] A. Ramachandran, N. Feamster, and D. Dagon, "Revealing botnet membership using DNSBL counter-intelligence," in USENIX 2nd Workshop on Steps to Reducing Unwanted Traffic on the Internet (SRUTI 06), June 2006.
[6] E. Cooke, F. Jahanian, and D. McPherson, "The zombie roundup: Understanding, detecting, and disrupting botnets," in Proceedings of SRUTI: Steps to Reducing Unwanted Traffic on the Internet, July 2005.
[7] J. R. Binkley and S. Singh, "An algorithm for anomaly-based botnet detection," in USENIX 2nd Workshop on Steps to Reducing Unwanted Traffic on the Internet (SRUTI 06), June 2006.
[8] I. Arce and E. Levy, "An analysis of the Slapper worm," IEEE Security & Privacy Magazine, Jan.-Feb. 2003.
[9] "Sinit P2P trojan analysis."
[10] "Phatbot trojan analysis."
[11] R. Lemos. (2006, May) "Bot software looks to improve peerage."
[12] "Servent," http://en.wikipedia.org/wiki/Servent.
[13] R. Puri, "Bots & botnet: An overview," 2003.
[14] B. McCarty, "Botnets: Big and bigger," IEEE Security & Privacy Magazine, vol. 1, no. 4, July 2003.
[15] P. Barford and V. Yegneswaran, "An inside look at botnets," to appear in Series: Advances in Information Security. Springer, 2006.
[16] H. Project, "Know your enemy: Tracking botnets," 2005.
[17] F. Monrose. (2006) "Longitudinal analysis of botnet dynamics." ARO/DARPA/DHS Special Workshop on Botnet.
[18] T. Strayer. (2006) "Detecting botnets with tight command and control." ARO/DARPA/DHS Special Workshop on Botnet.
[19] Y. Chen. (2006) "IRC-based botnet detection on high-speed routers." ARO/DARPA/DHS Special Workshop on Botnet.
[20] C. Zou and R. Cunningham, "Honeypot-aware advanced botnet construction and maintenance," in Proceedings of International Conference on Dependable Systems and Networks (DSN), June 2006.
[21] R. Vogt, J. Aycock, and M. Jacobson, "Army of botnets," in Proceedings of 14th Annual Network and Distributed System Security Symposium (NDSS), February 2007.
[22] E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim, "A survey and comparison of peer-to-peer overlay network schemes," IEEE Communications Surveys and Tutorials, vol. 7, no. 2, 2005.
[23] N. Provos, "A virtual honeypot framework," in Proceedings of 13th USENIX Security Symposium, August 2004.
[24] R. Bhagwan, S. Savage, and G. M. Voelker, "Understanding availability," in Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS), Feb 2003.
[25] C. News. (2005, November) "Bots slim down to get tough."
[26] (2006, February) "Washington Post: The botnet trackers." http://www.washingtonpost.com/wp-dyn/content/article/2006/02/16/AR2006021601388.html.
[27] K. Seifried, "Honeypotting with VMware basics," 2002, http://www.seifried.org/security/index.php/Honeypotting With VMWare Basics.
[28] J. Corey, "Advanced honey pot identification and exploitation," 2004, http://www.phrack.org/fakes/p63/p63-0x09.txt.
[29] "Honeyd security advisory 2004-001: Remote detection via simple probe packet," 2004, http://www.honeyd.org/adv.2004-01.asc.
[30] M. Rajab, J. Zarfoss, F. Monrose, and A. Terzis, "A multifaceted approach to understanding the botnet phenomenon," in Internet Measurement Conference, October 2006.
NANOMEDICINE FOR ARTERY BLOCKAGE
R. SUGANTHA LAKSHMI¹ & J. AMBIKA²
¹ ² Senior Lecturers, Kings College of Engineering, Punalkulam, Thanjavur.
E-mail: suganthi83@gmail.com, ambi.nagu08@gmail.com

ABSTRACT
For centuries, man has searched for miracle cures to end suffering caused by disease and injury. Many researchers believe nanotechnology applications in medicine may be mankind's first 'giant step' toward this goal. Much research has been done into finding possible ways of treating the symptoms of heart disease. Nanotechnology is an exciting, new, upcoming field that already has, and will continue to have, vast opportunities for the future of medicine. The central idea of this paper is to investigate how nanotechnology is used in the treatment of heart disease. For this purpose, a new type of particle, which researchers call the "nanoronfler", can cling to damaged artery walls and slowly release drugs; these nanoparticles are designed to treat cardiovascular disease. Like burrs, the seeds that get caught on your clothes outside, nanoronflers bristle with tiny hooks, making them adept at clinging to the exposed surfaces of wounded arteries. We hope to conclude from this paper that the development of nanotechnology will greatly advance medical treatment and hopefully lead to cures for many of the ailments we suffer from in day-to-day life, especially heart disease.
Key words:
Nanotechnology; Atherosclerosis;
Stents; Angioplasty

1. INTRODUCTION
Heart disease
Heart disease is one of the most
common causes of death in our society. In
009 approximately a third of all deaths in
males were due to cardiovascular disease.
This figure has decreased since 1961, where
almost 50 % of all male deaths were due to
cardiovascular disease. Despite this,
cardiovascular disease is one of the largest
causes of death in males, equal only to
cancer. High cholesterol levels are
especially influential as a cause of
atherosclerosis, the condition caused by a
buildup of cholesterol, fibres and dead
muscle cells as plaques (atheromas) on the
lining of artery walls. Atherosclerosis causes
the arteries to become narrower (Figure 1), which
can lead to other cardiovascular conditions such
as thrombosis and myocardial infarction.

Figure 1 Normal vs. Narrow Artery
Numerous treatments have been developed for atherosclerotic vascular disease, and for patients with severe artery blockage, coronary artery bypass grafting or coronary artery stenting is required. Vascular stents (Figure 2) are spring-like, hollow, cylindrical metal implants used to treat blockages in blood vessels.

Figure 2 Vascular stents


Vascular stenting is the procedure of implanting a thin metal tube, made of titanium, stainless steel, Nitinol or a CoCr alloy, into the part of the artery blocked by plaque accumulation in order to prop it open and re-establish blood flow. A standard treatment for clogged and damaged arteries is to implant a vascular stent that holds the artery open and releases drugs such as paclitaxel.

2. ATHEROSCLEROSIS vs.
NANORONFLER
Atherosclerosis is being approached
in a number of ways.
Prevention
Surgery
Prevention is used for patients who
appear to be at high risk of cardiovascular
disease and atherosclerosis. Depending on
the severity of the risk increase, patients are
either advised to change their lifestyle or
prescribed medicines to decrease some of
the risk factors (e.g. statins for high
cholesterol levels).
Surgery is used when critical arteries,
most often the coronary arteries, become
affected by atherosclerosis. Again, this is
done in two ways:
Coronary Angioplasty - where a
balloon is fed using a catheter to the
narrowed artery via either the arm or groin
and expanded, thus flattening the atheroma
and widening the artery.
Coronary Bypass surgery - where a
portion of a healthy blood vessel is used to
create an alternative pathway for blood to
flow to the heart, thus bypassing the narrow
vessel.
Although these are only a few of the methods for treating atherosclerosis (stents being another), all of them have drawbacks. Prevention is only viable for those who have an increased risk of atherosclerosis, or who have mild cases of it; it cannot help those with serious cases. The problem with the treatments for serious cases is that both are invasive surgeries. Any surgery carries an inherent risk, and the ideal solution would be a way to treat the more serious cases of atherosclerosis non-invasively. We therefore propose the use of nanotechnology to achieve this.
Much research has been done into finding possible ways of treating the symptoms of heart disease, and nanotechnology has shown promise in this area: it has been investigated for treating defective heart valves and for detecting and treating arterial plaque.
Firstly, valves can become too rigid, or too soft so that the valve becomes floppy. This makes it much harder for the heart to pump blood through its different chambers and to the body, which increases blood pressure and could lead to a myocardial infarction. Nanotechnology makes it possible to produce nanorods, which are made of nanomaterials and in most instances are minute gold rods. This technology could have broad applications across other important diseases, including cancer and inflammatory diseases, where vascular permeability or vascular damage is commonly observed.

3. FABRICATION OF NANORONFLER
A short seven-amino-acid peptide sequence, called C-11, binds to molecules on the surface of the basement membrane. This peptide was later used to coat the outer layer of the nanoparticles. The inner core of the 60-nanometer-diameter particles carries the drug, which is bound to a polymer chain called Poly Lactic Acid (PLA).

Figure 3 nanoronfler
A middle layer of soybean lecithin, a fatty material, lies between the core and the outer shell, which consists of a polymer called Polyethylene Glycol (PEG) that protects the particles as they travel through the bloodstream.
Figure 4 Fabrication of nanoronfler
The drug can only be released when
it detaches from the PLA polymer chain,
which occurs gradually by a reaction called
ester hydrolysis. The longer the polymer
chain, the longer this process takes and thus
the timing of the drug's release can be
changed by altering the chain length of the
polymer.

4. INJECTION OF NANORONFLER
Two techniques were used to test the delivery of the nanoronfler:
In vivo
Ex vivo
In Vivo
In the in vivo technique, the nanoronfler were injected and left to circulate for 1 hour. Four times as many nanoronfler were found in the injured left carotid artery as in the uninjured right one, showing that the nanoronfler were effective in targeting the damaged tissue in the carotid artery. To check that systemic delivery would be possible, the nanoronfler were injected again, this time via another vein. Similarly positive findings were achieved, with twice as many nanoronfler found in the injured carotid artery as in the uninjured one.
Ex Vivo
The ex vivo technique used balloon-injured aortas to test the effectiveness of the nanoronfler. The nanoronfler were incubated with the abdominal aortas for 5 minutes under constant pressure, and the aortas were then washed to ensure that any nanoronfler that had not attached to the arterial wall did not show up in the fluorescent imaging. Twice as many nanoronfler attached to the arterial wall as did the scrambled-peptide and non-targeted particles. Similarly, when the experiment was repeated with uninjured aortic tissue, four times as many nanoronfler attached to the injured tissue as to the uninjured tissue.
5. TACTICS OF NANORONFLER
The nanoronfler latch onto injured arterial walls because they are decorated with peptides obtained from bacteriophages, viruses that infect bacteria. Research found that one such peptide latches onto the collagen that makes up the basement membrane of arteries. This makes the peptide useful for binding preferentially to the basement membrane of the artery wall, which is exposed whenever the artery is injured, for example during angioplasty, when the inflated balloon squeezes against the arterial wall and pulls off the top layer of cells.
The nanoronfler's stickiness means that these tiny hybrid polymeric particles are much more likely to hit the treatment target than nanoparticles lacking the peptide hooks. In the reported study, carried out both in arterial cell cultures and in the carotid arteries of living rats, the burred nanoparticles were between two and four times as likely to bind to injured arterial tissue as non-burred ones.

6. WORKING OF NANORONFLER
For a cardiovascular disease patient, the nanoronfler must stay in circulation long enough to reach and stick to the damaged artery, because as soon as they enter the patient's body, its natural defenses treat them as foreign particles and quickly attack them.
To prevent this, the nanoronfler are sheathed in soy lecithin, a fatty substance, and then coated with polyethylene glycol (PEG), an inert hydrophilic polymer that is able to evade much of the body's defenses.
7. REASONS FOR USING NANOPARTICLES AS A DRUG DELIVERY SYSTEM
1. The particle size and surface characteristics of nanoparticles can be easily manipulated to achieve both passive and active drug targeting after parenteral administration.
2. They control and sustain release of the drug during transportation and at the site of localization, altering the organ distribution and subsequent clearance of the drug so as to increase its therapeutic efficacy and reduce side effects.
3. Controlled release and particle degradation characteristics can be readily modulated by the choice of matrix constituents.
4. Drug loading is relatively high, and drugs can be incorporated into the systems without any chemical reaction; this is an important factor for preserving the drug activity.
5. Site-specific targeting can be achieved by attaching targeting ligands to the surface of the particles or by using magnetic guidance.
6. The system can be used for various routes of administration, including oral, nasal, parenteral and intra-ocular.

8. DISCUSSIONS
In the future, the hope is to use nanoronfler alongside stents or in place of them to treat damage located in areas not well suited to stents, such as near a fork in the artery (bifurcation lesions), diffuse lesions, larger arteries, and already-stented arteries, which may have more than one lesion. There are also conditions that do not allow for drug-eluting stent placement, such as in patients with renal failure, diabetes and hypertension, or in patients who cannot take the dual drug regimen of clopidogrel (an antiplatelet agent used to inhibit blood clots in coronary artery disease) and aspirin that is required with this treatment.
The nanoronfler are spheres 60 nanometres across, more than 100 times smaller than a red blood cell; at the core of each particle, a drug designed to combat the narrowing of blood vessels is bound to a chain-like polymer molecule. The time it takes for the drug to be released is controlled by varying the length of the polymer. The longer the chain, the longer the duration of the release, which occurs through a reaction called ester hydrolysis whereby the drug becomes detached from
the polymer. Controlling drug release is
already promising to be a very important
development; as of today, drug release has
been a maximum of 1days. The hope is to
be able to control it more easily and, given the
current pace of development, the wait promises
to be short.
The advantage is that, because the particles can deliver drugs over a longer period of time and can be injected intravenously, patients would not have to endure repeated, surgically invasive injections directly into the area that requires treatment. This would not only increase patient comfort and satisfaction but also reduce the time spent in hospital for recovery.

9. FUTURE ENHANCEMENTS
Testing in rats over a two-week period is now under way to determine the most effective dose for treating damaged vascular tissue. There are hopes that the particles may also prove useful in delivering drugs to tumours and could have broad applications across other diseases, including cancer. This highlights the promise of the nanoronfler and the changes they could bring to medicine.
Again, this would not only directly improve the treatment of the patient, but also indirectly improve research and other areas by providing greater control over treatment.

10. CONCLUSION
Overall, we believe that nanotechnology is of vital importance to future developments in medicine, because cardiovascular disease accounts for approximately a third of all deaths in males. In this paper we have discussed how nanotechnology is currently being applied to the treatment of heart disease using nanoronfler. The same approach may also be applicable to cancer and inflammatory diseases. We have also looked at ongoing developments in the field of nanotechnology in medicine. Nanoparticles coated with a thin layer of gold are currently under development. These would allow doctors to inject a patient with the nanoparticles and, once the nanoparticles have reached the target area of the body, heat them using infrared radiation. The heat would irreparably damage the targeted diseased tissue, potentially curing the patient.

A preliminary version of this material appeared at IEEE PIMRC 2008.


Removal of Network Jamming Using Portfolio
Selection Criteria
Thiyagarajan M¹, Sriram R², Manojkumaar S³
Department of Computer Science and Engineering
Abstract
Multiple-path source routing protocols allow a data source
node to distribute the total traffic among available paths. In this
article, we consider the problem of jamming-aware source routing
in which the source node performs traffic allocation based on
empirical jamming statistics at individual network nodes. We
formulate this traffic allocation as a lossy network flow
optimization problem using portfolio selection theory from
financial statistics. We show that in multi-source networks,
this centralized optimization problem can be solved using a
distributed algorithm based on decomposition in network utility
maximization (NUM). We demonstrate the network's ability to
estimate the impact of jamming and incorporate these estimates
into the traffic allocation problem. Finally, we simulate the
achievable throughput using our proposed traffic allocation
method in several scenarios.
Index Terms: Jamming, Multiple path routing, Portfolio selection theory, Optimization, Network utility maximization
I. INTRODUCTION
Jamming point-to-point transmissions in a wireless mesh
network [1] or underwater acoustic network [2] can have
debilitating effects on data transport through the network. The
effects of jamming at the physical layer resonate through
the protocol stack, providing an effective denial-of-service
(DoS) attack [3] on end-to-end data communication. The
simplest methods to defend a network against jamming attacks
comprise physical layer solutions such as spread-spectrum
or beamforming, forcing the jammers to expend a greater
resource to reach the same goal. However, recent work has
demonstrated that intelligent jammers can incorporate cross-layer protocol information into jamming attacks, reducing resource expenditure by several orders of magnitude by targeting certain link layer and MAC implementations [4]-[6] as well as link layer error detection and correction protocols [7]. Hence, more sophisticated anti-jamming methods and defensive measures must be incorporated into higher-layer protocols, for example channel surfing [8] or routing around jammed regions of the network [6].

This work was supported in part by the following grants: ARO PECASE, W911NF-05-1-0491; ONR, N000-07-1-0600; ONR, N00014-07-1-0600; ARL CTA, DAAD19-01-2-0011; and ARO MURI, W911NF-07-1-0287. This document was prepared through collaborative participation in the Communications and Networks Consortium sponsored by the US Army Research Laboratory under the Collaborative Technology Alliance Program, DAAD19-01-2-0011. The US Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the US Government.
P. Tague, S. Nabar, J. A. Ritcey, and R. Poovendran are with the Network Security Lab (NSL), Electrical Engineering Department, University of Washington, Seattle, Washington. Email: {tague,snabar,jar7,rp3. P. Tague is currently with Carnegie Mellon University, Silicon Valley Campus, Moffett Field, California.
The majority of anti-jamming techniques make use of
diversity. For example, anti-jamming protocols may employ
multiple frequency bands, different MAC channels, or multiple
routing paths. Such diversity techniques help to curb the effects
of the jamming attack by requiring the jammer to act on
multiple resources simultaneously. In this paper, we consider
the anti-jamming diversity based on the use of multiple routing
paths. Using multiple-path variants of source routing protocols
such as Dynamic Source Routing (DSR) [9] or Ad-Hoc On-
Demand Distance Vector (AODV) [10], for example the MP-
DSR protocol [11], each source node can request several
routing paths to the destination node for concurrent use. To
make effective use of this routing diversity, however, each
source node must be able to make an intelligent allocation
of traffic across the available paths while considering the
potential effect of jamming on the resulting data throughput.
In order to characterize the effect of jamming on throughput,
each source must collect information on the impact of the
jamming attack in various parts of the network. However,
the extent of jamming at each network node depends on a
number of unknown parameters, including the strategy used
by the individual jammers and the relative location of the
jammers with respect to each transmitter-receiver pair. Hence,
the impact of jamming is probabilistic from the perspective of
the network¹, and the characterization of the jamming impact
is further complicated by the fact that the jammers' strategies
may be dynamic and the jammers themselves may be mobile².
In order to capture the non-deterministic and dynamic
effects of the jamming attack, we model the packet error rate
at each network node as a random process. At a given time, the
randomness in the packet error rate is due to the uncertainty
in the jamming parameters, while the time-variability in the
packet error rate is due to the jamming dynamics and mobility.
Since the effect of jamming at each node is probabilistic,
the end-to-end throughput achieved by each source-destination
pair will also be non-deterministic and, hence, must be studied
using a stochastic framework.
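To make this concrete, the following minimal Python sketch reuses the three-path example that Section III-A of this article works through (300 pkts/s split over paths p1, p2 and p3 of Fig. 2). It assumes, for illustration only, that each link's packet success rate is either 0 (jammed) or 1 (unjammed), takes a path's end-to-end success rate as the product of its link success rates, and does not check link capacities beyond the allocations the example itself uses. The names `PATHS`, `path_success` and `throughput` are hypothetical.

```python
from math import prod

# Hypothetical encoding of the three routing paths of the Section III-A example.
PATHS = {
    "p1": [("s", "x"), ("x", "b"), ("b", "d")],
    "p2": [("s", "y"), ("y", "b"), ("b", "d")],
    "p3": [("s", "z"), ("z", "b"), ("b", "d")],
}


def path_success(path, link_success):
    """End-to-end success rate: product of the link success rates on the path.

    Links missing from `link_success` are treated as unjammed (rate 1.0).
    """
    return prod(link_success.get(link, 1.0) for link in path)


def throughput(allocation, link_success):
    """Delivered rate in pkts/s: sum of (rate sent on a path) x (its success rate).

    Link capacities are not checked here.
    """
    return sum(rate * path_success(PATHS[name], link_success)
               for name, rate in allocation.items())


# Two jamming states from the example: jammer near x, then jammer near y.
JAM_X = {("s", "x"): 0.0}
JAM_Y = {("s", "y"): 0.0}

ALLOCATIONS = {
    "even (100/100/100)": {"p1": 100, "p2": 100, "p3": 100},
    "shifted (0/150/150)": {"p1": 0, "p2": 150, "p3": 150},
    "hedged (50/50/200)": {"p1": 50, "p2": 50, "p3": 200},
}

if __name__ == "__main__":
    for label, alloc in ALLOCATIONS.items():
        print(f"{label}: jammer at x -> {throughput(alloc, JAM_X):.0f} pkts/s, "
              f"jammer at y -> {throughput(alloc, JAM_Y):.0f} pkts/s")
```

Run as a script, it shows that no single allocation is best for every jammer position: the even split delivers 200 pkts/s in both states, the shifted split delivers 300 or 150 pkts/s depending on where the jammer is, and the split that fills p3 delivers 250 pkts/s in both. With random rather than 0/1 link success rates, the same computation yields a random throughput, which is why it has to be treated stochastically.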

Fig. 1. An example network with sources $S = \{r, s\}$ is illustrated. Each unicast link $(i, j) \in \mathcal{E}$ is labeled with the corresponding link capacity.

[...] the allocation of traffic across multiple routing paths. Our contributions to this problem are as follows:
• We formulate the problem of allocating traffic across multiple routing paths in the presence of jamming as a lossy network flow optimization problem. We map the optimization problem to that of asset allocation using portfolio selection theory [12], [13].
• We formulate the centralized traffic allocation problem for multiple source nodes as a convex optimization problem.
• We show that the multi-source multiple-path optimal traffic allocation can be computed at the source nodes using a distributed algorithm based on decomposition in network utility maximization (NUM) [14].
• We propose methods which allow individual network nodes to locally characterize the jamming impact and aggregate this information for the source nodes.
• We demonstrate that the use of portfolio selection theory allows the data sources to balance the expected data throughput with the uncertainty in achievable traffic rates.
The remainder of this article is organized as follows. In Section II, we state the network model and assumptions about the jamming attack. To motivate our formulation, in Section III, we present methods that allow nodes to characterize the local jamming impact. These concepts are required to understand the traffic allocation optimization and the mapping of this problem to portfolio selection. In Section IV, we formulate the optimal multiple path traffic allocation problem for multi-source networks. In Section V, we evaluate the performance of the optimal traffic allocation formulation. We summarize our contributions in Section VI.

II. SYSTEM MODEL AND ASSUMPTIONS
The wireless network of interest can be represented by a directed graph $G = (\mathcal{N}, \mathcal{E})$. The vertex set $\mathcal{N}$ represents the network nodes, and an ordered pair $(i, j)$ of nodes is in the edge set $\mathcal{E}$ if and only if node $j$ can receive packets directly from node $i$. We assume that all communication is unicast over the directed edges in $\mathcal{E}$, i.e. each packet transmitted by node $i \in \mathcal{N}$ is intended for a unique node $j \in \mathcal{N}$ with $(i, j) \in \mathcal{E}$. The maximum achievable data rate, or capacity, of each unicast link $(i, j) \in \mathcal{E}$ in the absence of jamming is denoted by the predetermined constant rate $c_{ij}$ in units of packets per second³.
³ We assume that this capacity is an available constant which corresponds to the maximum packet rate for reliable transport over each wireless link. We do not address the analysis or estimation of this link capacity parameter.
Each source node $s$ in a subset $S \subseteq \mathcal{N}$ generates data for a single destination node $d_s \in \mathcal{N}$. We assume that each source node $s$ constructs multiple routing paths to $d_s$ using a route request process similar to those of the DSR [9] or AODV [10] protocols. We let $P_s = \{p_{s1}, \dots, p_{sL_s}\}$ denote the collection of $L_s$ loop-free routing paths for source $s$, noting that these paths need not be disjoint as in MP-DSR [11]. Representing each path $p_{s\ell}$ by a subset of the directed link set $\mathcal{E}$, the sub-network of interest to source $s$ is given by the directed subgraph $G_s = (\mathcal{N}_s, \mathcal{E}_s)$ of the graph $G$, where
$$\mathcal{N}_s = \bigcup_{\ell=1}^{L_s} \{ j : (i, j) \in p_{s\ell} \}, \qquad \mathcal{E}_s = \bigcup_{\ell=1}^{L_s} p_{s\ell}.$$
Figure 1 illustrates an example network with sources $S = \{r, s\}$. The subgraph $G_r$ consists of the two routing paths
$p_{r1} = \{(r, i), (i, k), (k, m), (m, u)\}$,
$p_{r2} = \{(r, i), (i, j), (j, n), (n, u)\}$,
and the subgraph $G_s$ consists of the two routing paths
$p_{s1} = \{(s, i), (i, k), (k, m), (m, t)\}$,
$p_{s2} = \{(s, j), (j, n), (n, m), (m, t)\}$.
In this article, we assume that the source nodes in $S$ have no prior knowledge about the jamming attack being performed. That is, we make no assumption about the jammers' goals, method of attack, or mobility patterns. We assume that the number of jammers and their locations are unknown to the network nodes. Instead of relying on direct knowledge of the jammers, we suppose that the network nodes characterize the jamming impact in terms of the empirical packet delivery rate. Network nodes can then relay the relevant information to the source nodes in order to assist in optimal traffic allocation.
Each time a new routing path is requested or an existing routing path is updated, the responding nodes along the path will relay the necessary parameters to the source node as part of the reply message for the routing path. Using the information from the routing reply, each source node $s$ is thus provided with additional information about the jamming impact on the individual nodes.

III. CHARACTERIZING THE IMPACT OF JAMMING
In this section, we propose techniques for the network nodes to estimate and characterize the impact of jamming and for a source node to incorporate these estimates into its traffic allocation. In order for a source node $s$ to incorporate the jamming impact in the traffic allocation problem, the effect of jamming on transmissions over each link $(i, j) \in \mathcal{E}_s$ must be estimated and relayed to $s$. However, to capture the jammer mobility and the dynamic effects of the jamming attack, the local estimates need to be continually updated. We begin with an example to illustrate the possible effects of jammer mobility on the traffic allocation problem and motivate the use of continually updated local estimates.
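The continually updated local estimates are specified in Section III-B of this article: every update period of T seconds a node computes the packet delivery ratio (PDR) v_ij / r_ij and folds it into an EWMA mean with weight α on history, and every relay period of T_s seconds it takes the variance of the PDR samples from that relay period, folds it into an EWMA variance with weight β, and relays both values to the source. The following is a minimal Python sketch of that estimator for a single link, not the authors' implementation. Assumptions: the class and method names are hypothetical, each relay period contains exactly ceil(T_s / T) update periods, a period with no received packets is given a PDR of 0.0, and `statistics.pvariance` (population variance) stands in for the sample variance because the article does not fix the normalization.

```python
import math
from statistics import pvariance


class LinkEstimator:
    """Per-link packet-success estimator kept by a receiving node j (sketch).

    alpha: EWMA weight on the previous mean estimate (preference for history).
    beta: EWMA weight on the previous variance estimate.
    periods_per_relay: update periods T per relay period T_s, i.e. ceil(T_s / T).
    """

    def __init__(self, alpha, beta, periods_per_relay, mu0=1.0, var0=0.0):
        if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
            raise ValueError("alpha and beta must lie in [0, 1]")
        if periods_per_relay < 1:
            raise ValueError("periods_per_relay must be at least 1")
        self.alpha = alpha
        self.beta = beta
        self.periods_per_relay = periods_per_relay
        self.mu = mu0      # current estimate of the packet success rate
        self.var = var0    # current estimation variance
        self._pdrs = []    # PDR samples collected since the last relay

    def end_of_update_period(self, received, valid):
        """Fold in one update period's counts r_ij (received) and v_ij (valid).

        Returns (mu, var), the values the node would relay to the source,
        at the end of a relay period, and None otherwise.
        """
        # PDR over the period; assumption: no received packets counts as 0.0.
        pdr = valid / received if received > 0 else 0.0
        # EWMA update of the mean estimate.
        self.mu = self.alpha * self.mu + (1 - self.alpha) * pdr
        self._pdrs.append(pdr)
        if len(self._pdrs) < self.periods_per_relay:
            return None
        # Variance of this relay period's PDR samples, then its EWMA update.
        window_var = pvariance(self._pdrs)
        self.var = self.beta * self.var + (1 - self.beta) * window_var
        self._pdrs.clear()
        return self.mu, self.var


if __name__ == "__main__":
    # Example: T = 1 s and T_s = 2 s, so two update periods per relay period.
    est = LinkEstimator(alpha=0.5, beta=0.5, periods_per_relay=math.ceil(2.0 / 1.0))
    for received, valid in [(100, 100), (100, 0), (100, 100), (100, 100)]:
        relayed = est.end_of_update_period(received, valid)
        if relayed is not None:
            print("relay to source:", relayed)
```

With alpha = beta = 0.5 and two update periods per relay period, the four periods in the example produce two relays, (0.5, 0.125) and (0.875, 0.0625). Lowering alpha makes the mean follow new PDR samples more quickly, which is the lever the article describes for tracking a mobile jammer.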

Fig. 2. An example network that illustrates a single-source network with three routing paths. Each unicast link $(i, j)$ is labeled with the corresponding link capacity $c_{ij}$ in units of packets per second. The proximity of the jammer to nodes $x$ and $y$ impedes packet delivery over the corresponding paths, and the jammer mobility affects the allocation of traffic to the three paths as a function of time.

A. Illustrating the Effect of Jammer Mobility on Network Throughput
Figure 2 illustrates a single-source network with three routing paths $p_1 = \{(s, x), (x, b), (b, d)\}$, $p_2 = \{(s, y), (y, b), (b, d)\}$ and $p_3 = \{(s, z), (z, b), (b, d)\}$. The label on each edge $(i, j)$ is the link capacity $c_{ij}$ indicating the maximum number of packets per second (pkts/s) which can be transported over the wireless link. In this example, we assume that the source is generating data at a rate of 300 pkts/s. In the absence of jamming, the source can continuously send 100 pkts/s over each of the three paths, yielding a throughput rate equal to the source generation rate of 300 pkts/s. If a jammer near node $x$ is transmitting at high power, the probability of successful packet reception, referred to as the packet success rate, over the link $(s, x)$ drops to nearly zero, and the traffic flow to node $d$ reduces to 200 pkts/s. If the source node becomes aware of this effect, the allocation of traffic can be changed to 150 pkts/s on each of paths $p_2$ and $p_3$, thus recovering from the jamming attack at node $x$. However, this one-time re-allocation by the source node $s$ does not adapt to the potential mobility of the jammer. If the jammer moves to node $y$, the packet success rate over $(s, x)$ returns to one and that over $(s, y)$ drops to zero, reducing the throughput to node $d$ to 150 pkts/s, which is less than the 200 pkts/s that would be achieved using the original allocation of 100 pkts/s over each of the three paths. Hence, each node must relay an estimate of its packet success rate to the source node $s$, and the source must use this information to reallocate traffic in a timely fashion if the effect of the attack is to be mitigated. The relay of information from the nodes can be done periodically or at the instants when the packet success rates change significantly. These updates must be performed at a rate comparable to the rate of the jammer movement to provide an effective defense against the mobile jamming attack.
Next, suppose the jammer continually changes position between nodes $x$ and $y$, causing the packet success rates over links $(s, x)$ and $(s, y)$ to oscillate between zero and one. This behavior introduces a high degree of variability into the observed packet success rates, leading to a less certain estimate of the future success rates over the links $(s, x)$ and $(s, y)$. However, since the packet success rate over link $(s, z)$ has historically been more steady, it may be a more reliable option. Hence, the source $s$ can choose to fill $p_3$ to its capacity and partition the remaining 100 pkts/s equally over $p_1$ and $p_2$. This solution takes into account the historic variability in the packet success rates due to jamming mobility. In the following section, we build on this example, providing a set of parameters to be estimated by network nodes and methods for the sources to aggregate this information and characterize the available paths on the basis of expected throughput.

Fig. 3. The estimation update process is illustrated for a single link. The estimate $\mu_{ij}(t)$ is updated every $T$ seconds, and the estimation variance $\sigma^2_{ij}(t)$ is computed only every $T_s$ seconds. Both values are relayed to relevant source nodes every $T_s$ seconds.

B. Estimating Local Packet Success Rates
We let $x_{ij}(t)$ denote the packet success rate over link $(i, j) \in \mathcal{E}$ at time $t$, noting that $x_{ij}(t)$ can be computed analytically as a function of the transmitted signal power of node $i$, the signal power of the jammers, their relative distances from node $j$, and the path loss behavior of the wireless medium. In reality, however, the locations of mobile jammers are often unknown, and, hence, the use of such an analytical model is not applicable. Due to the uncertainty in the jamming impact, we model the packet success rate $x_{ij}(t)$ as a random process and allow the network nodes to collect empirical data in order to characterize the process. We suppose that each node $j$ maintains an estimate $\mu_{ij}(t)$ of the packet success rate $x_{ij}(t)$ as well as a variance parameter $\sigma^2_{ij}(t)$ to characterize the estimate uncertainty and process variability⁴.
⁴ At a time instant $t$, the estimate $\mu_{ij}(t)$ and estimation variance $\sigma^2_{ij}(t)$ define a random variable describing the current view of the packet success rate. This random variable can be appropriately modeled as a beta random variable [15], though the results of this article do not require such an assumption.
We propose the use of a recursive update mechanism allowing each node $j$ to periodically update the estimate $\mu_{ij}(t)$ as a function of time. As illustrated in Figure 3, we suppose that each node $j$ updates the estimate $\mu_{ij}(t)$ after each update period of $T$ seconds and relays the estimate to each relevant source node $s$ after each update relay period of $T_s > T$ seconds. The shorter update period of $T$ seconds allows each node $j$ to characterize the variation in $x_{ij}(t)$ over the update relay period of $T_s$ seconds, a key factor in $\sigma^2_{ij}(t)$.
We propose the use of the observed packet delivery ratio (PDR) to compute the estimate $\mu_{ij}(t)$. While the PDR incorporates additional factors such as congestion, it has been shown by extensive experimentation [8] that such factors

do not affect the PDR in a similar manner. Furthermore, we propose to average the empirical PDR values over time to smooth out the relatively short-term variations due to noise or fading. During the update period represented by the time interval $[t - T, t]$, each node $j$ can record the number $r_{ij}([t - T, t])$ of packets received over link $(i, j)$ and the number $v_{ij}([t - T, t]) \le r_{ij}([t - T, t])$ of valid packets which pass an error detection check⁵. The PDR over link $(i, j)$ for the update period $[t - T, t]$, denoted $\mathrm{PDR}_{ij}([t - T, t])$, is thus equal to the ratio
$$\mathrm{PDR}_{ij}([t - T, t]) = \frac{v_{ij}([t - T, t])}{r_{ij}([t - T, t])}. \qquad (1)$$
⁵ In the case of jamming attacks which prevent the [...] detecting transmissions by node $i$, additional [...] periodically exchanged between nodes $i$ and $j$ to [...] total number of transmissions, yielding the same [...].
This PDR can be used to update the estimate $\mu_{ij}(t)$ at the end of the update period. In order to prevent significant variation in the estimate $\mu_{ij}(t)$ and to include memory of the jamming attack history, we suggest using an exponential weighted moving average (EWMA) [16] to update the estimate $\mu_{ij}(t)$ as a function of the previous estimate $\mu_{ij}(t - T)$ as
$$\mu_{ij}(t) = \alpha\, \mu_{ij}(t - T) + (1 - \alpha)\, \mathrm{PDR}_{ij}([t - T, t]), \qquad (2)$$
where $\alpha \in [0, 1]$ is a constant weight indicating the relative preference between current and historic samples.
We use a similar EWMA process to update the variance $\sigma^2_{ij}(t)$ at the end of each update relay period of $T_s$ seconds. Since this variance is intended to capture the variation in the packet success rate over the last $T_s$ seconds, we consider the sample variance $V_{ij}([t - T_s, t])$ of the set of packet delivery ratios computed using (1) during the interval $[t - T_s, t]$ as
$$V_{ij}([t - T_s, t]) = \mathrm{Var}\left\{ \mathrm{PDR}_{ij}([t - kT - T,\, t - kT]) : k = 0, \dots, \lceil T_s / T \rceil - 1 \right\}. \qquad (3)$$
The estimation variance $\sigma^2_{ij}(t)$ is thus defined as a function of the previous variance $\sigma^2_{ij}(t - T_s)$ as
$$\sigma^2_{ij}(t) = \beta\, \sigma^2_{ij}(t - T_s) + (1 - \beta)\, V_{ij}([t - T_s, t]), \qquad (4)$$
where $\beta \in [0, 1]$ is a constant weight similar to $\alpha$ in (2).
The EWMA method is widely used in sequential estimation processes, including estimation of the round-trip time (RTT) in TCP [17]. We note that the parameters $\alpha$ in (2) and $\beta$ in (4) allow for design of the degree of historical content included in the parameter estimate updates, and these parameters can themselves be functions $\alpha(t)$ and $\beta(t)$ of time. For example, decreasing the parameter $\alpha$ allows the mean $\mu_{ij}(t)$ to change more rapidly with the PDR due to jammer mobility, and decreasing the parameter $\beta$ allows the variance $\sigma^2_{ij}(t)$ to give more preference to variation in the most recent update relay period over historical variations. We further note that the update period $T$ and update relay period $T_s$ [...] updates of the parameter estimates have [...] on the quality of the estimate. In particular, if the update relay period $T_s$ is too large, the relayed estimates [...] will be outdated before the subsequent update [...].
Furthermore, if the update period [...] the dynamics of the jamming [...] the large number of samples [...]. $T$ and $T_s$ must thus be short [...] of the jamming attack. However, [...] $T_s$ between successive updates [...] increases the communication [...] there exists a trade-off between [...] the choice of the update period [...] of the update relay period $T_s$ [...] and jammer mobility models. [...] of the update relay period $T_s$ [...].
Using the above formulation, [...] is requested or an existing routing path [...] along the path will include [...] part of the reply message. In [...] source node $s$ uses these estimates [...] packet success rates over each [...].

C. Estimating End-to-End Packet Success Rates
Given the packet success [...] for the links $(i, j)$ in a routing path [...] to estimate the effective end-to-end [...] determine the optimal traffic allocation. [...] time required to transport packets [...] corresponding destination $d_s$ [...] update relay period $T_s$, we drop [...] the end-to-end packet success [...] $\mu_{ij}$ and $\sigma^2_{ij}$. The end-to-end packet success rate $y_{s\ell}$ for path $p_{s\ell}$ can be expressed as the product
$$y_{s\ell} = \prod_{(i, j) \in p_{s\ell}} x_{ij}, \qquad (5)$$
which is itself a random variable [...] each $x_{ij}$. We let [...] denote the [...] denote the covariance of $y_{s\ell}$ and [...]. Due to the computational burden [...] inference of correlation between [...] we let the source node $s$ assume [...] mutually independent, even though [...]. We maintain this independence [...] work, yielding a feasible approximation [...] of correlated random variables [...] inference of the relevant correlation [...]. Under this independence assumption, [...] given in (5) is equal to the product
her note that the up-
between subsequent

s!
=
(i
significant influence
ticular, if the update
and the covariance
s!m
=
similarly given by
ates
ij
(t) and
ij
(t) $
update at time t + T
s
.
the receiving node j from
eader information can be

s!m
=
(i,j)ps!psm

ij
(i,j)
o achieve the convey the
overall effect.
6
If the x
ij
are modeled as beta rando
approximated by a beta random variabl
ij
ij
eriod T at each node is too large,
g attack may be averaged out over
r
ij
([t- T, t]). The update periods
enough to capture the dynamics
ver, decreasing the update period
ates to the source node necessarily
overhead of the network. Hence,
etween performance and overhead in
riod T
s
. We note that the design
s
depends on assumed path-loss
. The application-specific tuning
s
is not further herein.
on, each time a new routing path
routing path is updated, the nodes
he estimates
ij
(t) and
2
(t) as
n what follows, we show how the
timates to compute the end-to-end
each path.
cket Success Rates
rate estimates
ij
(t) and
2
(t)
ting path p
s!
, the source s needs
nd-to-end packet success rate to
fic allocation. Assuming the total
ackets from each source s to the
s
is negligible compared to the
drop the time index and address
s rates in terms of the estimates
packet success rate y
s!
for path
product
$
i,j)ps!
x
ij
, (5)
iable
6
due to the randomness in
he expected value of y
s!
and
s!m
and y
sm
for paths p
s!
, p
sm
P
s
.
urden associated with in-network
etween estimated random variables,
ume the packet success rates x
ij
as
though they are likely correlated.
ence assumption throughout this
pproximation to the complex reality
bles, and the case of in-network
orrelation is left as future work.
ssumption, the mean
s!
of y
s!
product of estimates
ij
as
$
i,j)ps!

ij
, (6)
= E[y
s!
y
sm
] - E[y
s!
]E[y
sm
] is
$
%

2 2
&
- .
)ps!psm
ij
+
ij
s! sm
(7)
andom variables, the product y
s!
is well-
iable [18].
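The per-link estimators (1)-(4) and the end-to-end statistics (6)-(7) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, and several choices in it are assumptions:

- Link identifiers are `(i, j)` tuples.
- The function names and link values are hypothetical.
- The use of the population variance for (3) is assumed, since the text does not state the normalization.

```python
import math
from statistics import pvariance


def pdr(valid, received):
    """Packet delivery ratio v_ij / r_ij over one update period, eq. (1)."""
    if received <= 0:
        raise ValueError("no packets received during this update period")
    return valid / received


def ewma(previous, sample, weight):
    """EWMA update of (2) and (4); `weight` plays the role of alpha or beta."""
    return weight * previous + (1.0 - weight) * sample


def relay_period_variance(pdrs):
    """V_ij over the PDRs of the last ceil(T_s / T) update periods, eq. (3)."""
    return pvariance(pdrs)  # population variance: an assumed normalization


def path_mean(path, mu):
    """gamma_sl: product of the link means along the path, eq. (6)."""
    return math.prod(mu[link] for link in path)


def path_cov(p, q, mu, var):
    """omega_slm under the link-independence assumption, eq. (7)."""
    shared = set(p) & set(q)     # links in p intersect q
    exclusive = set(p) ^ set(q)  # links in exactly one of p, q (exclusive-OR)
    joint = math.prod(mu[e] ** 2 + var[e] for e in shared)
    joint *= math.prod(mu[e] for e in exclusive)
    return joint - path_mean(p, mu) * path_mean(q, mu)


# Illustrative link estimates (hypothetical values).
sa, ad, ab, bd = ("s", "a"), ("a", "d"), ("a", "b"), ("b", "d")
mu = {sa: 0.9, ad: 0.8, ab: 0.95, bd: 0.7}
var = {link: 0.01 for link in mu}
p1, p2 = [sa, ad], [sa, ab, bd]  # two paths sharing the link (s, a)

print(path_mean(p1, mu), path_cov(p1, p2, mu, var))
print(ewma(0.8, pdr(45, 50), weight=0.75))  # one EWMA step for a link mean
```

Only the link $(s,a)$ is shared, so (7) reduces to $\sigma^2_{sa}$ times the means of the remaining links, $0.01 \times 0.8 \times 0.95 \times 0.7 = 0.00532$. This is what `path_cov(p1, p2, mu, var)` returns up to floating-point rounding. Disjoint paths give zero covariance, as the text notes.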
In (7), $\oplus$ denotes the exclusive-OR set operator: an element is in $A \oplus B$ if it is in either $A$ or $B$ but not both. The covariance formula in (7) reflects the fact that the end-to-end packet success rates $y_{s\ell}$ and $y_{sm}$ of paths $p_{s\ell}$ and $p_{sm}$ with shared links are correlated even when the rates $x_{ij}$ are independent. The variance $\sigma^2_{s\ell}$ of the end-to-end rate $y_{s\ell}$ can be computed using (7) with $\ell = m$.

Let $\gamma_s$ denote the $L_s \times 1$ vector of estimated end-to-end packet success rates $\gamma_{s\ell}$ computed using (6). Let $\Omega_s$ denote the $L_s \times L_s$ covariance matrix with $(\ell, m)$ entry $\omega_{s\ell m}$ computed using (7). The estimate pair $(\gamma_s, \Omega_s)$ provides the sufficient statistical characterization of the end-to-end packet success rates for source $s$ to allocate traffic to the paths in $\mathcal{P}_s$. Furthermore, the off-diagonal elements in $\Omega_s$ capture the extent of mutual overlap between the paths in $\mathcal{P}_s$.

IV. OPTIMAL JAMMING-AWARE TRAFFIC ALLOCATION

In this section, we present an optimization framework for jamming-aware traffic allocation to multiple routing paths in $\mathcal{P}_s$ for each source node $s \in \mathcal{S}$. We develop a set of constraints imposed on traffic allocation solutions. We then formulate a utility function for optimal traffic allocation by mapping the problem to that of portfolio selection in finance. Let $\phi_{s\ell}$ denote the traffic rate allocated to path $p_{s\ell}$ by the source node $s$. The problem of interest is then for each source $s$ to determine the optimal $L_s \times 1$ rate allocation vector $\phi_s$. This allocation is subject to network flow capacity constraints and uses the available statistics $\gamma_s$ and $\Omega_s$ of the end-to-end packet success rates under jamming.

A. Traffic Allocation Constraints

To define a set of constraints for the multiple-path traffic allocation problem, we must consider three factors: the source data rate constraints, the link capacity constraints, and the reduction of traffic flow due to jamming at intermediate nodes. The traffic rate allocation vector $\phi_s$ is trivially constrained to the non-negative orthant, i.e. $\phi_s \succeq 0$, as traffic rates are non-negative. Data generation at source $s$ is assumed to be limited to a maximum data rate $R_s$, so the rate allocation vector is also constrained as $\mathbf{1}^T\phi_s \le R_s$. These constraints define the convex space $\Phi_s$ of feasible allocation vectors $\phi_s$ characterizing rate allocation solutions for source $s$.

Due to jamming at nodes along the path, the traffic rate is potentially reduced at each receiving node as packets are lost. Hence, while the initial rate $\phi_{s\ell}$ is allocated to the path, the residual traffic rate forwarded by node $i$ along the path $p_{s\ell}$ may be less than $\phi_{s\ell}$. Let $p^{(i)}_{s\ell}$ denote the sub-path of $p_{s\ell}$ from source $s$ to the intermediate node $i$. The residual traffic rate forwarded by node $i$ is then given by $y^{(i)}_{s\ell}\phi_{s\ell}$, where $y^{(i)}_{s\ell}$ is computed using (5) with $p_{s\ell}$ replaced by the sub-path $p^{(i)}_{s\ell}$. The capacity constraint on the total traffic traversing a link $(i,j)$ thus imposes the stochastic constraint

$$\sum_{s\in\mathcal{S}} \sum_{\ell:(i,j)\in p_{s\ell}} y^{(i)}_{s\ell}\,\phi_{s\ell} \le c_{ij} \quad (8)$$

on the feasible allocation vectors $\phi_s$. To compensate for the randomness in the capacity constraint in (8), we replace the residual packet success rate $y^{(i)}_{s\ell}$ with a function of its expected value and variance. The mean $\gamma^{(i)}_{s\ell}$ and variance $(\sigma^{(i)}_{s\ell})^2$ of $y^{(i)}_{s\ell}$ can be computed using (6) and (7), respectively, with $p_{s\ell}$ replaced by the sub-path $p^{(i)}_{s\ell}$. We thus replace $y^{(i)}_{s\ell}$ in (8) with the statistic $\gamma^{(i)}_{s\ell} + \delta\sigma^{(i)}_{s\ell}$. Here $\delta > 0$ is a constant which can be tuned based on the tolerance to delay resulting from capacity violations.$^7$ We let $W_s$ denote the $|\mathcal{E}| \times L_s$ weighted link-path incidence matrix for source $s$, with rows indexed by links $(i,j)$ and columns indexed by paths $p_{s\ell}$. The element $w((i,j), p_{s\ell})$ in row $(i,j)$ and column $p_{s\ell}$ of $W_s$ is thus given by

$$w((i,j), p_{s\ell}) = \begin{cases} \min\left\{1,\ \gamma^{(i)}_{s\ell} + \delta\sigma^{(i)}_{s\ell}\right\}, & \text{if } (i,j)\in p_{s\ell},\\ 0, & \text{otherwise.}\end{cases} \quad (9)$$

Let $c$ denote the $|\mathcal{E}| \times 1$ vector of link capacities $c_{ij}$ for $(i,j)\in\mathcal{E}$. The link capacity constraint in (8), including expected packet loss due to jamming, can then be expressed by the vector inequality

$$\sum_{s\in\mathcal{S}} W_s\phi_s \preceq c, \quad (10)$$

which is a linear constraint in the variable $\phi_s$. This statistical constraint formulation generalizes the standard network flow capacity constraint. The standard constraint corresponds to the case of $x_{ij} = 1$ for all $(i,j)\in\mathcal{E}$, in which the incidence matrix $W_s$ is deterministic and binary.

B. Optimal Traffic Allocation Using Portfolio Selection Theory

To determine the optimal allocation of traffic to the paths in $\mathcal{P}_s$, each source $s$ chooses a utility function $U_s(\phi_s)$. This function evaluates the total data rate, or throughput, successfully delivered to the destination node $d_s$. In defining our utility function $U_s(\phi_s)$, we present an analogy between traffic allocation to routing paths and allocation of funds to correlated assets in finance.

In Markowitz's portfolio selection theory [12], [13], an investor is interested in allocating funds to a set of financial assets that have uncertain future performance. The expected performance of each investment at the time of the initial allocation is expressed in terms of return and risk. The return on the asset corresponds to the value of the asset and measures the growth of the investment. The risk of the asset corresponds to the variance in the value of the asset and measures the degree of variation or uncertainty in the investment's growth.

We describe the desired analogy by mapping this allocation of funds to financial assets onto the allocation of traffic to routing paths. We relate the expected investment return on the financial portfolio to the estimated end-to-end success rates $\gamma_s$. We relate the investment risk of the portfolio to the estimated success rate covariance matrix $\Omega_s$. The correlation between related assets in the financial portfolio corresponds to the correlation between non-disjoint routing paths. The analogy between financial portfolio selection and the allocation of traffic to routing paths is summarized below.

$^7$ The case of $\delta = 0$ corresponds to the average-case constraint and will …
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 19, NO. 1, FEB 2011

Portfolio Selection | Traffic Allocation
Funds to be invested | Source data rate $R_s$
Financial assets | Routing paths $\mathcal{P}_s$
Expected asset return | Expected packet success rate $\gamma_{s\ell}$
Investment portfolio | Traffic allocation $\phi_s$
Portfolio return | Mean throughput $\gamma_s^T\phi_s$
Portfolio risk | Estimation variance $\phi_s^T\Omega_s\phi_s$

As in Markowitz's theory, we define a constant risk-aversion factor $k_s \ge 0$ for each source $s \in \mathcal{S}$. This factor indicates the preference of source $s$ for allocating resources to less risky paths with lower throughput variance. The risk-aversion constant thus weighs the trade-off between expected throughput and estimation variance. Each source $s$ can choose a different risk-aversion factor, and a source may vary its factor $k_s$ with time or for different types of data. For a given traffic rate allocation vector $\phi_s$, the expected total throughput for source $s$ is equal to the vector inner product $\gamma_s^T\phi_s$. The corresponding variance in the throughput for source $s$, due to the uncertainty in the estimate $\gamma_s$, is equal to the quadratic term $\phi_s^T\Omega_s\phi_s$. Based on this analogy with portfolio selection theory, we define the utility function $U_s(\phi_s)$ at source $s$ as the weighted sum

$$U_s(\phi_s) = \gamma_s^T\phi_s - k_s\,\phi_s^T\Omega_s\phi_s. \quad (11)$$

Setting the risk-aversion factor $k_s$ to zero means that source $s$ is willing to tolerate any amount of uncertainty in the estimate $\gamma_s$ of the end-to-end success rates in order to maximize the expected throughput. The role of the risk-aversion factor is thus to impose a penalty on the objective function proportional to the uncertainty in the estimation process. This penalty can narrow the gap between expected throughput and achieved throughput. The cases of $k_s = 0$ and $k_s > 0$ are compared in detail in Section V.

Combining the utility function in (11) with the set of constraints defined in Section IV-A yields the following jamming-aware traffic allocation optimization problem. This problem aims to find the globally optimal traffic allocation over the set $\mathcal{S}$ of sources.

Optimal Jamming-Aware Traffic Allocation

$$\{\phi_s\} = \arg\max_{\{\phi_s\}} \sum_{s\in\mathcal{S}} \left(\gamma_s^T\phi_s - k_s\,\phi_s^T\Omega_s\phi_s\right)$$
$$\text{s.t.} \quad \sum_{s\in\mathcal{S}} W_s\phi_s \preceq c, \qquad \mathbf{1}^T\phi_s \le R_s \ \text{for all } s\in\mathcal{S}, \qquad 0 \preceq \phi_s \ \text{for all } s\in\mathcal{S}. \quad (12)$$

Centralized protocols for source routing may be undesirable due to excessive communication overhead in large-scale wireless networks. We therefore seek a distributed formulation for the optimal traffic allocation problem in (12).

C. Optimal Distributed Traffic Allocation Using NUM

In the distributed formulation of the algorithm, each source $s$ determines its own traffic allocation $\phi_s$, ideally with minimal message passing between sources. By inspection, the optimal jamming-aware flow allocation problem in (12) is similar to the network utility maximization (NUM) formulation of the basic maximum network flow problem [14]. We thus develop a distributed traffic allocation algorithm using Lagrangian dual decomposition techniques [14] for NUM.

The dual decomposition technique is derived by decoupling the capacity constraint in (10) and introducing the link prices $\lambda_{ij}$ corresponding to each link $(i,j)$. Let $\lambda$ denote the $|\mathcal{E}| \times 1$ vector of link prices $\lambda_{ij}$. The Lagrangian $L(\{\phi_s\}, \lambda)$ of the optimization problem in (12) is then given by

$$L(\{\phi_s\},\lambda) = \sum_{s\in\mathcal{S}}\left(\gamma_s^T\phi_s - k_s\,\phi_s^T\Omega_s\phi_s\right) + \lambda^T\left(c - \sum_{s\in\mathcal{S}} W_s\phi_s\right). \quad (13)$$

The distributed optimization problem is solved iteratively using the Lagrangian dual method as follows. For a given set of link prices $\lambda_n$ at iteration $n$, each source $s$ solves the local optimization problem

$$\phi_{s,n} = \arg\max_{\phi_s\in\Phi_s} \left(\gamma_s^T\phi_s - k_s\,\phi_s^T\Omega_s\phi_s - \lambda_n^T W_s\phi_s\right). \quad (14)$$

The link prices $\lambda_{n+1}$ are then updated using a gradient descent iteration as

$$\lambda_{n+1} = \left(\lambda_n - a\left(c - \sum_{s\in\mathcal{S}} W_s\phi_{s,n}\right)\right)^+, \quad (15)$$

where $a > 0$ is a constant step size and $(v)^+ = \max(0, v)$ is the element-wise projection into the non-negative orthant. To perform the local update in (15), sources must exchange information about the result of the local optimization step. Since updating the link prices depends only on the expected link usage, sources need only exchange the $|\mathcal{E}| \times 1$ link usage vectors $u_{s,n} = W_s\phi_{s,n}$. This exchange ensures that the link prices are consistently updated across all sources. The iterative optimization step can be repeated until the allocation vectors $\phi_s$ converge for all sources $s\in\mathcal{S}$, i.e. when $\|\phi_{s,n} - \phi_{s,n-1}\| \le \epsilon$ for all $s$ with a given $\epsilon > 0$. This approach yields the following distributed algorithm for optimal jamming-aware flow allocation.

Distributed Jamming-Aware Traffic Allocation
Initialize $n = 1$ with initial link prices $\lambda_1$.
1. Each source $s$ independently computes
$$\phi_{s,n} = \arg\max_{\phi_s\in\Phi_s}\left(\gamma_s^T - \lambda_n^T W_s\right)\phi_s - k_s\,\phi_s^T\Omega_s\phi_s.$$
2. Sources exchange the link usage vectors $u_{s,n} = W_s\phi_{s,n}$.
3. Each source locally updates link prices as
$$\lambda_{n+1} = \left(\lambda_n - a\left(c - \sum_{s\in\mathcal{S}} u_{s,n}\right)\right)^+.$$
4. If $\|\phi_{s,n} - \phi_{s,n-1}\| > \epsilon$ for any $s$, increment $n$ and go to step 1.

Using the centralized optimization problem in (12) and the distributed formulation above, a set of sources with estimated parameters $\gamma_s$ and $\Omega_s$ can proactively compensate for the effect of jamming on network traffic flow.
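The distributed algorithm above can be sketched in plain Python for small instances. This is a hedged illustration rather than the authors' code, and it rests on several assumptions:

- Solver: the local problem (14) is solved by projected gradient ascent onto $\Phi_s = \{\phi_s \succeq 0,\ \mathbf{1}^T\phi_s \le R_s\}$, which is only one of many possible quadratic-programming methods.
- Inputs: $W_s$ is passed in directly rather than built from (9).
- Parameters: the dictionary keys, the step sizes and the zero initial prices are assumptions.
- Stopping test: the test of step 4 additionally requires the link prices to settle. This safeguard is needed because $\phi_{s,n}$ can stay unchanged for several iterations while $\lambda_n$ is still moving.

```python
def matvec(M, x):
    """Return M x for a matrix stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def tmatvec(M, y):
    """Return M^T y for a matrix stored as a list of rows."""
    return [sum(M[r][c] * y[r] for r in range(len(M))) for c in range(len(M[0]))]


def project_capped_simplex(v, R):
    """Euclidean projection onto {x >= 0, sum(x) <= R}, i.e. the set Phi_s."""
    x = [max(vi, 0.0) for vi in v]
    if sum(x) <= R:
        return x
    # Otherwise the rate budget is active: project onto {x >= 0, sum(x) = R}.
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(sorted(v, reverse=True), start=1):
        cumsum += uj
        t = (cumsum - R) / j
        if uj - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def local_allocation(gamma, Omega, W, lam, k, R, phi0=None, step=0.5, iters=2000):
    """Approximately solve (14) for one source by projected gradient ascent."""
    L = len(gamma)
    phi = list(phi0) if phi0 is not None else [R / L] * L
    lin = [g - w for g, w in zip(gamma, tmatvec(W, lam))]  # gamma_s - W_s^T lambda_n
    for _ in range(iters):
        quad = matvec(Omega, phi)
        grad = [li - 2.0 * k * q for li, q in zip(lin, quad)]  # assumes Omega symmetric
        phi = project_capped_simplex([p + step * g for p, g in zip(phi, grad)], R)
    return phi


def distributed_allocation(sources, c, a=0.1, eps=1e-6, max_iter=500):
    """Steps 1-4 of the distributed algorithm.

    `sources` is a list of dicts with hypothetical keys
    "gamma", "Omega", "W" (|E| rows by L_s columns), "k" and "R".
    """
    lam = [0.0] * len(c)          # lambda_1 = 0 (assumed initialization)
    phis = [None] * len(sources)
    for _ in range(max_iter):
        # Step 1: each source solves its local problem (warm-started).
        new = [local_allocation(s["gamma"], s["Omega"], s["W"], lam, s["k"], s["R"], p)
               for s, p in zip(sources, phis)]
        # Step 2: exchange link usage vectors u_{s,n} = W_s phi_{s,n} and sum them.
        usage = [sum(per_link) for per_link in
                 zip(*(matvec(s["W"], ph) for s, ph in zip(sources, new)))]
        # Step 3: price update (15), projected onto the non-negative orthant.
        new_lam = [max(0.0, l - a * (ci - ui)) for l, ci, ui in zip(lam, c, usage)]
        # Step 4: stop once allocations and (added safeguard) prices settle.
        settled = all(p is not None and max(abs(x - y) for x, y in zip(ph, p)) <= eps
                      for ph, p in zip(new, phis))
        settled = settled and max(abs(x - y) for x, y in zip(new_lam, lam)) <= eps
        phis, lam = new, new_lam
        if settled:
            break
    return phis, lam


if __name__ == "__main__":
    # One source, two disjoint single-link paths, so W_s is the identity by (9).
    toy = {"gamma": [0.9, 0.6], "Omega": [[0.01, 0.0], [0.0, 0.01]],
           "W": [[1.0, 0.0], [0.0, 1.0]], "k": 5.0, "R": 1.0}
    phis, lam = distributed_allocation([toy], c=[0.5, 10.0])
    print(phis, lam)  # roughly [[0.5, 0.5]] and [0.3, 0.0]
```

The toy case has $k_s = 5$, $R_s = 1$ and capacities $c = (0.5, 10)$. Here the iteration should settle near $\phi_s = (0.5, 0.5)$ with a price of about 0.3 on the congested link, which matches the KKT conditions of (12) for this instance. With $k_s = 0$ and loose capacities, all traffic goes to the path with the larger $\gamma_{s\ell}$. Raising $k_s$ spreads traffic across paths, which is the portfolio-diversification effect described above.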
D. Computational Complexity
We note that both the centralized optimization problem in
(12) and the local optimization step in the distributed algo-
rithm are quadratic programming optimization problems with
linear constraints [13]. The computational time required for
solving these problems using numerical methods for quadratic
programming is a polynomial function of the number of
optimization variables and the number of constraints.
In the centralized problem, there are $\sum_{s\in\mathcal{S}} |\mathcal{P}_s|$ optimization variables, corresponding to the number of paths available to each of the sources. The number of constraints in the centralized problem is equal to the total number of links $|\bigcup_{s\in\mathcal{S}} \mathcal{E}_s|$, corresponding to the number of link capacity constraints. In the distributed algorithm, each source iteratively solves a local optimization problem, leading to $|\mathcal{S}|$ decoupled optimization problems. Each of these problems has $|\mathcal{P}_s|$ optimization variables and $|\mathcal{E}_s|$ constraints. Hence, as the number
of sources in the network increases, the distributed algorithm
may be advantageous in terms of total computation time. In
what follows, we provide a detailed performance evaluation
of the methods proposed in this article.
VI. CONCLUSION
In this article, we studied the problem of traffic allocation in
multiple-path routing algorithms in the presence of jammers
whose effect can only be characterized statistically. We have
presented methods for each network node to probabilistically
characterize the local impact of a dynamic jamming attack
and for data sources to incorporate this information into
the routing algorithm. We formulated multiple-path traffic
allocation in multi-source networks as a lossy network flow
optimization problem using an objective function based on
portfolio selection theory from finance. We showed that this
centralized optimization problem can be solved using a dis-
tributed algorithm based on decomposition in network utility
maximization (NUM). We presented simulation results to
illustrate the impact of jamming dynamics and mobility on
network throughput and to demonstrate the efficacy of our
traffic allocation algorithm. We have thus shown that multiple-
REFERENCES
[1] I. F. Akyildiz, X. Wang, and W. Wang, "Wireless mesh networks: A survey," Computer Networks, vol. 47, no. 4, pp. 445-487, Mar. 2005.
[2] E. M. Sozer, M. Stojanovic, and J. G. Proakis, "Underwater acoustic networks," IEEE Journal of Oceanic Engineering, vol. 25, no. 1, pp. 72-83, Jan. 2000.
[3] R. Anderson, Security Engineering: A Guide to Building Dependable Distributed Systems. John Wiley & Sons, Inc., 2001.
[4] J. Bellardo and S. Savage, "802.11 denial-of-service attacks: Real vulnerabilities and practical solutions," in Proc. USENIX Security Symposium, Washington, DC, Aug. 2003, pp. 15-28.
[5] D. J. Thuente and M. Acharya, "Intelligent jamming in wireless networks with applications to 802.11b and other networks," in Proc. 25th IEEE Communications Society Military Communications Conference (MILCOM'06), Washington, DC, Oct. 2006, pp. 1-7.
[6] A. D. Wood and J. A. Stankovic, "Denial of service in sensor networks," IEEE Computer, vol. 35, no. 10, pp. 54-62, Oct. 2002.
[7] G. Lin and G. Noubir, "On link layer denial of service in data wireless LANs," Wireless Communications and Mobile Computing, vol. 5, no. 3, pp. 273-284, May 2005.
[8] W. Xu, K. Ma, W. Trappe, and Y. Zhang, "Jamming sensor networks: Attack and defense strategies," IEEE Network, vol. 20, no. 3, pp. 41-47, May/Jun. 2006.
[9] D. B. Johnson, D. A. Maltz, and J. Broch, DSR: The Dynamic Source Routing Protocol for Multihop Wireless Ad Hoc Networks. Addison-Wesley, 2001, ch. 5, pp. 139-172.
[10] E. M. Royer and C. E. Perkins, "Ad hoc on-demand distance vector routing," in Proc. 2nd IEEE Workshop on Mobile Computing Systems and Applications (WMCSA'99), New Orleans, LA, USA, Feb. 1999, pp. 90-100.
[11] R. Leung, J. Liu, E. Poon, A.-L. C. Chan, and B. Li, "MP-DSR: A QoS-aware multi-path dynamic source routing protocol for wireless ad-hoc networks," in Proc. 26th Annual IEEE Conference on Local Computer Networks (LCN'01), Tampa, FL, USA, Nov. 2001, pp. 132-141.
[12] H. Markowitz, "Portfolio selection," The Journal of Finance, vol. 7, no. 1, pp. 77-92, Mar. 1952.
[13] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, 2004.
PATH FINDING MOBILE ROBOT INCORPORATING TWO WHEEL
DIFFERENTIAL DRIVE AND BOOLEAN LOGIC FOR
OBSTACLE AVOIDANCE
S.Venkatesan,
AP/CSE
Gojan School of Business &
Technology.
selvamvenkatesan@gmail.com
A.C.Arun
IV CSE
Gojan School of Business &
Technology.
acarun90@gmail.com
R.Rajesh Manikandan
IV CSE
Gojan School of Business &
Technology.
rajesh.rajesh269@gmail.com
Abstract
Mobile robots have the capability to move around in their environment and are not fixed to one physical location. The robot senses its surroundings with the aid of various electronic sensors, while mechanical actuators are used to move it around. In this work we focus on building an obstacle-avoiding robot using Boolean logic deduced from a truth table. The design consists of two main sections: electronic analysis of the various robot sensors, and the Boolean logic used to interface the sensors with the robot's actuators. The prototype is built using infrared sensors with a comparator circuit and DC motors. This paper shows obstacle detection using IR phototransistor sensors, with the motors acting as actuators that turn the robot to the next position. The system can be further enhanced by providing external monitoring control for the robot.
Keywords: Robot, Obstacle Avoidance,
Boolean logic.
Introduction
Obstacle avoidance is one of the most
critical factors in the design of autonomous
vehicles such as mobile robots. One of the
major challenges in designing intelligent
vehicles capable of autonomous travel on
highways is a reliable obstacle avoidance system. An obstacle avoidance system may be divided into two parts: obstacle detection (mechanism, hardware, sensors) and avoidance control.
The traditional artificial intelligence
approach to building a control system for an
autonomous robot is to break the task into a
number of subsystems. These subsystems
typically include perception, world
modeling, planning, task execution and
motor control. The subsystems can be
thought of as a series of vertical slices with
sensor inputs on the left and actuator outputs
on the right. The disadvantage of this
approach, however, is that all of these
subsystems must work correctly for the
robot to function at all. To overcome this limitation, we provide external monitoring control.
Differential Drive
Differential drive is a method of controlling
a robot with only two motorized wheels.
What makes this algorithm important for a
robot builder is that it is also the simplest
control method for a robot. The term 'differential' means that the robot's turning speed is determined by the speed difference between the two wheels, one on each side of the robot. For example, if the left wheel is held still and the right wheel rotates forward, the robot turns left. With careful speed control, for instance using PID control, curved paths can be obtained simply by varying the speeds of both wheels over time. As long as both wheels turn at the same speed, the robot does not turn; it only moves forward or in reverse.
The differential drive algorithm is useful for light-chasing robots. This form of locomotion is the most basic of all types and is highly recommended for beginners: neither the mechanical construction nor the control algorithm can get any simpler.
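The speed-difference rule can be made concrete with the standard differential-drive kinematic relations. These give forward speed $v = (v_R + v_L)/2$ and yaw rate $\omega = (v_R - v_L)/b$, where $b$ is the distance between the wheels. These are textbook kinematics rather than equations stated in this paper, and the function and parameter names below are illustrative.

```python
def body_velocity(v_left, v_right, track_width):
    """Forward speed and yaw rate of a differential-drive robot.

    v_left, v_right: wheel rim speeds (m/s); track_width: wheel separation (m).
    A positive yaw rate means a counter-clockwise (left) turn.
    """
    v = (v_right + v_left) / 2.0
    omega = (v_right - v_left) / track_width
    return v, omega


# Left wheel still, right wheel forward: the robot pivots left about the left wheel.
print(body_velocity(0.0, 0.2, 0.1))  # (0.1, 2.0)
```

Equal wheel speeds give $\omega = 0$ (straight motion). Equal and opposite speeds give $v = 0$, so the robot turns in place.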
Operating Logic
Pseudocode:
Read the sensor input
Make a decision based on the sensor reading
Do one of the following actions:
    Drive straight: both wheels move forward at the same speed
    Drive in reverse: both wheels move backward at the same speed
    Turn left: the left wheel moves in reverse and the right wheel moves forward
    Turn right: the right wheel moves in reverse and the left wheel moves forward
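The pseudocode above can be turned into a small runnable decision function. This sketch assumes two IR sensors (left and right) that each report a Boolean obstacle flag. It also assumes a decision table in which an obstacle on one side turns the robot away from that side and obstacles on both sides trigger reverse. Both the sensor layout and the table are illustrative assumptions, not the circuit described in this paper.

```python
FORWARD, REVERSE = 1, -1


def decide(obstacle_left, obstacle_right):
    """Map two IR obstacle flags to (left_wheel, right_wheel) directions.

    The two-sensor layout and this decision table are illustrative assumptions.
    """
    if not obstacle_left and not obstacle_right:
        return FORWARD, FORWARD      # drive straight
    if obstacle_left and obstacle_right:
        return REVERSE, REVERSE      # drive in reverse
    if obstacle_left:
        return FORWARD, REVERSE      # turn right: left forward, right reverse
    return REVERSE, FORWARD          # turn left: left reverse, right forward


def control_step(read_sensors, drive):
    """One pass of the loop: read the sensors, decide, drive the wheels."""
    obstacle_left, obstacle_right = read_sensors()
    drive(*decide(obstacle_left, obstacle_right))


# Usage with stand-in I/O functions (hypothetical; real hardware calls differ).
commands = []
control_step(lambda: (False, True), lambda left, right: commands.append((left, right)))
print(commands)  # [(-1, 1)]: obstacle on the right, so turn left
```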
Circuit Design
Sensor circuit
The infrared emitter-detector circuit gives the robot basic object or obstacle detection. Infrared emitter-detector pair sensors are fairly easy to implement, although they require some testing and calibration to get right. They can be used for obstacle detection, motion detection, transmitters, encoders, and color detection (such as for line following).
Infrared Emitter Detector Basic Circuit
R1 limits the current through the emitter (clear) LED so that it does not burn out. Check the emitter datasheet for its maximum power rating, Power_spec, and choose R1 so that Vcc^2/R1 < Power_spec. Alternatively, simply use R1 = 120 ohms, the value suggested here.
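The R1 rule above is simple arithmetic. In the worked example below, the 5 V supply and the 0.25 W rating are illustrative assumptions. Because the LED's forward voltage reduces the actual current below Vcc/R1, the bound Vcc^2/R1 over-estimates the dissipated power, so the rule is conservative.

```python
def min_r1(vcc, power_spec):
    """Boundary value of R1 (ohms) from Vcc**2 / R1 < Power_spec; R1 must exceed it."""
    return vcc ** 2 / power_spec


def worst_case_power(vcc, r1):
    """Upper bound Vcc**2 / R1 on the power drawn through R1 (watts)."""
    return vcc ** 2 / r1


# Example: 5 V supply and a 0.25 W rating (illustrative numbers).
print(min_r1(5.0, 0.25))           # 100.0 -> R1 must be above 100 ohms
print(worst_case_power(5.0, 120))  # about 0.208 W, below 0.25 W
```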
R2 should be larger than the maximum resistance of the detector. Measure the resistance of the detector (black) when it is pointing into a dark area and
Control Logic Design
The output of the sensor circuit is taken and the truth table is built as follows, where M+ and M- are the terminals of the motor. Based on this truth table, the logic expressions for M+ and M- are deduced using any truth table minimization technique.
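The derivation of M+ and M- from a truth table can be illustrated in Python. The table below is hypothetical; the paper's own table is not reproduced in the text. It assumes two IR inputs L and R (1 = obstacle) and the turning rules from the operating logic. The code builds the canonical sum-of-products for each motor terminal, then checks a minimized candidate against every row.

```python
from itertools import product

# Hypothetical truth table. Inputs: (L, R) IR obstacle flags, 1 = obstacle.
# Outputs: terminal levels (LM+, LM-, RM+, RM-) of the left and right motors,
# where (1, 0) drives a motor forward and (0, 1) drives it in reverse.
TABLE = {
    (0, 0): (1, 0, 1, 0),  # forward
    (0, 1): (0, 1, 1, 0),  # turn left
    (1, 0): (1, 0, 0, 1),  # turn right
    (1, 1): (0, 1, 0, 1),  # reverse
}
OUTPUTS = ("LM+", "LM-", "RM+", "RM-")
INPUTS = ("L", "R")


def canonical_sop(column):
    """Sum of minterms for one output column (before any minimization)."""
    terms = []
    for bits, outs in TABLE.items():
        if outs[column]:
            terms.append("".join(n if b else n + "'" for n, b in zip(INPUTS, bits)))
    return " + ".join(terms) if terms else "0"


def matches(column, expr):
    """Check a candidate expression expr(L, R) against every row of the table."""
    return all(TABLE[bits][column] == expr(*bits) for bits in product((0, 1), repeat=2))


for col, name in enumerate(OUTPUTS):
    print(name, "=", canonical_sop(col))
# Merging adjacent minterms (e.g. with a Karnaugh map) reduces L'R' + LR' to R'.
print(matches(0, lambda L, R: 1 - R))  # True: LM+ = R' for this table
```

For this particular table, each motor ends up driven by the opposite sensor: LM+ = R', LM- = R, RM+ = L' and RM- = L.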
Conclusion
We have completed and tested the final circuit and analyzed the results. The robot was found to perform well except when it is exposed to sunlight. This can be improved by using ultrasonic sensors. External monitoring assistance can also be provided to avoid deadlock in unfriendly environments.
File sharing in Unstructured Peer-to-Peer
Network Using Sampling Technique
Ms. P. Preethi Rebecca, Asst. Professor / CSE, St. Peter's University, Chennai.
M. ARUNA, M.E. (CSE), St. Peter's University, Chennai
Abstract-This paper presents a detailed examination of how the dynamic and heterogeneous nature of real-world peer-to-peer systems can introduce bias into the selection of representative samples of peer properties (e.g., degree, link bandwidth, number of files shared). We propose the Metropolized Random Walk with Backtracking (MRWB) as a viable and promising technique for collecting nearly unbiased samples. We conduct an extensive simulation study to demonstrate that our technique works well for a wide variety of commonly-encountered peer-to-peer network conditions. We have implemented the MRWB algorithm for selecting peer addresses uniformly at random in a tool called ion-sampler. Using the Gnutella network, we empirically show that ion-sampler yields more accurate samples than tools that rely on commonly-used sampling techniques. It also delivers dramatic improvements in efficiency and scalability compared to performing a full crawl.

Index Terms-Peer-to-peer, sampling.

I. INTRODUCTION

THE popularity and wide-spread use of peer-to-peer systems have motivated numerous empirical studies aimed at providing a better understanding of the properties of deployed peer-to-peer systems. However, due to the large scale and highly dynamic nature of many of these systems, directly measuring the quantities of interest on every peer is prohibitively expensive. Sampling is a natural approach for learning about these systems using light-weight data collection. However, commonly-used sampling techniques for measuring peer-to-peer systems tend to introduce considerable bias for two reasons. First, the dynamic nature of peers can bias results towards short-lived peers, much as naively sampling flows in a router can lead to bias towards short-lived flows. Second, the heterogeneous nature of the overlay topology can lead to bias towards high-degree peers.

In this paper, we are concerned with the basic objective of devising an unbiased sampling method, i.e., one which selects any of the present peers with equal probability. The addresses of the resulting peers may then be used as input to another measurement tool to collect data on particular peer properties (e.g., degree, link bandwidth, number of files shared). The focus of our work is on unstructured P2P systems, where peers select neighbors through a predominantly random process. Most popular P2P systems in use today belong to this unstructured category. For structured P2P systems such as Chord [1] and CAN [2], knowledge of the structure significantly facilitates unbiased sampling, as we discuss in Section VII.
Manuscript received March 25, 2007; revised January 23, 2008; approved by
IEEE/ACM TRANSACTIONS ON NETWORKING Editor L. Massoulie. First pub-
lished October 03, 2008; current version published April 15, 2009. This material
is based upon work supported in part by the National Science Foundation (NSF)
under Grant Nets-NBD-0627202 and an unrestricted gift from Cisco Systems.
Any opinions, findings, and conclusions or recommendations expressed in this
material are those of the authors and do not necessarily reflect the views of the
NSF or Cisco. An earlier version of this paper appeared in the Proceedings of
the ACM SIGCOMM Internet Measurement Conference 2006.
D. Stutzbach is with Stutzbach Enterprises, LLC, Dallas, TX 75206 USA
(e-mail: daniel@stutzbachenterprises.com; http://stutzbachenterprises.com).
R. Rejaie is with the Department of Computer Science, University of Oregon,
Eugene, OR 97403-1202 USA (e-mail: reza@cs.uoregon.edu).
N. Duffield, S. Sen, and W. Willinger are with AT&T Labs-Research,
Florham Park, NJ 07932 USA (e-mail: duffield@research.att.com; sen@re-
search.att.com; walter@research.att.com).
Digital Object Identifier 10.1109/TNET.2008.2001730
Achieving the basic objective of selecting any of the peers
present with equal probability is non-trivial when the structure
of the peer-to-peer system changes during the measurements.
First-generation measurement studies of P2P systems typically
relied on ad-hoc sampling techniques (e.g., [3], [4]) and pro-
vided valuable information concerning basic system behavior.
However, lacking any critical assessment of the quality of these
sampling techniques, the measurements resulting from these
studies may be biased and consequently our understanding of
P2P systems may be incorrect or misleading. The main contri-
butions of this paper are (i) a detailed examination of the ways
that the topological and temporal qualities of peer-to-peer systems
(e.g., churn [5]) can introduce bias, (ii) an in-depth exploration
of the applicability of a sampling technique called the
Metropolized Random Walk with Backtracking (MRWB), representing
a variation of the Metropolis-Hastings method [6]-[8],
and (iii) an implementation of the MRWB algorithm into a tool
called - . While sampling techniques based on the
original Metropolis-Hastings method have been considered earlier
(e.g., see Awan et al. [9] and Bar-Yossef and Gurevich [10]),
we show that in the context of unstructured P2P systems, our
modification of the basic Metropolis-Hastings method results
in nearly unbiased samples under a wide variety of commonly
encountered peer-to-peer network conditions.
The proposed MRWB algorithm assumes that the P2P
system provides some mechanism to query a peer for a list of
its neighbors, a capability provided by most widely deployed
P2P systems. Our evaluations of the - tool show
that the MRWB algorithm yields more accurate samples than
previously considered sampling techniques. We quantify the
observed differences, explore underlying causes, address the
tool's efficiency and scalability, and discuss the implications for
accurate inference of P2P properties and high-fidelity modeling
of P2P systems. While our focus is on P2P networks, many of
our results apply to any large, dynamic, undirected graph where
nodes may be queried for a list of their neighbors.
After discussing related work and alternative sampling tech-
niques in Section II, we build on our earlier formulation in [11]
and focus on sampling techniques that select a set of peers uni-
formly at random from all the peers present in the overlay and
then gather data about the desired properties from those peers.
While it is relatively straightforward to choose peers uniformly
at random in a static and known environment, it poses consid-
erable problems in a highly dynamic setting like P2P systems,
which can easily lead to significant measurement bias for two
reasons.
The first cause of sampling bias derives from the temporal
dynamics of these systems, whereby new peers can arrive and
existing peers can depart at any time. Locating a set of peers and
measuring their properties takes time, and during that time the
peer constituency is likely to change. In Section III, we show
how this often leads to bias towards short-lived peers and
explain how to overcome this difficulty.
The second significant cause of bias relates to the connectivity
structure of P2P systems. As a sampling program explores a
given topological structure, each traversed link is more likely to
lead to a high-degree peer than a low-degree peer, significantly
biasing peer selection. We describe and evaluate different tech-
niques for traversing static overlays to select peers in Section IV
and find that the Metropolized Random Walk (MRW) collects
unbiased samples.
In Section V, we adapt MRW for dynamic overlays by adding
backtracking and demonstrate its viability and effectiveness
when the causes for both temporal and topological bias are
present. We show via simulations that the MRWB technique
works well and produces nearly unbiased samples under a
variety of circumstances commonly encountered in actual P2P
systems.
Finally, in Section VI we describe the implementation of the
- tool based on the proposed MRWB algorithm and
empirically evaluate its accuracy and efficiency through com-
parison with complete snapshots of Gnutella taken with Cruiser
[12], as well as with results obtained from previously used,
more ad-hoc, sampling techniques. Section VII discusses some
important questions such as how many samples to collect and
outlines a practical solution to obtaining unbiased samples for
structured P2P systems. Section VIII concludes the paper by
summarizing our findings and plans for future work.
II. RELATED WORK
A. Graph Sampling
The phrase "graph sampling" means different things in different
contexts. For example, sampling from a class of graphs
has been well studied in the graph theory literature [13], [14],
where the main objective is to prove that for a class of graphs
sharing some property (e.g., same node degree distribution), a
given random algorithm is capable of generating all graphs in
the class. Cooper et al. [15] used this approach to show that
their algorithm for overlay construction generates graphs with
good properties. Our objective is quite different; instead of sam-
pling a graph from a class of graphs our concern is sampling
peers (i.e., vertices) from a largely unknown and dynamically
changing graph. Others have used sampling to extract informa-
tion about graphs (e.g., selecting representative subgraphs from
a large, intractable graph) while maintaining properties of the
original structure [16]-[18]. Sampling is also frequently used
as a component of efficient, randomized algorithms [19]. However,
these studies assume complete knowledge of the graphs in
question. Our problem is quite different in that we do not know
the graphs in advance.
A closely related problem to ours is sampling Internet routers
by running traceroute from a few hosts to many destinations for
the purpose of discovering the Internet's router-level topology.
Using simulation [20] and analysis [21], research has shown
that traceroute measurements can result in measurement bias
in the sense that the obtained samples support the inference of
power law-type degree distributions irrespective of the true na-
ture of the underlying degree distribution. A common feature
of our work and the study of the traceroute technique [20], [21]
is that both efforts require an evaluation of sampling techniques
without complete knowledge of the true nature of the underlying
connectivity structure. However, exploring the router topology
and exploring P2P topologies differ in their basic operation for
graph exploration. In the case of traceroute, the basic operation is
"What is the path to this destination?" In P2P networks, the basic
operation is "What are the neighbors of this peer?" In addition, the
Internet's router-level topology changes at a much slower rate
than the overlay topology of P2P networks.
Another closely related problem is selecting Web pages uniformly
at random from the set of all Web pages [22], [23]. Web
pages naturally form a graph, with hyper-links forming edges
between pages. Unlike unstructured peer-to-peer networks, the
Web graph is directed and only outgoing links are easily dis-
covered. Much of the work on sampling Web pages therefore
focuses on estimating the number of incoming links, to facili-
tate degree correction. Unlike peers in peer-to-peer systems, not
much is known about the temporal stability of Web pages, and
temporal causes of sampling bias have received little attention
in past measurement studies of the Web.
B. Random Walk-Based Sampling of Graphs
A popular technique for exploring connectivity structures
consists of performing random walks on graphs. Several
properties of random walks on graphs have been extensively
studied analytically [24], such as the access time, cover time,
and mixing time. While these properties have many useful
applications, they are, in general, only well-defined for static
graphs. To our knowledge the application of random walks
as a method of selecting nodes uniformly at random from a
dynamically changing graph has not been studied. A number
of papers [25]-[28] have made use of random walks as a basis
for searching unstructured P2P networks. However, searching
simply requires locating a certain piece of data anywhere along
the walk, and is not particularly concerned if some nodes are
preferred over others. Some studies [27], [28] additionally use
random walks as a component of their overlay-construction
algorithm.
Two papers that are closely related to our random walk-based
sampling approach are by Awan et al. [9] and Bar-Yossef and
Gurevich [10]. While the former also address the problem of
gathering uniform samples from peer-to-peer networks, the
latter are concerned with uniform sampling from a search
FILE SHARING IN UNSTRUCTURED PEER-TO-PEER NETWORKS
engine's index. Both works examine several random walk techniques,
including the Metropolis-Hastings method, but assume
an underlying graph structure that is not dynamically changing.
In addition to evaluating their techniques empirically for static
power-law graphs, the approach proposed by Awan et al. [9]
also requires special underlying support from the peer-to-peer
application. In contrast, we implement the Metropolis-Hastings
method in such a way that it relies only on the ability
to discover a peer's neighbors, a simple primitive operation
commonly found in existing peer-to-peer networks. Moreover,
we introduce backtracking to cope with departed peers and con-
duct a much more extensive evaluation of the proposed MRWB
method. Specifically, we generalize our formulation reported in
[11] by evaluating MRWB over dynamically changing graphs
with a variety of topological properties. We also perform
empirical validations over an actual P2P network.
C. Sampling in Hidden Populations
The problem of obtaining accurate estimates of the number of
peers in an unstructured P2P network that have a certain prop-
erty can also be viewed as a problem in studying the sizes of
hidden populations. Following Salganik [29], a population is
called hidden if there is no central directory of all population
members, such that samples may only be gathered through re-
ferrals from existing samples. This situation often arises when
public acknowledgment of membership has repercussions (e.g.,
injection drug users [30]), but also arises if the target population
is difficult to distinguish from the population as a whole (e.g.,
jazz musicians [29]). Peers in P2P networks are hidden because
there is no central repository we can query for a list of all peers.
Peers must be discovered by querying other peers for a list of
neighbors.
Proposed methods in the social and statistical sciences for
studying hidden populations include snowball sampling [31],
key informant sampling [32], and targeted sampling [33]. While
these methods gather an adequate number of samples, they are
notoriously biased. More recently, Heckathorn [30] (see also
[29], [34]) proposed respondent-driven sampling, a snowball-
type method for sampling and estimation in hidden populations.
Respondent-driven sampling first uses the sample to make infer-
ences about the underlying network structure. In a second step,
these network-related estimates are used to derive the propor-
tions of the various subpopulations of interest. Salganik et al.
[29], [34] show that under quite general assumptions, respon-
dent-driven sampling yields estimates for the sizes of subpop-
ulations that are asymptotically unbiased, no matter how the
seeds were chosen.
Unfortunately, respondent-driven sampling has only been
studied in the context where the social network is static and
does not change with time. To the best of our knowledge, the
accuracy of respondent-driven sampling in situations where the
underlying network structure is changing dynamically (e.g.,
unstructured P2P systems) has not been considered in the
existing sampling literature.
D. Dynamic Graphs
While graph theory has been largely concerned with studying
and discovering properties of static connectivity structures,
many real-world networks evolve over time, for example via
node/edge addition and/or deletion. In fact, many large-scale
networks that arise in the context of the Internet (e.g., WWW,
P2P systems) are extremely dynamic and create havoc for
graph algorithms that have been designed with static or only
very slowly changing network structures in mind. Furthermore,
the development of mathematical models for evolving graphs
is still at an early stage and is largely concerned with genera-
tive models that are capable of reproducing certain observed
properties of evolving graphs. For example, recent work by
Leskovec et al. [35] focuses on empirically observed properties
such as densification (i.e., networks become denser over time)
and shrinking diameter (i.e., as networks grow, their diameter
decreases) and on new graph generators that account for these
properties. However, the graphs they examine are not P2P
networks and their properties are by and large inconsistent with
the design and usage of measured P2P networks (e.g., see [5]).
Hence, the dynamic graph models proposed in [35] are not
appropriate for our purpose, and neither are the evolving graph
models specifically designed to describe the Web graph (e.g.,
see [36] and references therein).
III. SAMPLING WITH DYNAMICS
We develop a formal and general model of a P2P system as
follows. If we take an instantaneous snapshot of the system at
time $t$, we can view the overlay as a graph $G_t = (V_t, E_t)$ with the
peers as vertices and connections between the peers as edges.
Extending this notion, we incorporate the dynamic aspect by
viewing the system as an infinite set of time-indexed graphs,
$\mathcal{G} = \{G_t\}$. The most common approach for sampling
from this set of graphs is to define a measurement window,
$[t_0, t_0 + \Delta]$, and select peers uniformly at random from the
set of peers who are present at any time during the window:
$V_{t_0, t_0+\Delta} = \bigcup_{t=t_0}^{t_0+\Delta} V_t$. Thus, it does not distinguish between
occurrences of the same peer at different times.
This approach is appropriate if peer session lengths are expo-
nentially distributed (i.e., memoryless). However, existing mea-
surement studies [3], [5], [37], [38] show session lengths are
heavily skewed, with many peers being present for just a short
time (a few minutes) while other peers remain in the system for
a very long time (i.e., longer than $\Delta$). As a consequence, as $\Delta$
increases, the set $V_{t_0, t_0+\Delta}$ includes an increasingly large
fraction of short-lived peers.
A simple example may be illustrative. Suppose we wish to
observe the number of files shared by peers. In this example
system, half the peers are up all the time and have many files,
while the other peers remain for around 1 minute and are imme-
diately replaced by new short-lived peers who have few files.
The technique used by most studies would observe the system
for a long time and incorrectly conclude that most of the
peers in the system have very few files. Moreover, their results
will depend on how long they observe the system. The longer the
measurement window, the larger the fraction of observed peers
with few files.
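The toy system above can be turned into a short calculation. The sketch below is a hypothetical model of that example, not code from the study: it assumes `n_slots` long-lived peers that stay for the whole window and `n_slots` short-lived slots whose occupant is replaced every `session_min` minutes, and it counts the distinct peers that a window-based sampler would choose from.

```python
from fractions import Fraction


def short_lived_fraction(window_min: int, n_slots: int = 100,
                         session_min: int = 1) -> Fraction:
    """Fraction of distinct peers seen in [t0, t0 + window_min] that are short-lived.

    Hypothetical toy model: n_slots long-lived peers are present throughout,
    and each of n_slots short-lived slots gets a new occupant every session_min
    minutes (window_min is assumed to be a multiple of session_min).
    """
    long_lived = n_slots
    # Each short-lived slot contributes the peer present at t0 plus one new
    # peer for every replacement that happens inside the window.
    short_lived = n_slots * (window_min // session_min + 1)
    return Fraction(short_lived, long_lived + short_lived)


for window in (0, 1, 10, 60):
    frac = float(short_lived_fraction(window))
    print(f"window={window:>2} min  short-lived fraction={frac:.3f}")
```

With a 60-minute window the fraction is 61/62, even though exactly half of the peers present at any instant are short-lived, and it keeps growing with the window length.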
One fundamental problem of this approach is that it focuses
on sampling peers instead of peer properties. It selects each
sampled vertex at most once. However, the property at the vertex
may change with time. Our goal should not be to select a vertex
$v_i \in \bigcup_{t=t_0}^{t_0+\Delta} V_t$, but rather to sample the property at $v_i$ at a
particular instant $t$. Thus, we distinguish between occurrences of
the same peer at different times: samples $v_{i,t}$ and $v_{i,t'}$ gathered
at distinct times $t \neq t'$ are viewed as distinct, even when
they come from the same peer. The key difference is that it
must be possible to sample from the same peer more than once,
at different points in time. Using the formulation
$\{v_{i,t} \mid t_0 \le t \le t_0 + \Delta,\ v_i \in V_t\}$, the sampling technique will not be biased by
the dynamics of peer behavior, because the sample set is decoupled
from peer session lengths. To our knowledge, no prior P2P
measurement studies relying on sampling make this distinction.
Returning to our simple example, our approach will correctly
select long-lived peers half the time and short-lived peers half
the time. When the samples are examined, they will show that
half of the peers in the system at any given moment have many
files while half of the peers have few files, which is exactly
correct.
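The same hypothetical toy system can be used to check the per-instant formulation: pick a time $t$ uniformly in the window, then a peer uniformly from $V_t$, and record the pair. The peer-naming scheme, function names, and parameters below are assumptions made for illustration.

```python
import random


def peers_present(t: float, n_slots: int, session_min: float) -> list:
    """V_t for the hypothetical toy system: long-lived peers plus the current
    occupant of each short-lived slot (identified by slot and generation)."""
    generation = int(t // session_min)
    long_lived = [("long", i) for i in range(n_slots)]
    short_lived = [("short", i, generation) for i in range(n_slots)]
    return long_lived + short_lived


def sample_instants(n_samples: int, window_min: float, n_slots: int = 50,
                    session_min: float = 1.0, seed: int = 0) -> list:
    """Draw samples v_{i,t}: t uniform in the window, then a uniform peer of V_t."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        t = rng.uniform(0.0, window_min)
        peer = rng.choice(peers_present(t, n_slots, session_min))
        samples.append((peer, t))
    return samples


def long_lived_share(samples: list) -> float:
    """Fraction of the samples that came from long-lived peers."""
    return sum(peer[0] == "long" for peer, _ in samples) / len(samples)


for window in (1, 10, 60):
    print(window, round(long_lived_share(sample_instants(10_000, window)), 3))
```

For every window length the long-lived share should stay near one half, because the same long-lived peer may be sampled repeatedly at different instants instead of being counted once.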
If the measurement window $\Delta$ is sufficiently small, such
that the distribution of the property under consideration does
not change significantly during the measurement window, then
we may relax the constraint of choosing $t$ uniformly at random
from $[t_0, t_0 + \Delta]$.
We still have the significant problem of selecting a peer uniformly
at random from those present at a particular time. We
begin to address this problem in Section IV.
IV. SAMPLING FROM STATIC GRAPHS
We now turn our attention to topological causes of bias.
Towards this end, we momentarily set aside the temporal issues
by assuming a static, unchanging graph. The selection process
begins with knowledge of one peer (vertex) and progressively
queries peers for a list of neighbors. The goal is to select peers
uniformly at random. In any graph-exploration problem, we
have a set of visited peers (vertices) and a front of unexplored
neighboring peers. There are two ways in which algorithms
differ: (i) how to choose the next peer to explore, and (ii) which
subset of the explored peers to select as samples. Prior studies
use simple breadth-first or depth-first approaches to explore the
graph and select all explored peers. These approaches suffer
from several problems:
The discovered peers are correlated by their neighbor rela-
tionship.
Peers with higher degree are more likely to be selected.
Because they never visit the same peer twice, they will
introduce bias when used in a dynamic setting as described
in Section III.
1) Random Walks: A better candidate solution is the random
walk, which has been extensively studied in the graph theory lit-
erature (for an excellent survey see [24]). We briefly summarize
the key terminology and results relevant to sampling. The transition
matrix $P(x, y)$ describes the probability of transitioning
to peer $y$ if the walk is currently at peer $x$:
$$P(x, y) = \begin{cases} \dfrac{1}{\operatorname{degree}(x)} & \text{if } y \text{ is a neighbor of } x,\\ 0 & \text{otherwise.} \end{cases}$$
If the vector $\pi$ describes the probability of currently being at
each peer, then the vector $\pi' = \pi P$ describes the
probability after taking one additional step. Likewise, $\pi P^r$ describes the
probability after taking $r$ steps. As long as the graph is connected
and not bipartite, the probability of being at any particular
node, $x$, converges to a stationary distribution:
$$\pi^*(x) = \lim_{r \to \infty} (\pi P^r)(x) = \frac{\operatorname{degree}(x)}{2|E|}.$$
In other words, if we select a peer as a sample every $r$ steps, for
sufficiently large $r$, we have the following good properties:
The information stored in the starting vector, $\pi$, is lost
through the repeated selection of random neighbors. Therefore,
successive selected peers are effectively uncorrelated. Alternately,
we may start many walks in parallel. In either case,
after $r$ steps, the selection is independent of the origin.
While the stationary distribution, $\pi^*$, is biased towards
peers with high degree, the bias is precisely known, allowing
us to correct it.
Random walks may visit the same peer twice, which lends
itself better to a dynamic setting as described in Section III.
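A small numerical check of these properties, on a hypothetical six-peer graph: repeatedly applying $\pi \leftarrow \pi P$ converges to $\operatorname{degree}(x)/2|E|$ from the chosen starting peer, and because that bias is known, reweighting by $1/\operatorname{degree}(x)$ recovers the uniform distribution. This post-hoc reweighting only illustrates that the bias is correctable; the approach developed next (Metropolis-Hastings) changes the walk itself instead.

```python
from collections import defaultdict

# Hypothetical graph: connected and not bipartite (0-1-2 and 2-3-4 are triangles).
EDGES = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2), (4, 5)]

neighbors = defaultdict(list)
for u, v in EDGES:
    neighbors[u].append(v)
    neighbors[v].append(u)
nodes = sorted(neighbors)
degree = {x: len(neighbors[x]) for x in nodes}


def step(pi: dict) -> dict:
    """One step of the walk: returns pi P, where P(x, y) = 1/degree(x) for neighbors y."""
    nxt = {x: 0.0 for x in nodes}
    for x in nodes:
        for y in neighbors[x]:
            nxt[y] += pi[x] / degree[x]
    return nxt


pi = {x: 0.0 for x in nodes}
pi[0] = 1.0                      # the walk starts at peer 0
for _ in range(1000):            # pi P^r for a large r
    pi = step(pi)

stationary = {x: degree[x] / (2 * len(EDGES)) for x in nodes}  # degree(x) / 2|E|

# The degree bias is known exactly, so weighting by 1/degree(x) and
# renormalizing yields the uniform distribution.
weights = {x: pi[x] / degree[x] for x in nodes}
total = sum(weights.values())
corrected = {x: w / total for x, w in weights.items()}

print({x: round(pi[x], 4) for x in nodes})
print({x: round(corrected[x], 4) for x in nodes})
```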
In practice, $r$ need not be exceptionally large. For graphs
where the edges have a strong random component (e.g., small-world
graphs such as peer-to-peer networks), it is sufficient that
the number of steps exceed the log of the population size, i.e.,
$r \ge \log |V|$.
2) Adjusting for Degree Bias: To correct for the bias towards
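high-degree peers, we make use of the Metropolis-Hastings
method for Markov chains. Random walks on a graph are a special
case of Markov chains. In a regular random walk, the transition
matrix $P(x, y)$ leads to the stationary distribution $\pi^*(x)$,
as described above. We would like to choose a new transition
matrix, $Q(x, y)$, to produce a different stationary distribution,
$\mu(x)$. Specifically, we desire $\mu(x)$ to be the uniform distribution
so that all peers are equally likely to be at the end of the
walk. Metropolis-Hastings [6]-[8] provides us with the desired
$Q(x, y)$:
$$Q(x, y) = \begin{cases} P(x, y) \min\left( \dfrac{\mu(y) P(y, x)}{\mu(x) P(x, y)}, 1 \right) & \text{if } x \neq y,\\ 1 - \sum_{z \neq x} Q(x, z) & \text{if } x = y. \end{cases}$$
Equivalently, to take a step from peer $x$, select a neighbor $y$
of $x$ as normal (i.e., with probability $P(x, y)$). Then, with
probability $\min\left( \frac{\mu(y) P(y, x)}{\mu(x) P(x, y)}, 1 \right)$, accept the move. Otherwise,
return to $x$ (i.e., with probability $1 - \sum_{z \neq x} Q(x, z)$).
To collect uniform samples, we have $\mu(y)/\mu(x) = 1$, so the
move-acceptance probability becomes:
$$\min\left( \frac{P(y, x)}{P(x, y)}, 1 \right) = \min\left( \frac{\operatorname{degree}(x)}{\operatorname{degree}(y)}, 1 \right).$$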
Therefore, our algorithm for selecting the next step from some
peer $x$ is as follows:
Select a neighbor $y$ of $x$ uniformly at random.
Query $y$ for a list of its neighbors, to determine its degree.
Generate a random value, $p$, uniformly between 0 and 1.
If $p \le \operatorname{degree}(x)/\operatorname{degree}(y)$, $y$ is the next step.
Otherwise, remain at $x$ as the next step.
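The steps above translate almost line for line into code. In this sketch, `query_neighbors` is a hypothetical stand-in for asking a peer for its neighbor list (the only primitive the method relies on), and the graph in the usage lines is made up for illustration.

```python
import random
from typing import Callable, Hashable, List

Peer = Hashable


def mrw_step(x: Peer, query_neighbors: Callable[[Peer], List[Peer]],
             rng: random.Random) -> Peer:
    """One Metropolized Random Walk step from peer x (static graph)."""
    x_neighbors = query_neighbors(x)
    y = rng.choice(x_neighbors)            # select a neighbor y of x uniformly at random
    degree_y = len(query_neighbors(y))     # query y for its neighbors to learn its degree
    p = rng.random()                       # p uniform in [0, 1)
    if p <= len(x_neighbors) / degree_y:   # accept with probability min(deg(x)/deg(y), 1)
        return y
    return x                               # otherwise remain at x for this step


def mrw_sample(start: Peer, query_neighbors: Callable[[Peer], List[Peer]],
               r: int, rng: random.Random) -> Peer:
    """Take r MRW steps from start and return the peer reached."""
    x = start
    for _ in range(r):
        x = mrw_step(x, query_neighbors, rng)
    return x


# Usage on a hypothetical graph with very uneven degrees (peer 0 is a hub).
graph = {0: [1, 2, 3, 4, 5], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3], 5: [0]}
rng = random.Random(42)
counts = {v: 0 for v in graph}
for _ in range(30_000):
    counts[mrw_sample(0, graph.__getitem__, r=25, rng=rng)] += 1
print(counts)
```

With a uniform target, each peer should be selected close to 5,000 of the 30,000 times, including the degree-1 peer 5, which a plain random walk would visit with stationary probability only 1/14.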
We call this the Metropolized Random Walk (MRW). Qualitatively,
the effect is to suppress the rate of transition to peers of
higher degree, resulting in selecting each peer with equal
probability.
TABLE I
KOLMOGOROV-SMIRNOV TEST STATISTIC FOR TECHNIQUES OVER STATIC GRAPHS. VALUES ABOVE
THE CRITICAL VALUE LIE IN THE REJECTION REGION AT THE 5% LEVEL
Fig. 1. Bias of different sampling techniques; after collecting $k \cdot |V|$ samples. The figures show how many peers ($y$-axis) were selected $x$ times.
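The claim that MRW selects each peer with equal probability can be checked exactly on a small example. The sketch below, on a hypothetical six-peer graph with a hub, builds $Q(x, y)$ from the Metropolis-Hastings rule with a uniform target $\mu$, using exact fractions, and verifies that $\mu Q = \mu$.

```python
from fractions import Fraction

# Hypothetical graph with uneven degrees (peer 0 is a hub, peer 5 a leaf).
graph = {0: [1, 2, 3, 4, 5], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3], 5: [0]}


def metropolis_matrix(graph):
    """Q(x, y) from the Metropolis-Hastings rule with a uniform target mu.

    With mu uniform, Q(x, y) = P(x, y) * min(P(y, x) / P(x, y), 1) for x != y,
    and the leftover probability mass stays at x.
    """
    Q = {x: {y: Fraction(0) for y in graph} for x in graph}
    for x, nbrs in graph.items():
        for y in nbrs:
            p_xy = Fraction(1, len(nbrs))
            p_yx = Fraction(1, len(graph[y]))
            Q[x][y] = p_xy * min(p_yx / p_xy, Fraction(1))
        Q[x][x] = 1 - sum(Q[x][y] for y in graph if y != x)
    return Q


Q = metropolis_matrix(graph)
mu = {x: Fraction(1, len(graph)) for x in graph}
# Stationarity check: (mu Q)(y) == mu(y) for every peer y.
mu_Q = {y: sum(mu[x] * Q[x][y] for x in graph) for y in graph}
print(mu_Q == mu)  # True
```

With a uniform target, each off-diagonal entry reduces to $\min(1/\operatorname{degree}(x), 1/\operatorname{degree}(y))$, so $Q$ is symmetric; since its rows sum to 1, so do its columns, which is exactly why the uniform vector is stationary.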
3) Evaluation: Although [6] provides a proof of correctness
for the Metropolis-Hastings method, to ensure the correctness
of our implementation we conduct evaluations through simu-
lation over static graphs. This additionally provides the oppor-
tunity to compare MRW with conventional techniques such as
Breadth-First Search (BFS) or naive random walks (RW) with
no adjustments for degree bias.
To evaluate a technique, we use it to collect a large number of
sample vertices from a graph, then perform a goodness-of-fit test
against the uniform distribution. For Breadth-First Search, we
simulate typical usage by running it to gather a batch of 1,000
peers. When one batch of samples is collected, the process is
reset and begins anew at a different starting point. To ensure
robustness with respect to different kinds of connectivity struc-
tures, we examine each technique over several types of graphs
as follows:
Erdős-Rényi: The simplest variety of random graphs
Watts-Strogatz: Small-world graphs with high clustering
and low path lengths
Barabási-Albert: Graphs with extreme degree distributions,
also known as power-law or scale-free graphs
Gnutella: Snapshots of the Gnutella ultrapeer topology,
captured in our earlier work [39]
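To make the goodness-of-fit step concrete, here is a sketch of a one-sample Kolmogorov-Smirnov statistic against the discrete uniform distribution over vertex indices. Ordering vertices by index, the 1.36/sqrt(n) large-sample critical value at the 5% level, and the synthetic samplers in the usage lines are all assumptions for illustration; the text does not specify how the test was set up.

```python
import math
import random


def ks_uniform_statistic(samples, n_vertices):
    """One-sample Kolmogorov-Smirnov statistic D comparing the empirical CDF of
    sampled vertex indices (0 .. n_vertices-1) with the discrete uniform CDF."""
    counts = [0] * n_vertices
    for v in samples:
        counts[v] += 1
    n = len(samples)
    d, cumulative = 0.0, 0
    for i, c in enumerate(counts):
        cumulative += c
        d = max(d, abs(cumulative / n - (i + 1) / n_vertices))
    return d


def ks_critical_value(n, coefficient=1.36):
    """Large-sample critical value at the 5% level, approximately 1.36 / sqrt(n)."""
    return coefficient / math.sqrt(n)


# Usage with synthetic samplers (hypothetical stand-ins for real techniques).
rng = random.Random(1)
n_vertices, n_samples = 1_000, 50_000
uniform = [rng.randrange(n_vertices) for _ in range(n_samples)]
# Favors low-numbered vertices, mimicking a sampler with a systematic bias.
biased = [min(rng.randrange(n_vertices), rng.randrange(n_vertices))
          for _ in range(n_samples)]
crit = ks_critical_value(n_samples)
print(ks_uniform_statistic(uniform, n_vertices),
      ks_uniform_statistic(biased, n_vertices), crit)
```

Applied to a discrete distribution, this critical value makes the test conservative, so a statistic well above it, as for the biased sampler here, is strong evidence of non-uniform sampling.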
To make the results more comparable, the numbers of vertices ($|V|$)
and edges ($|E|$) are approximately the same in each graph.¹
Table I presents the results of the Kolmogorov-Smirnov
goodness-of-fit tests after collecting the samples, showing
that Metropolis-Hastings appears to generate uniform samples
over each type of graph, while the other techniques fail to do so
by a wide margin.
Fig. 1 explores the results visually, by plotting the number
of times each peer is selected. If we select $k \cdot |V|$ samples,
the typical node should be selected $k$ times, with other nodes being
selected close to $k$ times, approximately following a normal
distribution with variance $k$.²
¹ Erdős-Rényi graphs are generated based on some probability $p$ that any edge
may exist. We set $p = \frac{2|E|}{|V|(|V|-1)}$ so that there will be close to $|E|$ edges,
though the exact value may vary slightly. The Watts-Strogatz model requires
that $|E|$ be evenly divisible by $|V|$, so in that model we use a value of $|E|$ that is a multiple of $|V|$.
We used $k \cdot |V|$ samples. We also
include an Oracle technique, which selects peers uniformly
at random using global information. The Metropolis-Hastings
results are virtually identical to the Oracle, while the other
techniques select many peers much more and much less than
$k$ times. In the Gnutella, Watts-Strogatz, and Barabási-Albert
graphs, Breadth-First Search exhibits a few vertices that are
selected a large number of times. The (not-adjusted)
Random Walk (RW) method has similarly selected a
few vertices an exceptionally large number of times in the
Gnutella and Barabási-Albert models. The Oracle and MRW,
by contrast, did not select any vertex more than around 1,300
times.
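The expected spread in Fig. 1 follows from a short calculation. Under uniform sampling, each of the $k \cdot |V|$ samples hits a given vertex $v$ with probability $1/|V|$, so the number of times $N_v$ that $v$ is selected is binomial:

$$N_v \sim \mathrm{Binomial}\left(k|V|,\ \tfrac{1}{|V|}\right), \qquad \mathbb{E}[N_v] = k|V| \cdot \tfrac{1}{|V|} = k, \qquad \operatorname{Var}[N_v] = k\left(1 - \tfrac{1}{|V|}\right) \approx k.$$

Under the normal approximation, roughly 95% of vertices should therefore be selected within $k \pm 1.96\sqrt{k}$ times; for an illustrative $k = 1{,}000$, that band runs from about 938 to 1,062.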
In summary, the Metropolis-Hastings method selects peers
uniformly at random from a static graph. Section V examines
the additional complexities when selecting from a dynamic
graph, introduces appropriate modifications, and evaluates the
algorithm's performance.
V. SAMPLING FROM DYNAMIC GRAPHS
Section III set aside topological issues and examined the dy-
namic aspects of sampling. Section IV set aside temporal issues
and examined the topological aspects of sampling. This section
examines the unique problems that arise when both temporal
and topological difficulties are present.
Our hypothesis is that a Metropolis-Hastings random walk
will yield approximately unbiased samples even in a dynamic
environment. Simulation results testing this hypothesis are later
in this section and empirical tests are in Section VI. The
fundamental assumption of Metropolis-Hastings is that the frequency
of visiting a peer is proportional to the peer's degree. This
assumption will be approximately correct if peer relationships
change only slightly during the walk. On one extreme, if the
entire walk completes before any graph changes occur, then the
problem reduces to the static case. If a single edge is removed
mid-walk, the probability of selecting the two affected peers
is not significantly affected, unless those peers have very few
edges. If many edges are added and removed during a random
walk, but the degree of each peer does not change significantly,
we would also expect that the probability of selecting each peer
will not change significantly. In peer-to-peer systems, each peer
actively tries to maintain a number of connections within a
certain range, so we have reason to believe that the degree of each
peer will be relatively stable in practice. On the other hand, it is
quite possible that in a highly dynamic environment, or for
certain degree distributions, the assumptions of Metropolis-Hastings
are grossly violated and it fails to gather approximately
unbiased samples.
² Based on the normal approximation of a binomial distribution with $p = \frac{1}{|V|}$
and $n = k|V|$.
The fundamental question we attempt to answer in this section
is: Under what conditions does the Metropolis-Hastings
random walk fail to gather approximately unbiased samples?
If there is any bias in the samples, the bias will be tied to
some property that interacts with the walk. Put another way,
if there were no properties that interacted with the walk, then
the walking process behaves as it would on a static graph, for
which we have a proof from graph theory. Therefore, we are
only worried about properties which cause the walk to behave
differently. We identify the following three fundamental prop-
erties that interact with the walk:
Degree: the number of neighbors of each peer. The
Metropolis-Hastings method is a modification of a regular
random walk in order to correct for degree bias as
described in Section IV. It assumes a fixed relationship
between degree and the probability of visiting a peer.
If the Metropolis-Hastings assumptions are invalid, the
degree correction may not operate correctly, introducing a
bias correlated with degree.
Session lengths: how long peers remain in the system.
Section III showed how sampling may result in a bias based
on session length. If the walk is more likely to select either
short-lived or long-lived peers, there will be a bias
correlated with session length.
Query latency: how long it takes the sampler to query
a peer for a list of its neighbors. In a static environment
the only notion of time is the number of steps taken by
the walk. In a dynamic environment, each step requires
querying a peer, and some peers will respond more quickly
than others. This could lead to a bias correlated with the
query latency. In our simulations, we model the query latency
as twice the round-trip time between the sampling
node and the peer being queried.³
For other peer properties, sampling bias can only arise if the
desired property is correlated with one of the fundamental properties
and that fundamental property exhibits bias. For example, when
sampling the number of files shared by each peer, there may be
sampling bias if the number of files is correlated with session
length and sampling is biased with respect to session length.
One could also imagine the number of files being correlated
with query latency (which is very loosely related to the peer
bandwidth). However, sampling the number of shared files
cannot be biased independently, as it does not interact with
the walk. To show that sampling is unbiased for any property,
it is sufficient to show that it is unbiased for the fundamental
properties that interact with the sampling technique.
³ One-half RTT for the SYN, one-half RTT for the SYN-ACK, one-half RTT for the ACK and the
request, and one-half RTT for the reply.
A. Coping With Departing Peers
Departing peers introduce an additional practical consideration.
The walk may try to query a peer that is no longer present, a
case where the behavior of the ordinary random walk algorithm
is undefined. We employ a simple adaptation to mimic an ordinary
random walk on a static graph as closely as possible, by
maintaining a stack of visited peers. When the walk chooses a
new peer to query, we push the peer's address on the stack. If the
query times out, we pop the address off the stack, and choose a
new neighbor of the peer that is now on top of the stack. If all of
a peer's neighbors time out, we re-query that peer to get a fresh
list of its neighbors. If the re-query also times out, we pop that
peer from the stack as well, and so on. If the stack underflows,
we consider the walk a failure. We do not count timed-out peers
as a hop for the purposes of measuring the length of the walk.
We call this adaptation of the MRW sampling technique the
Metropolized Random Walk with Backtracking (MRWB) method
for sampling from dynamic graphs. Note that when applied in a
static environment, this method reduces to MRW.
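The backtracking rules above can be sketched as follows. The `query` callable is a hypothetical stand-in that returns a peer's current neighbor list or `None` on a timeout. Giving up on a peer whose neighbors all time out again after a successful re-query, before any hop is taken from it, is our own guard against looping forever, not a rule stated in the text.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional, Set

Peer = Hashable
# Hypothetical query primitive: returns a peer's current neighbor list, or
# None if the query times out (for example, because the peer has departed).
Query = Callable[[Peer], Optional[List[Peer]]]


def mrwb_walk(start: Peer, query: Query, hops: int,
              rng: random.Random) -> Optional[Peer]:
    """Metropolized Random Walk with Backtracking (sketch).

    Returns the peer reached after `hops` hops, or None if the stack underflows.
    """
    first = query(start)
    if not first:
        return None
    stack: List[Peer] = [start]              # visited peers; the top is the current peer
    degree: Dict[Peer, int] = {start: len(first)}
    untried: Dict[Peer, List[Peer]] = {start: list(first)}  # neighbors not known to be gone
    requeried: Set[Peer] = set()             # peers re-queried since their last hop
    taken = 0
    while taken < hops:
        x = stack[-1]
        if not untried[x]:
            # Every neighbor of x timed out: re-query x once for a fresh list.
            fresh = None if x in requeried else query(x)
            if not fresh:
                stack.pop()                  # x is unusable too: backtrack
                if not stack:
                    return None              # stack underflow: the walk fails
                if x in untried[stack[-1]]:
                    untried[stack[-1]].remove(x)
                continue
            requeried.add(x)
            degree[x], untried[x] = len(fresh), list(fresh)
            continue
        y = rng.choice(untried[x])
        y_neighbors = query(y)
        if not y_neighbors:
            untried[x].remove(y)             # timed out: not counted as a hop
            continue
        taken += 1
        requeried.discard(x)
        # Metropolis-Hastings acceptance with probability min(degree(x)/degree(y), 1).
        if rng.random() <= degree[x] / len(y_neighbors):
            stack.append(y)
            degree[y], untried[y] = len(y_neighbors), list(y_neighbors)
        # On rejection the walk stays at x for this hop.
    return stack[-1]


# Usage on a hypothetical overlay in which peer 5 has departed.
graph = {0: [1, 2, 3, 4, 5], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3], 5: [0]}
departed = {5}


def overlay_query(peer: Peer) -> Optional[List[Peer]]:
    return None if peer in departed else graph[peer]


rng = random.Random(7)
print([mrwb_walk(0, overlay_query, hops=25, rng=rng) for _ in range(5)])
```

On a static graph where no query times out, the function behaves exactly like MRW, matching the reduction noted above.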
B. Evaluation Methodology
In the static case, we can rely on graph theory to prove the
accuracy of the MRW technique. Unfortunately, graph theory
is not well-suited to the problem of dynamically changing
graphs. Therefore, we rely on simulation rather than analysis.
We have developed a session-level dynamic overlay simulator
that models peer arrivals, departures, latencies, and neighbor
connections. We now describe our simulation environment.
The latencies between peers are modeled using values from
the King data set [40]. Peers learn about one another using one of
several peer discovery mechanisms described below. Peers have
a target minimum number of connections (i.e., degree) that they
attempt to maintain at all times. Whenever they have fewer con-
nections, they open additional connections. We assume connec-
tions are TCP and require a 3-way handshake before the connec-
tion is fully established, and that peers will time out an attempted
connection to a departed peer after 10 seconds. A new peer gen-
erates its session length from one of several different session
length distributions described below and departs when the ses-
sion length expires. New peers arrive according to a Poisson
process, where we select the mean peer arrival rate based on the
session length distribution to achieve a target population size of
100,000 peers.
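The text does not say exactly how the arrival rate is derived from the session-length distribution. One standard way, assumed here, is Little's law: in steady state the mean population equals the arrival rate times the mean session length, so $\lambda = N / \mathbb{E}[S]$. The sketch below computes that rate and checks it with a tiny simulation. The Weibull session distribution, the 1,000-peer target, and the function names are hypothetical, chosen only to keep the run short.

```python
import math
import random


def arrival_rate(target_population: float, mean_session_s: float) -> float:
    """Poisson arrival rate (peers per second) that sustains target_population
    in steady state, by Little's law: N = lambda * E[session length]."""
    return target_population / mean_session_s


def population_at(horizon_s: float, rate: float, draw_session,
                  rng: random.Random) -> int:
    """Start from an empty overlay and count the peers still online at horizon_s."""
    t, online = 0.0, 0
    while True:
        t += rng.expovariate(rate)               # Poisson arrivals: exponential gaps
        if t > horizon_s:
            return online
        if t + draw_session(rng) > horizon_s:    # this session outlasts the horizon
            online += 1


# Hypothetical skewed session lengths: Weibull with scale 30 s and shape 0.5,
# whose mean is scale * Gamma(1 + 1/shape) = 60 s.
SCALE, SHAPE = 30.0, 0.5
mean_session = SCALE * math.gamma(1 + 1 / SHAPE)
lam = arrival_rate(1_000, mean_session)
rng = random.Random(11)
online = population_at(2_000.0, lam, lambda r: r.weibullvariate(SCALE, SHAPE), rng)
print(round(lam, 2), online)
```

Starting from an empty overlay, the population only approaches its target after sessions have had time to turn over, which is one reason to run a warm-up period before starting any walks.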
To query a peer for a list of neighbors, the sampling node
must set up a TCP connection, submit its query, and receive a
response. The query times out if no response is received after 10
seconds.⁴
We run the simulator for a warm-up period to reach
steady-state conditions before performing any random walks.
⁴ The value of 10 seconds was selected based on our experiments in developing
a crawler for the Gnutella network in [12].
Fig. 2. Comparison of sampled and expected distributions. They are visually indistinguishable.
Fig. 3. Distribution of time needed to complete a random walk (simulated).
Our goal is to discover if random walks started under identical
conditions will select a peer uniformly at random. To evaluate
this, we start 100,000 concurrent random walks from a single
location. Although started at the same time, the walks will not all
complete at the same time.⁵ We chose to use 100,000 walks as
we believe this is a much larger number of samples than most
researchers will use in practice. If there is no discernible bias
with 100,000 samples, we can conclude that the tool is unbiased
for the purposes of gathering fewer samples (i.e., we cannot get
more accuracy by using less precision). Fig. 3 shows the dis-
tribution of how long walks take to complete in one simulation
using 50 hops per walk, illustrating that most walks take 10 to 20
seconds to complete. In the simulator the walks do not interact
or interfere with one another in any way. Each walk ends and
collects an independent sample.
As an expected distribution, we capture a perfect snapshot
(i.e., using an oracle) at the median walk-completion time, i.e.,
when 50% of the walks have completed.
C. Evaluation of a Base Case
Because the potential number of simulation parameters is un-
bounded, we need a systematic method to intelligently explore
the most interesting portion of this parameter space. Towards
this end, we begin with a base case of parameters as a starting
point and examine the behavior of MRWB under those condi-
tions. In Sections V-D and E, we vary the parameters and ex-
plore how the amount of bias varies as a function of each of the
parameters. As a base case, we use the configuration listed in Table II.
Fig. 2 presents the sampled and expected distributions for the
three fundamental properties: degree, session length, and query
latency. The fact that the sampled and expected distributions are
visually indistinguishable demonstrates that the samples are not
significantly biased in the base case.
⁵Each walk ends after the same number of hops, but not every hop takes the same amount of time due to differences in latencies and due to the occasional timeout.
TABLE II
BASE CASE CONFIGURATION
To efficiently examine other cases, we introduce a summary statistic to quickly capture the difference between the sampled and expected distributions, and to provide more rigor than a purely visual inspection. For this purpose, we use the Kolmogorov-Smirnov (KS) statistic, D, formally defined as follows. Where S(x) is the sampled cumulative distribution function and E(x) is the expected cumulative distribution function from the perfect snapshot, the KS statistic is

$$D = \max_x \left| S(x) - E(x) \right|$$

In other words, if we plot the sampled and expected CDFs, D is the maximum vertical distance between them and has a possible range of [0, 1]. For Fig. 2(a)-(c), the values of D were 0.0019, 0.0023, and 0.0037, respectively. For comparison, at the p = 0.05 significance level, the critical value of D is 0.0061 for the two-sample KS statistic with 100,000 data points each. However, in practice we do not expect most researchers to gather hundreds of thousands of samples. After all, the initial motivation for sampling is to gather reasonably accurate data at relatively low cost. As a rough rule of thumb, a value of D ≥ 0.1 is quite bad, corresponding to at least a 10 percentage point difference on a CDF. A value of D ≤ 0.01 is excellent for most purposes when studying a peer property, corresponding to no more than a 1 percentage point difference on a CDF.
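The statistic and the 0.0061 threshold can be reproduced in a few lines. The sketch below is our own illustration with hypothetical helper names. It uses the standard large-sample approximation c(α)·sqrt((n + m)/(nm)) for the two-sample critical value, with c(0.05) ≈ 1.358.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D = max_x |S(x) - E(x)|, where S and E are
    the empirical CDFs of sample_a and sample_b."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions, so the
    # largest gap is attained at one of the observed data points.
    for x in set(a) | set(b):
        s = bisect.bisect_right(a, x) / len(a)
        e = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(s - e))
    return d


def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value of D for samples of sizes n and m;
    c_alpha = 1.358 corresponds to the 0.05 significance level."""
    return c_alpha * math.sqrt((n + m) / (n * m))


if __name__ == "__main__":
    print(round(ks_critical_value(100_000, 100_000), 4))
```

With n = m = 100,000 the critical value rounds to 0.0061, the comparison point used in the text. For production use, a library routine such as SciPy's two-sample KS test is the more robust choice.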
D. Exploring Different Dynamics
In this section, we examine how the amount of bias changes as
we vary the type and rate of dynamics in the system. We examine
different settings of the simulation parameters that affect dy-
namics, while continuing to use the topological characteristics
from our base case (Table II). We would expect that as the rate
of peer dynamics increases, the sampling error also increases.
The key question is: How fast can the churn rate be before it
causes significant error, and is that likely to occur in practice?
In this subsection, we present the results of simulations with
a wide variety of rates using three different models for session
length, as follows:
Exponential: The exponential distribution is a one-parameter distribution, parameterized by its rate, that features sessions relatively close together in length. It has been used in many prior simulation and analysis studies of peer-to-peer systems [41]-[43].
Fig. 4. Sampling error of the three fundamental properties as a function of session-length distribution. Exceptionally heavy churn (median < 1 min) introduces error into the sampling process.
Pareto: The Pareto (or power-law) distribution is a two-parameter distribution (shape and location) that features many short sessions coupled with a few very long sessions. Some prior measurement studies of peer-to-peer systems have suggested that session lengths follow a Pareto distribution [44]-[46]. One difficulty with this model is that the location parameter is a lower bound on the session length, and fits of the location parameter to empirical data are often unreasonably high (i.e., placing a lower bound significantly higher than the median session length reported by other measurement studies). In their insightful analytical study of churn in peer-to-peer systems, Leonard, Rai, and Loguinov [47] instead suggest using a shifted Pareto distribution (shape and scale). We use this shifted Pareto distribution, holding the shape fixed and varying the scale parameter. We examine two different shape values, one giving infinite variance and one giving finite variance.
Weibull: Our own empirical observations [5] suggest the Weibull distribution (shape and scale) provides a good model of peer session lengths, representing a compromise between the exponential and Pareto distributions. We fix the shape parameter (based on our empirical data) and vary the scale parameter.
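Because Fig. 4 is indexed by median session length, each model needs its free parameter chosen to hit a target median. The sketch below shows one way to do that with Python's standard `random` module. The parameterizations are our own assumptions and may differ from the paper's notation: exponential with survival exp(−rate·x), shifted Pareto with survival (1 + x/scale)^(−shape), and Weibull with survival exp(−(x/scale)^shape). The function name `session_sampler` is hypothetical.

```python
import math
import random
import statistics


def session_sampler(kind, median, shape=None, rng=None):
    """Return a zero-argument function drawing session lengths whose
    theoretical median equals `median` (same time unit as `median`).

    Assumed parameterizations (illustrative, not necessarily the paper's):
      exponential:    P(S > x) = exp(-rate * x)
      shifted_pareto: P(S > x) = (1 + x / scale) ** (-shape)
      weibull:        P(S > x) = exp(-(x / scale) ** shape)
    """
    rng = rng or random.Random()
    if kind == "exponential":
        rate = math.log(2) / median  # median = ln 2 / rate
        return lambda: rng.expovariate(rate)
    if kind == "shifted_pareto":
        scale = median / (2 ** (1 / shape) - 1)  # median = scale * (2^(1/shape) - 1)
        # paretovariate(shape) gives Y >= 1 with P(Y > y) = y ** (-shape),
        # so scale * (Y - 1) has the shifted (Lomax) survival function above.
        return lambda: scale * (rng.paretovariate(shape) - 1)
    if kind == "weibull":
        scale = median / math.log(2) ** (1 / shape)  # median = scale * (ln 2)^(1/shape)
        return lambda: rng.weibullvariate(scale, shape)  # (scale, shape) order
    raise ValueError(f"unknown session-length model: {kind}")


if __name__ == "__main__":
    draw = session_sampler("weibull", median=120.0, shape=0.5, rng=random.Random(1))
    # Sample median; should be near the 120-second target.
    print(statistics.median(draw() for _ in range(100_000)))
```

Sweeping `median` downward while keeping each model's shape fixed mirrors the kind of churn sweep summarized in Fig. 4.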
Fig. 4 presents the amount of sampling error as a func-
tion of median session length, for the three fundamental proper-
ties, with a logarithmic x-axis scale. The figure shows that error
is low over a wide range of session lengths but be-
gins to become significant when the median session length drops
below 2 minutes, and exceeds when the median drops
below 30 seconds. The type of distribution varies the threshold
slightly, but overall does not appear to have a significant im-
pact. To investigate whether the critical threshold is a function
of the length of the walk, we ran some simulations using walks
of 10,000 hops (which take around one simulated hour to com-
plete). Despite the long duration of these walks, they remained
unbiased with for each of the three
fundamental properties. This suggests that the accuracy of
MRWB is not adversely affected by a long walk.
While the median session length reported by measurement
studies varies considerably (see [42] for a summary), none re-
port a median below 1 minute and two studies report a median
session length of one hour [3], [4]. In summary, these results
demonstrate that MRWB can gracefully tolerate peer dynamics.
In particular, it performs well over the rate of churn reported in
real systems.
E. Exploring Different Topologies
In this section, we examine different settings of the simulation
parameters that directly affect topological structure, while using
the dynamic characteristics from our base case (Table II). The
Metropolis-Hastings method makes use of the ratio between the
degrees of neighboring peers. If this ratio fluctuates dramatically
while the walk is conducted, it may introduce significant bias.
If peers often have only a few connections, any change in their
degree will result in a large percentage-wise change. One key
question is therefore: Does a low target degree lead to sampling
bias, and, if so, when is significant bias introduced?
The degree of peers is controlled by three factors. First, each
peer has a peer discovery mechanism that enables it to learn
the addresses of potential neighbors. The peer discovery mech-
anism will influence the structure of the topology and, if per-
forming poorly, will limit the ability of peers to establish con-
nections. Second, peers have a target degree which they actively
try to maintain. If they have fewer neighbors than the target, they
open additional connections until they have reached the target.
If necessary, they make use of the peer discovery mechanism
to locate additional potential neighbors. Finally, peers have a
maximum degree, which limits the number of neighbors they are
willing to accept. If they are at the maximum and another peer
contacts them, they refuse the connection. Each of these three
factors influences the graph structure, and therefore may affect
the walk.
We model four different types of peer discovery mechanisms,
based on those found in real systems:
Random Oracle: This is the simplest and most idealistic
approach. Peers learn about one another by contacting a
rendezvous point that has perfect global knowledge of the
system and returns a random set of peers for them to con-
nect to.
FIFO: In this scheme, inspired by the GWebCaches of Gnutella [48], peers contact a rendezvous point which returns a list of the last m peers that contacted the rendezvous, where m is the maximum peer degree.
Soft State: Inspired by the approach of BitTorrent's
trackers, peers contact a rendezvous point that has
imperfect global knowledge of the system. In addition to
contacting the rendezvous point to learn about more peers,
every peer periodically (every half hour) contacts the
rendezvous point to refresh its state. If a peer fails to make contact for 45 minutes, the rendezvous point removes it from the list of known peers.
Fig. 5. Sampling error of the three fundamental properties as a function of the number of connections each peer actively attempts to maintain. Low target degree (≤ 2) introduces significant sampling error.
History: Many P2P applications connect to the network
using addresses they learned during a previous session
[49]. A large fraction of these addresses will time out, but
typically enough of the peers will still be active to avoid
the need to contact a centralized rendezvous point. As
tracking the re-appearance of peers greatly complicates
our simulator (as well as greatly increasing the memory
requirements), we use a coarse model of the History
mechanism. We assume that 90% of connections automatically
time out. The 10% that are given valid addresses
are skewed towards peers that have been present for a
long time (more than one hour) and represent regular
users who might have been present during the peer's last
session. While this might be overly pessimistic, it reveals
the behavior of MRWB under harsh conditions.
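The two rendezvous-based mechanisms above differ mainly in what the rendezvous point remembers. The sketch below illustrates both. The class and method names are hypothetical. For the soft-state variant, we assume the rendezvous point returns a random subset of its (possibly stale) list, because the text does not say how it chooses. The 45-minute expiry comes from the description above.

```python
import collections
import random


class FifoRendezvous:
    """GWebCache-style discovery: return the last `max_degree` peers that
    contacted the rendezvous point, most recent first."""

    def __init__(self, max_degree):
        self.recent = collections.deque(maxlen=max_degree)

    def contact(self, peer):
        known = [p for p in reversed(self.recent) if p != peer]  # snapshot first
        self.recent.append(peer)  # the oldest entry falls off automatically
        return known


class SoftStateRendezvous:
    """Tracker-style discovery: peers re-contact the rendezvous periodically
    to refresh their entry; entries older than `expiry` seconds are dropped."""

    def __init__(self, expiry=45 * 60, rng=None):
        self.expiry = expiry
        self.last_seen = {}  # peer -> time of its most recent contact
        self.rng = rng or random.Random()

    def contact(self, peer, now, count):
        # Forget peers that have not refreshed within the expiry window.
        self.last_seen = {p: t for p, t in self.last_seen.items()
                          if now - t <= self.expiry}
        candidates = [p for p in self.last_seen if p != peer]
        self.last_seen[peer] = now  # record (or refresh) the caller
        # Assumption: hand back a random subset of the known peers.
        return self.rng.sample(candidates, min(count, len(candidates)))
```

The list returned by `SoftStateRendezvous` can still contain peers that departed within the last 45 minutes, which is the "imperfect global knowledge" the text refers to.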
Fig. 5 presents the amount of sampling error for the
three fundamental properties as a function of the target degree,
for each of the peer discovery methods, holding the maximum
peer degree fixed at 30 neighbors. It shows that sampling
is not significantly biased in any of the three fundamental
properties as long as peers attempt to maintain at least three
connections. Widely deployed peer-to-peer systems typically
maintain dozens of neighbors. Moreover, maintaining fewer
than three neighbors per peer almost certainly leads to network
fragmentation, and is therefore not a reasonable operating point
for peer-to-peer systems.
The results for the different peer-discovery mechanisms were
similar to one another, except for a small amount of bias ob-
served when using the History mechanism as the target degree
approaches the maximum degree (30). To investigate this issue,
Fig. 6 presents the sampled and expected degree distribution
when using the History mechanism with a target degree of 30.
The difference between the sampled and expected distributions
is due to the 2.4% of peers with a degree of zero. These iso-
lated peers arise in this scenario because the History mechanism
has a high failure rate (returning addresses primarily of departed
peers), and when a valid address is found, it frequently points
to a peer that is already at its connection limit. The zero-degree
peers are visible in the snapshot (which uses an oracle to obtain
global information), but not to the sampler (since peers with a
degree of zero have no neighbors and can never be reached).
We do not regard omitting disconnected peers as a serious lim-
itation.
Fig. 6. Comparison of degree distributions using the History mechanism with a target degree of 30. Sampling cannot capture the unconnected peers (degree = 0), causing the sampling error observed in Fig. 5.
Having explored the effects of lowering the degree, we now
explore the effects of increasing it. In Fig. 7, we examine sampling error as a function of the maximum degree, with the target degree always set to 15 less than the maximum. There is little error for any setting of the maximum degree.
In summary, the proposed MRWB technique for sampling
from dynamic graphs appears unbiased for a range of different
topologies (with reasonable degree distributions; e.g., a target
degree of at least three), operates correctly for a number of different mechanisms for
peer discovery, and is largely insensitive to a wide range of peer
dynamics, with the churn rates reported for real systems safely
within this range.
VI. EMPIRICAL RESULTS
In addition to the simulator version, we have implemented
the MRWB algorithm for sampling from real peer-to-peer networks into a tool called ion-sampler. Sections VI-A to VI-E briefly describe the implementation and usage of ion-sampler and present empirical experiments to validate its accuracy.
A. Ion-Sampler
The ion-sampler tool uses a modular design that accepts plug-ins for new peer-to-peer systems.⁶ A plug-in can be written for any peer-to-peer system that allows querying a peer for a list of its neighbors. The ion-sampler tool hands IP-address:port pairs to the plug-in, which later returns a list of neighbors or signals that a timeout occurred. The ion-sampler tool is responsible for managing the walks. It outputs the samples to standard output, where they may be easily read by another tool that collects the actual measurements. For example, ion-sampler could be used with existing measurement tools for measuring bandwidth to estimate the distribution of access link bandwidth in a peer-to-peer system. Fig. 12 shows an example of using ion-sampler to sample peers from Gnutella.

⁶In fact, it uses the same plug-in architecture as our earlier, heavyweight tool, Cruiser, which exhaustively crawls peer-to-peer systems to capture topology snapshots.

Fig. 7. Sampling error of the three fundamental properties as a function of the maximum number of connections each peer will accept. Each peer actively attempts to maintain 15 fewer connections than the maximum.
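The division of labor described above (the tool manages walks, a plug-in answers "who are this peer's neighbors?") can be captured as a small interface. The sketch below is hypothetical: `NeighborPlugin`, `StaticGraphPlugin`, and `format_samples` are our own names, not ion-sampler's actual API. A `None` return stands for the timeout signal, and an in-memory adjacency map stands in for a real network.

```python
from typing import Dict, List, Optional, Protocol, Tuple

Address = Tuple[str, int]  # (IP address, port)


class NeighborPlugin(Protocol):
    """Hypothetical plug-in contract: given a peer's address, return its
    neighbor list, or None to signal that the query timed out."""

    def query_neighbors(self, peer: Address) -> Optional[List[Address]]: ...


class StaticGraphPlugin:
    """Toy plug-in backed by an in-memory adjacency map (for testing only)."""

    def __init__(self, adjacency: Dict[Address, List[Address]]):
        self.adjacency = adjacency

    def query_neighbors(self, peer: Address) -> Optional[List[Address]]:
        # An unknown peer behaves like a departed one: the query times out.
        return self.adjacency.get(peer)


def format_samples(samples: List[Address]) -> str:
    """Render samples as IP-address:port lines for a downstream tool."""
    return "\n".join(f"{ip}:{port}" for ip, port in samples)
```

Keeping the plug-in this narrow means that supporting a new system only requires implementing `query_neighbors` for that system's wire protocol.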
B. Empirical Validation
Empirical validation is challenging due to the absence of
high-quality reference data to compare against. In our earlier
work [12], [39], we developed a peer-to-peer crawler called
Cruiser that captures the complete overlay topology through
exhaustive exploration. We can use these topology snapshots as
a point of reference for the degree distribution. Unfortunately,
we do not have reliably accurate empirical reference data for
session lengths or query latency.
By capturing every peer, Cruiser is immune to sampling diffi-
culties. However, because the network changes as Cruiser oper-
ates, its snapshots are slightly distorted [12]. In particular, peers
arriving near the start of the crawl are likely to have found addi-
tional neighbors by the time Cruiser contacts them. Therefore,
we intuitively expect a slight upward bias in Cruiser's observed
degree distribution. For this reason, we would not expect a per-
fect match between Cruiser and sampling, but if the sampling is
unbiased we still expect them to be very close. We can view the
CCDF version of the degree distribution captured by Cruiser as
a close upper-bound on the true degree distribution.
Fig. 8 presents a comparison of the degree distribution of
reachable ultrapeers in Gnutella, as seen by Cruiser and by the
sampling tool (capturing approximately 1,000 samples with
hops). It also includes the results of a short crawl,⁷ a sampling technique commonly used in earlier studies (e.g., [3]). We interleaved running these measurement tools to minimize the change in the system between measurements of different tools, in order to make their results comparable.
Examining Fig. 8, we see that the full crawl and sampling
distributions are quite similar. The sampling tool finds slightly
more peers with lower degree, compared to the full crawl, in ac-
cordance with our expectations described above. We examined
several such pairs of crawling and sampling data and found the
same pattern in each pair. By comparison, the short crawl ex-
hibits a substantial bias towards high degree peers relative to
both the full crawl and sampling. We computed the KS statistic between each pair of datasets, presented in Table III. Since the full crawl is a close upper bound of the true degree distribution, and since sampling's distribution is lower, the error in the sampling distribution relative to the true distribution is D ≤ 0.043. On the other hand, because the short crawl data exceeds the full crawl distribution, its error relative to the true distribution is D ≥ 0.120. In other words, the true D for the sampling data is at most 0.043, while the true D for the short crawl data is at least 0.120. It is possible that sampling with MRWB produces more accurate results than a full crawl (which suffers from distortion), but this is difficult to prove conclusively.

⁷A short crawl is a general term for a progressive exploration of a portion of the graph, such as by using a breadth-first or depth-first search. In this case, we randomly select the next peer to explore.

Fig. 8. Comparison of degree distributions observed from sampling versus exhaustively crawling all peers.
C. Efficiency
Having demonstrated the validity of the MRWB technique,
we now turn our attention to its efficiency. Performing the walk
requires n·r queries, where n is the desired number of samples and r is the length of the walk in hops. If r is too low, significant bias may be introduced. If r is too high, it should not introduce bias, but is less efficient. From graph theory, we expect to require r = O(log N) for an ordinary random walk, where N is the population size.

To empirically explore the selection of r for Gnutella, we conducted many sets of sampling experiments using different values of r, with full crawls interspersed between the sampling experiments. For each sampling experiment, we compute the KS statistic, D, between the sampled degree distribution and that captured by the most recent crawl. Fig. 9 presents the mean and standard deviation of D as a function of r across different experiments. The figure shows that low values of r can lead to enormous bias. The amount of bias decreases rapidly with r, and low bias is observed for hops. However, in a
single experiment with hops, we observed ,
while all other experiments at that length showed .
TABLE III
KS STATISTIC (D) BETWEEN PAIRS OF EMPIRICAL DATASETS
Fig. 10. Difference between sampled results and a crawl as a function of walk
length, after the change suggested in Section VI-C. Each experiment was re-
peated several times. Error bars show the sample standard deviation.
Fig. 9. Difference between sampled results and a crawl as a function of walk
length. Each experiment was repeated several times. Error bars show the sample
standard deviation.
Investigating the anomalous dataset, we found that a single peer
had been selected 309 out of 999 times.
Further examining the trace of this walk, we found that the
walk happened to start at a peer with only a single neighbor.
In such a case, the walk gets stuck at that peer due to the way
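Metropolis-Hastings transitions to a new peer with probability only min(1, k_x/k_y), where k_x and k_y are the degrees of the current and proposed peers; for a peer with a single neighbor this is just 1/k_y. When this stuck event occurs late in the walk,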
it is just part of the normal re-weighting to correct for a regular
random walk's bias towards high degree peers. However, when
it occurs during the first step of the walk, a large fraction of
the walks will end at the unusual low-degree peer, resulting in
an anomalous set of selections where the same peer is chosen
many times.
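A minimal sketch of the Metropolis-Hastings transition rule discussed above makes the "stuck" effect concrete. It runs on a static in-memory graph and omits the backtracking that MRWB uses when a peer has departed. The function name and the toy graph are our own. A walk that starts at a degree-one peer whose only neighbor has degree k leaves on the first step with probability only 1/k.

```python
import random


def metropolis_hastings_walk(adjacency, start, hops, rng):
    """Metropolis-Hastings random walk targeting the uniform distribution:
    from x, propose a uniformly chosen neighbor y and move there with
    probability min(1, deg(x) / deg(y)); otherwise stay at x."""
    x = start
    for _ in range(hops):
        y = rng.choice(adjacency[x])
        if rng.random() < min(1.0, len(adjacency[x]) / len(adjacency[y])):
            x = y
    return x


if __name__ == "__main__":
    # Toy graph: hub "h" (degree 4) with two degree-one leaves "a" and "d".
    graph = {"h": ["a", "b", "c", "d"], "a": ["h"], "b": ["h", "c"],
             "c": ["h", "b"], "d": ["h"]}
    rng = random.Random(42)
    stuck = sum(metropolis_hastings_walk(graph, "a", 1, rng) == "a"
                for _ in range(10_000))
    print(stuck / 10_000)  # roughly 3/4: "a" leaves only with probability 1/4
```

If many concurrent walks share this starting peer, about three quarters of them are still there after the first hop. That shared delay is the correlation the anomalous dataset exposed.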
One way to address this problem is to increase the walk length
by requiring
Fig. 11. Runtime of ion-sampler as a function of walk length when collecting 1,000 samples.
However, this reduces the efficiency of the walk. More importantly, we typically do not accurately know the maximum degree, i.e., while increasing r decreases the probability of an
anomalous event, it does not preclude it. Therefore, we suggest
the following heuristic to prevent such problems from occur-
ring. During the first few steps of the walk, always transition
to the next peer as in a regular random walk; after the first few
steps, use the Metropolis-Hastings method for deciding whether
to transition to the next peer or remain at the current one. This
modification eliminates the correlations induced by sharing a
single starting location, while keeping the walk length relatively
short. We repeated the experiment after making this change; the
results are shown in Fig. 10. The observed error in the revised
implementation is low for , with low variance. In other
words, the samples are consistently accurate for .
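The heuristic above can be sketched as a one-line change to the acceptance test. This is our own illustration on a static graph. The text says only "the first few steps", so the `warmup` count is an assumed parameter, and the function name is hypothetical.

```python
import random


def mrw_with_warmup(adjacency, start, hops, warmup, rng):
    """Walk that always moves for the first `warmup` hops (a regular random
    walk) and applies the Metropolis-Hastings acceptance test afterwards."""
    x = start
    for step in range(hops):
        y = rng.choice(adjacency[x])
        if step < warmup:
            x = y  # regular random-walk step: always move to the neighbor
        elif rng.random() < min(1.0, len(adjacency[x]) / len(adjacency[y])):
            x = y  # Metropolis-Hastings step: accept with min(1, deg(x)/deg(y))
        # otherwise stay at x for this hop
    return x


if __name__ == "__main__":
    graph = {"h": ["a", "b", "c", "d"], "a": ["h"], "b": ["h", "c"],
             "c": ["h", "b"], "d": ["h"]}
    rng = random.Random(3)
    print(mrw_with_warmup(graph, "a", 1, 1, rng))  # prints "h"
```

Concurrent walks no longer linger together at a low-degree starting peer. The later Metropolis-Hastings steps still drive the endpoint toward the uniform distribution, so the warm-up does not change what the walk converges to.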
In light of these considerations, we conservatively regard a choice of r ≥ 25 as a safe walk length for Gnutella. Choosing r = 25, we can collect 1,000 samples by querying 25,000 peers, over an order of magnitude in savings compared with performing a full crawl, which must contact more than 400,000 peers.
Fig. 12. Example usage of the ion-sampler tool. We specify that we want to use the Gnutella plug-in, each walk should take 25 hops, and we would like 10 samples. The tool then prints out 10 IP-address:port pairs. We have changed the first octet of each result to 10 for privacy reasons.
D. Execution Time
We examined execution time as a function of the number of hops, r, and present the results in Fig. 11. With hops, the execution time is around 10 minutes. In our initial implementation of ion-sampler, a small fraction of walks would get
stuck in a corner of the network, repeatedly trying to contact
a set of departed peers. While the walks eventually recover, this
corner-case significantly and needlessly delayed the overall exe-
cution time. We added a small cache to remember the addresses
of unresponsive peers to address this issue.
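The text does not give the cache's size or how long an address stays in it. The sketch below is therefore a hypothetical design: a bounded, time-limited cache in which the capacity and the 600-second `ttl` are assumed values. A walk can consult the cache before spending another 10-second timeout on an address that recently failed.

```python
import collections


class UnresponsiveCache:
    """Hypothetical bounded cache of peer addresses that recently timed out.
    Entries expire after `ttl` seconds; once `capacity` is exceeded, the
    oldest entries are evicted first."""

    def __init__(self, capacity=1024, ttl=600.0):
        self.capacity = capacity
        self.ttl = ttl
        self._entries = collections.OrderedDict()  # address -> time marked

    def mark_unresponsive(self, address, now):
        self._entries.pop(address, None)  # re-marking moves it to the end
        self._entries[address] = now
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the oldest entry

    def is_unresponsive(self, address, now):
        added = self._entries.get(address)
        if added is None:
            return False
        if now - added > self.ttl:
            del self._entries[address]  # expired: give the peer another chance
            return False
        return True
```

Expiring entries keeps a briefly unreachable but still-active peer from being excluded for the rest of the run. The capacity bound keeps memory use small regardless of churn.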
For comparison, Cruiser takes around 13 minutes to capture
the entire topology. This raises the question: if ion-sampler does an order of magnitude less work, why is the running time only slightly better? While ion-sampler contacts significantly fewer peers, walks are sequential in nature, which limits the amount of parallelism that ion-sampler can exploit. Cruiser, on the other hand, can query peers almost entirely in parallel, but it must still do O(N) work, where N is the population size. In other words, if a peer-to-peer network doubles in size, Cruiser will take twice as long to capture it. Alternatively, we can keep Cruiser's execution time approximately constant by doubling the amount of hardware and bandwidth we provision for Cruiser's use. The ion-sampler tool requires only O(r·n) work, meaning there is little change in its behavior as the network grows.
While longer execution time has a negative impact on the accuracy of Cruiser's results, ion-sampler's results are not significantly impacted by the time required to perform the walk (as demonstrated in Section V-D, where we simulate walks of 10,000 hops).
E. Summary
In summary, these empirical results support the conclusion
that a Metropolized Random Walk with Backtracking is an ap-
propriate method of collecting measurements from peer-to-peer
systems, and demonstrate that it is significantly more accurate
than other common sampling techniques. They also illustrate the
dramatic improvement in efficiency and scalability of MRWB
compared to performing a full crawl. As network size increases,
the cost of a full crawl grows linearly and takes longer to com-
plete, introducing greater distortion into the captured snapshots.
For MRWB, the cost increases logarithmically, and no addi-
tional bias is introduced.
VII. DISCUSSION
A. How Many Samples are Required?
An important consideration when collecting samples is to
know how many samples are needed for statistically significant
results. This is principally a property of the distribution being
sampled. Consider the problem of estimating the underlying frequency p of an event, e.g., that the peer degree takes a particular value. Given n unbiased samples, an unbiased estimate of p is p̂ = m/n, where m is the number of samples for which the event occurs. p̂ has root mean square (RMS) relative error

$$\sigma = \frac{\sqrt{\operatorname{Var}(\hat{p})}}{p} = \sqrt{\frac{1-p}{np}}$$

From this expression, we derive the following observations:
Estimation error does not depend on the population size; in
particular the estimation properties of unbiased sampling
scale independently of the size of the system under study.
The above expression can be inverted to derive the number n of samples required to estimate an outcome of frequency p up to an error ε. A simple bound is n ≥ 1/(pε²).
Unsurprisingly, smaller frequency outcomes have a larger
relative error. For example, gathering 1,000 unbiased sam-
ples gives us very little useful information about events
which only occur one time in 10,000; the associated σ
value is approximately 3: the likely error dominates the
value to be estimated. This motivates using biased sam-
pling in circumstances that we discuss in Section VII-B.
The presence of sampling bias complicates the picture. If an event with underlying frequency p is actually sampled with frequency p′, then the squared RMS relative error acquires an additional term (1 − p′/p)², which does not shrink as the number of samples grows. In other words, when sampling from a biased
distribution, increasing the number of samples only increases
the accuracy with which we estimate the biased distribution.
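The error expressions in this subsection are easy to evaluate numerically. The sketch below uses our own helper names. It computes the RMS relative error of p̂ = m/n, optionally with the event sampled at a biased frequency p′, and the sample size from the simple bound n ≥ 1/(pε²). Under bias, the squared error is p′(1 − p′)/(np²) + (1 − p′/p)², which reduces to (1 − p)/(np) when p′ = p.

```python
import math


def rms_relative_error(p, n, p_sampled=None):
    """RMS relative error of the estimate m/n of an event frequency p.

    If the event is actually sampled with frequency p_sampled (biased
    sampling), the squared error gains the term (1 - p_sampled / p) ** 2,
    which does not shrink as n grows."""
    q = p if p_sampled is None else p_sampled
    variance_term = q * (1 - q) / (n * p * p)  # equals (1 - p)/(n p) when q == p
    bias_term = (1 - q / p) ** 2
    return math.sqrt(variance_term + bias_term)


def samples_needed(p, eps):
    """Sufficient sample size from the simple bound n >= 1 / (p * eps**2)."""
    return math.ceil(1 / (p * eps * eps))


if __name__ == "__main__":
    # 1,000 unbiased samples of an event that occurs once in 10,000 peers:
    print(round(rms_relative_error(1e-4, 1000), 2))  # 3.16
```

The printed value reproduces the "approximately 3" figure in the text. Passing a `p_sampled` that differs from `p` shows the error leveling off at |1 − p′/p| no matter how large `n` becomes.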
B. Unbiased Versus Biased Sampling
At the beginning of this paper, we set the goal of collecting
unbiased samples. However, there are circumstances where
unbiased samples are inefficient. For example, while unbiased
samples provide accurate information about the body of a
distribution, they provide very little information about the tails:
the pitfall of estimating rare events we discussed in the previous
subsection.
In circumstances such as studying infrequent events, it may
be desirable to gather samples with a known sampling bias, i.e.,
with non-uniform sampling probabilities. By deliberately intro-
ducing a sampling bias towards the area of interest, more rele-
vant samples can be gathered. During analysis of the data, each
sample is weighted inversely to the probability that it is sampled.
This yields unbiased estimates of the quantities of interest, even
though the selection of the samples is biased. This approach is
known as importance sampling [50].
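The re-weighting step can be sketched in a few lines. The sketch below uses the self-normalized form of the importance-sampling estimator. That is our choice, convenient when sampling probabilities are known only up to a constant factor, for example proportional to degree for an ordinary random walk. The function name is hypothetical.

```python
def importance_weighted_mean(samples, value, sampling_prob):
    """Estimate the population mean of value(x) from biased samples.

    Each sampled item x gets weight 1 / sampling_prob(x); normalizing by the
    total weight (self-normalized importance sampling) means sampling_prob
    only needs to be correct up to a constant factor."""
    weights = [1.0 / sampling_prob(x) for x in samples]
    total = sum(weights)
    return sum(w * value(x) for w, x in zip(weights, samples)) / total


if __name__ == "__main__":
    # Two peers with degrees 1 and 3, sampled in proportion to degree.
    degree = {"a": 1, "b": 3}
    prob = {"a": 0.25, "b": 0.75}
    samples = ["a", "b", "b", "b"]
    print(sum(degree[x] for x in samples) / len(samples))  # naive mean: 2.5
    print(importance_weighted_mean(samples, degree.get, prob.get))  # about 2.0
```

The naive average overstates the mean degree, just as a regular random walk over-represents high-degree peers. The weighted estimate recovers the true population mean of 2.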
A known bias can be introduced by choosing an appropriate
definition of the target distribution π(x) in the Metropolis-Hastings equations presented in Section IV and altering the walk accordingly. Because the desired type of known bias depends on the focus of the research, we cannot exhaustively demonstrate through simulation that Metropolis-Hastings will operate correctly in a dynamic environment for any π(x). Our results show that it works well in the common case where unbiased samples are desired (i.e., π(x) = π(y) for all x and y).
C. Sampling From Structured Systems
Throughout this paper, we have assumed an unstructured
peer-to-peer network. Structured systems (also known as Dis-
tributed Hash Tables or DHTs) should work just as well with
random walks, provided links are still bidirectional. However,
the structure of these systems often allows a more efficient
technique.
In a typical DHT scheme, each peer has a randomly generated
identifier. Peers form an overlay that actively maintains certain
properties such that messages are efficiently routed to the peer
closest to a target identifier. The exact properties and the defi-
nition of closest vary, but the theme remains the same. In these
systems, to select a peer at random, we may simply generate an
identifier uniformly at random and find the peer closest to the
identifier. Because peer identifiers are generated uniformly at
random, we know they are uncorrelated with any other prop-
erty. This technique is simple and effective, as long as there is
little variation in the amount of identifier space that each peer is
responsible for. We made use of this sampling technique in our
study of the widely-deployed Kad DHT [51].
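The random-identifier technique can be sketched as follows, assuming an XOR distance metric as used by Kademlia-style DHTs such as Kad. A real deployment would locate the closest peer with a DHT lookup. Here we simply scan a known list of identifiers, and the function name and ID width are illustrative assumptions.

```python
import random


def sample_peer_by_random_id(peer_ids, id_bits, rng):
    """Draw an identifier uniformly from the id_bits-bit space and return
    the peer whose ID is closest to it under the XOR metric."""
    target = rng.getrandbits(id_bits)
    return min(peer_ids, key=lambda pid: pid ^ target)


if __name__ == "__main__":
    rng = random.Random(5)
    # Uneven 4-bit ID space: peer 0 owns every target with a leading 0 bit.
    counts = {0: 0, 8: 0, 12: 0}
    for _ in range(40_000):
        counts[sample_peer_by_random_id([0, 8, 12], 4, rng)] += 1
    print(counts)  # peer 0 is chosen about half the time, 8 and 12 a quarter each
```

The skewed toy example illustrates the caveat in the text: selection is uniform over peers only when each peer is responsible for a similar share of the identifier space.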
VIII. CONCLUSIONS AND FUTURE WORK
This paper explores the problem of sampling representative
peer properties in large and dynamic unstructured P2P systems.
We show that the topological and temporal properties of P2P
systems can lead to significant bias in collected samples.
To collect unbiased samples, we present the Metropolized
Random Walk with Backtracking (MRWB), a modification of
the Metropolis-Hastings technique, which we developed into
the ion-sampler tool. Using both simulation and empirical
evaluation, we show that MRWB can collect approximately
unbiased samples of peer properties over a wide range of
realistic peer dynamics and topological structures.
We are pursuing this work in the following directions. First,
we are exploring improving sampling efficiency for uncommon
events (such as in the tail of distributions) by introducing
known bias, as discussed in Section VII-B. Second, we are
studying the behavior of MRWB under flash-crowd scenarios,
where not only the properties of individual peers are changing,
but the distribution of those properties is also rapidly evolving.
Finally, we are developing additional plug-ins for ion-sampler
and using it in conjunction with other measurement tools to
accurately characterize several properties of widely-deployed
P2P systems.
ACKNOWLEDGMENT
The authors would like to thank A. Rasti and J. Capehart for
their invaluable efforts in developing the dynamic overlay sim-
ulator, and V. Lo for her valuable feedback on this paper.
REFERENCES
[1] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup protocol for Internet applications," IEEE/ACM Trans. Networking, vol. 11, no. 1, pp. 17-32, Feb. 2002.
[2] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A scalable content-addressable network," presented at the ACM SIGCOMM 2001, San Diego, CA.
[3] S. Saroiu, P. K. Gummadi, and S. D. Gribble, "Measuring and analyzing the characteristics of Napster and Gnutella hosts," Multimedia Syst. J., vol. 9, no. 2, pp. 170-184, Aug. 2003.
[4] R. Bhagwan, S. Savage, and G. Voelker, "Understanding availability," presented at the 2003 Int. Workshop on Peer-to-Peer Systems, Berkeley, CA.
[5] D. Stutzbach and R. Rejaie, "Understanding churn in peer-to-peer networks," presented at the 2006 Internet Measurement Conf., Rio de Janeiro, Brazil.
[6] S. Chib and E. Greenberg, "Understanding the Metropolis-Hastings algorithm," The American Statistician, vol. 49, no. 4, pp. 327-335, Nov. 1995.
[7] W. Hastings, "Monte Carlo sampling methods using Markov chains and their applications," Biometrika, vol. 57, pp. 97-109, 1970.
[8] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, "Equations of state calculations by fast computing machines," J. Chem. Phys., vol. 21, pp. 1087-1092, 1953.
[9] A. Awan, R. A. Ferreira, S. Jagannathan, and A. Grama, "Distributed uniform sampling in unstructured peer-to-peer networks," presented at the 2006 Hawaii Int. Conf. System Sciences, Kauai, HI, Jan. 2006.
[10] Z. Bar-Yossef and M. Gurevich, "Random sampling from a search engine's index," presented at the 2006 WWW Conf., Edinburgh, Scotland.
[11] D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger, "Sampling techniques for large, dynamic graphs," presented at the 2006 Global Internet Symp., Barcelona, Spain, Apr. 2006.
[12] D. Stutzbach and R. Rejaie, "Capturing accurate snapshots of the Gnutella network," in Proc. 2005 Global Internet Symp., Miami, FL, Mar. 2005, pp. 127-132.
[13] B. Bollobás, "A probabilistic proof of an asymptotic formula for the number of labelled regular graphs," Eur. J. Combinator., vol. 1, pp. 311-316, 1980.
[14] M. Jerrum and A. Sinclair, "Fast uniform generation of regular graphs," Theoret. Comput. Sci., vol. 73, pp. 91-100, 1990.
[15] C. Cooper, M. Dyer, and C. Greenhill, "Sampling regular graphs and a peer-to-peer network," in Proc. Symp. Discrete Algorithms, 2005, pp. 980-988.
[16] V. Krishnamurthy, J. Sun, M. Faloutsos, and S. Tauro, "Sampling Internet topologies: How small can we go?," in Proc. 2003 Int. Conf. Internet Computing, Las Vegas, NV, Jun. 2003, pp. 577-580.
[17] V. Krishnamurthy, M. Faloutsos, M. Chrobak, L. Lao, and J.-H. C. G. Percus, "Reducing large Internet topologies for faster simulations," presented at the 2005 IFIP Networking Conf., Waterloo, Ontario, Canada, May 2005.
[18] M. P. H. Stumpf, C. Wiuf, and R. M. May, "Subnets of scale-free networks are not scale-free: Sampling properties of networks," Proc. National Academy of Sciences, vol. 102, no. 12, pp. 4221-4224, Mar. 2005.
[19] A. A. Tsay, W. S. Lovejoy, and D. R. Karger, "Random sampling in cut, flow, and network design problems," Math. Oper. Res., vol. 24, no. 2, pp. 383-413, Feb. 1999.
[20] A. Lakhina, J. W. Byers, M. Crovella, and P. Xie, "Sampling biases in IP topology measurements," presented at the IEEE INFOCOM 2003, San Francisco, CA.
[21] D. Achlioptas, A. Clauset, D. Kempe, and C. Moore, "On the bias of traceroute sampling; or, power-law degree distributions in regular graphs," presented at the 2005 Symp. Theory of Computing, Baltimore, MD, May 2005.
RANDOM STRING GENERATION USING BIOMETRIC
AUTHENTICATION SCHEME FOR PROTECTING AND SECURING
DATA IN DISTRIBUTED SYSTEMS
Asst Prof B. SHANMUGHA SUNDARAM¹, S. MADHUMATHY², R. MONISHA³
Gojan School of Business & Technology
¹ssmcame@gmail.com, ²madhumathy17@gmail.com, ³monisha179@gmail.com

Abstract- Remote authentication is the most commonly used method to determine the identity of a remote
client. Three-factor authentication adds biometric verification to smart-card-based authentication after a
valid password has been entered. It can still fail if these authentication factors are compromised
(e.g., an attacker has successfully obtained the password and the data in the smart card). A generic and
secure framework is proposed to upgrade three-factor authentication: after the user is validated using
human characteristics, a random string is generated with which the user logs in successfully, using a smart-card-based password
authentication protocol and a cryptographic algorithm.
Index Terms: Authentication, distributed systems, security, privacy, password, biometrics.
1 INTRODUCTION
In a distributed system, various resources are
distributed in the form of network services provided
and managed by servers. Remote authentication is the
most commonly used method to determine the
identity of a remote client. In general, there are three
authentication factors:
1. Something the client knows: password.
2. Something the client is: biometric characteristics
(e.g., fingerprint, voiceprint, and iris scan)
3. Something that the client gets: Random string.
Most early authentication mechanisms are solely
based on password. While such protocols are
relatively easy to implement, passwords (and human
generated passwords in particular) have many
vulnerabilities. As an example, human generated and
memorable passwords are usually short strings of
characters and (sometimes) poorly selected. By
exploiting these vulnerabilities, simple dictionary
attacks can crack passwords in a short time. Due to
these concerns, hardware authentication tokens are
introduced to strengthen the security in user
authentication, and smart-card-based password
authentication has become one of the most common
authentication mechanisms. Smart-card-based
password authentication provides two-factor
authentication, namely a successful login requires
the client to have a valid smart card and a correct
password. While it provides stronger security
guarantees than password authentication, it could also
fail if both authentication factors are compromised
(e.g., an attacker has successfully obtained the
password and the data in the smart card). In this case,
a third authentication factor can alleviate the problem
and further improve the system's assurance. Another
authentication mechanism is biometric
authentication, where users are identified by their
measurable human characteristics, such as
fingerprint, voiceprint, and iris scan. Biometric
characteristics are believed to be a reliable
authentication factor since they provide a potential
source of high-entropy information and cannot be
easily lost or forgotten. Despite these merits,
biometric authentication has some imperfect features.
Unlike passwords, biometric characteristics cannot be
easily changed or revoked. Some biometric
characteristics (e.g., fingerprint) can be easily
obtained without the awareness of the owner. This
motivates the three-factor authentication, which
incorporates the advantages of the authentication
based on password, biometrics and generation of
random string.
1.1 MOTIVATION
The motivation of this paper is to investigate a
systematic approach for the design of secure three-
factor authentication with the protection of user
privacy. Three-factor authentication is introduced to
incorporate the advantages of the authentication
based on password, biometrics and generation of
random string. A well designed three-factor
authentication protocol can greatly improve the
information assurance in distributed systems.
1.1.1 SECURITY ISSUES
Most existing three-factor authentication protocols
are flawed and cannot meet the security requirements of
their applications. Even worse, some improvements
of those flawed protocols are not secure either. The
research history of three-factor authentication can be
summarized in the following sequence:
NEW PROTOCOLS → BROKEN → IMPROVED
PROTOCOLS → BROKEN AGAIN
1.1.2 PRIVACY ISSUES
Along with the improved security features, three-
factor authentication also raises another subtle issue,
namely how to protect the biometric data. Not only is
this the privacy information of the owner, it is also
closely related to the security in the authentication.
As biometrics cannot be easily changed, the breached
biometric information (either on the server side or the
client side) will make the biometric authentication
totally meaningless. However, this issue has received
less attention than it deserves from protocol
designers. We believe it is worthwhile, both in theory
and in practice, to investigate a generic framework
for three-factor authentication, which can preserve
the security and the privacy in distributed systems.
1.2 CONTRIBUTIONS
The main contribution of this paper is a generic
framework for three-factor authentication in
distributed systems. The proposed framework has
several merits as follows: First, we demonstrate how
to incorporate biometrics in the existing
authentication based on password. Our framework is
generic rather than instantiated in the sense that it
does not have any additional requirements on the
underlying smart-card-based password
authentication. Not only will this simplify the design
and analysis of three-factor authentication protocols,
but also it will contribute a secure and generic
upgrade from two-factor authentication to three-
factor authentication possessing the practice-friendly
properties of the underlying two-factor authentication
system. Second, authentication protocols in our
framework can provide true three-factor
authentication, namely a successful authentication
requires the password, the biometric characteristics and the
generated random string. In addition, our
framework can be easily adapted to allow the server
to decide the authentication factors in user
authentication (instead of all three authentication
factors). Last, in the proposed framework clients'
biometric characteristics are kept secret from servers.
This not only protects user privacy but also prevents
a single-point failure (e.g., a breached server) from
undermining the authentication level of other
services. Furthermore, the verification of all
authentication factors is performed by the server. In
particular, our framework does not rely on any
trusted devices to verify the authentication factors,
which also suits the imperfect nature of distributed
systems, where devices cannot be fully trusted.
1.3 RELATED WORK
Several authentication protocols have been proposed
to integrate biometric authentication with password
authentication and/or smart-card authentication. Lee
et al. designed an authentication system which does
not need a password table to authenticate registered
users. Instead, a smart card and a fingerprint are required
in the authentication. However, Lee et al.'s scheme is insecure
under conspiring attack. Lin and Lai showed that Lee
et al.'s scheme is vulnerable to masquerade attack.
Namely, a legitimate user (i.e., a user who has
registered on the system) is able to make a successful
login on behalf of other users. An improved
authentication protocol was given by Lin and Lai to
fix that flaw. The new protocol, however, has several
other security vulnerabilities. First, Lin-Lai's scheme
only provides client authentication rather than mutual
authentication, which makes it susceptible to the
server spoofing attack. Second, the password-changing
phase in Lin-Lai's scheme is not secure, as
the smart card cannot check the correctness of old
passwords. Third, Lin-Lai's scheme is insecure under
impersonation attacks according to the analysis given by
Yoon and Yoo, who also proposed a new scheme.
However, the new scheme was later broken and improved by
Lee and Kwon. Kim et al. proposed two ID-based
password authentication schemes where users are
authenticated by smart cards, passwords, and
fingerprints. However, Scott showed that a passive
eavesdropper (without access to any smart card,
password or fingerprint) can successfully login to the
server on behalf of any claiming identity after
passively eavesdropping only one legitimate login.
Bhargav-Spantzel et al. proposed a privacy-preserving
multifactor authentication protocol with
biometrics. The authentication server in their protocol
does not have the biometric information of registered
clients. However, the biometric authentication is
implemented using zero knowledge proofs, which
requires the server to maintain a database to store all
users' commitments and uses costly modular
exponentiations in the finite group.
2 PRELIMINARIES
This section reviews the definitions of smart-card-
based password authentication, three-factor
authentication, and fuzzy extractor.
2.1 Smart-Card-Based Password Authentication
Definition 1. A smart-card-based password
authentication protocol (hereinafter referred to as
SCPAP) consists of four phases.
2-Factor-Initialization: The server (denoted by S)
generates two system parameters PK and SK. PK is
published in the system, and SK is kept secret by S.
2-Factor-Reg: The client (denoted by C), with an
initial password PW, registers on the system by
running this interactive protocol with S. The output
of this protocol is a smart card SC. An execution of
this protocol is denoted by
2-Factor-Login-Auth: This is another interactive
protocol between the client and the server, which
enables the client to login successfully using PW and
SC. An execution of this protocol is denoted by
The output of this protocol is 1 (if the
authentication is successful) or 0 (otherwise).
2-Factor-Password-Changing: This protocol enables
a client to change his/her password after a successful
authentication (i.e., 2-Factor-Login-Auth outputs
1). The data in the smart card will be updated
accordingly.
The attacker on SCPAP can be classified from two
aspects: the behavior of the attacker and the
information compromised by the attacker.
As an interactive protocol, SCPAP may face passive
attackers and active attackers.
Passive attacker. A passive attacker can obtain
messages transmitted between the client and the
server. However, it cannot interact with the client or
the server.
Active attacker. An active attacker has the full
control of the communication channel. In addition to
message eavesdropping, the attacker can arbitrarily
inject, modify, and delete messages in the
communication between the client and the server.
On the other hand, SCPAP is a two-factor
authentication protocol, namely a successful login
requires a valid smart card and a correct password.
According to the compromised secret, an attacker can
be further classified into the following two types.
Attacker with smart card. This type of attacker has
the smart card, and can read and modify the data in
the smart card. Notice that there are techniques to
restrict access to both reading and modifying data in
the smart card. Nevertheless, from the security point
of view, authentication protocols will be more robust
if they are secure against attackers with the ability to
do that.
Attacker with password. The attacker is assumed to
have the password of the client but is not given the
smart card.
Definition 2 (Secure SCPAP). The basic security
requirement of SCPAP is that it should be secure
against a passive attacker with smart card and a
passive attacker with password. It is certainly more
desirable that SCPAP is secure against an active
attacker with smart card and an active attacker with
password.
2.2 THREE-FACTOR AUTHENTICATION
Three-factor authentication is very similar to smart-
card-based password authentication, with the only
difference that it requires biometric characteristics as
an additional authentication factor.
Definition 3 (Three-Factor Authentication). A three-
factor authentication protocol involves a client C and
a server S, and consists of five phases.
3-Factor-Initialization: S generates two system
parameters PK and SK. PK is published in the
system, and SK is kept secret by S. An execution of
this algorithm is denoted by 3-Factor-Initialization →
(PK, SK).
3-Factor-Reg: A client C, with an initial password
PW and biometric characteristics BioData, registers
on the system by running this interactive protocol
with the server S. The output of this protocol is a
smart card SC, which is given to C. An execution of
this protocol is denoted by
3-Factor-Login-Auth: This is another interactive
protocol between the client C and the server S, which
enables the client
to login successfully using PW, SC, and BioData.
The output of this protocol is 1 (if the
authentication is successful) or 0 (otherwise).
3-Factor-Password-Changing: This protocol
enables a client to change his/her password after a
successful authentication. The data in the smart card
will be updated accordingly.
3-Factor-Biometrics-Changing: An analogue of
password-changing is biometrics-changing, namely
the client can change his/her biometrics used in the
authentication, e.g., using a different finger or using
iris instead of finger.
While biometrics-changing is not supported by
previous three-factor authentication protocols, we
believe it provides the client with more flexibility in
the authentication.
Security requirements. A three-factor authentication
protocol can also face passive attackers and active
attackers as defined in SCPAP (Section 2.1). A
passive (an active) attacker can be further classified
into the following three types.
Type I attacker has the smart card and the biometric
characteristics of the client. It is not given the
password of that client.
Type II attacker has the password and the biometric
characteristics. It is not allowed to obtain the data in
the smart card.
Type III attacker has the smart card and the password
of the client. It is not given the biometric
characteristics of that client. Notice that such an
attacker is free to mount any attacks on the
(unknown) biometrics, including biometrics faking
and attacks on the metadata (related to the
biometrics) stored in the smart card.
Definition 4 (Secure Three-Factor Authentication).
For a three-factor authentication protocol, the basic
security requirement is that it should be secure
against passive type I, type II, and type III attackers.
It is certainly more desirable that a three-factor
authentication protocol is secure against active type I,
type II, and type III attackers.
2.3 FUZZY EXTRACTOR
A fuzzy extractor extracts a nearly random string R
from its biometric input w in an error-tolerant way. If
the input changes but remains close, the extracted R
remains the same. To assist in recovering R from a
biometric input $w'$, a fuzzy extractor outputs an
auxiliary string P. However, R remains uniformly
random even given P. The fuzzy extractor is formally
defined as below.
Definition 5 (Fuzzy Extractor). A fuzzy extractor is
given by two procedures (Gen, Rep).
Gen is a probabilistic generation procedure, which on
(biometric) input $w \in \mathcal{M}$ outputs an extracted string
$R \in \{0,1\}^{\ell}$ and an auxiliary string P.
Rep is a deterministic reproduction procedure that recovers
R from the corresponding auxiliary string P and any input $w'$
close to w: for an error threshold t, if $(R, P) \leftarrow \mathrm{Gen}(w)$
and $\mathrm{dis}(w, w') \le t$, then $\mathrm{Rep}(w', P) = R$.
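As an illustration only, the two procedures can be sketched with the well-known code-offset construction, assuming the biometric reading has already been quantized into a list of bits and using a simple repetition code. The names gen, rep and REP are hypothetical, SHA-256 stands in for a proper randomness extractor, and a repetition code leaks far more about w than the error-correcting codes used in practice, so this is neither the scheme of this paper nor secure as written.

```python
import hashlib
import secrets

REP = 5  # repetition factor (assumption): each group of 5 bits corrects up to 2 flips


def _encode(bits):
    """Repetition-code encoder: repeat every message bit REP times."""
    return [b for b in bits for _ in range(REP)]


def _decode(bits):
    """Repetition-code decoder: majority vote over each group of REP bits."""
    return [1 if sum(bits[i:i + REP]) > REP // 2 else 0
            for i in range(0, len(bits), REP)]


def _xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def gen(w):
    """Gen (code-offset sketch): biometric bits w -> (R, P)."""
    if len(w) % REP:
        raise ValueError("len(w) must be a multiple of REP")
    k = [secrets.randbelow(2) for _ in range(len(w) // REP)]  # fresh random message
    p = _xor(w, _encode(k))                    # helper string P = w XOR C(k)
    r = hashlib.sha256(bytes(k)).hexdigest()   # heuristic extraction of R
    return r, p


def rep(w_prime, p):
    """Rep: noisy reading w' and helper P -> R, if w' is close enough to w."""
    k = _decode(_xor(w_prime, p))              # w' XOR P = C(k) XOR noise
    return hashlib.sha256(bytes(k)).hexdigest()
```

Each group of five bits tolerates up to two flipped bits between enrollment and login; three flips in one group change the decoded message and therefore R.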
3. CHALLENGES IN BIOMETRIC
AUTHENTICATION
This section is devoted to a brief description of three
subtle issues in biometric authentication, namely
privacy issues, error tolerance, and untrusted
devices.
3.1 PRIVACY ISSUES
A trivial way to include biometric authentication in
smart-card-based password authentication is to scan
the biometric characteristics and store the extracted
biometric data as a template in the server. During the
authentication, a comparison is made between the
stored data and the input biometric data. If there is a
sufficient commonality, a biometric authentication is
said to be successful. This method, however, will
raise several security risks, especially in a multi-
server environment where user privacy is a concern
(e.g., in a distributed system). First, servers are not
100 percent secure. Servers with weak security
protections can be broken in by attackers, who will
obtain the biometric data on those servers. Second,
servers are not 100 percent trusted. Server-A
(equivalently, its curious administrator) could try to
login to Server-B on behalf of their common clients,
or distribute users' biometric information in the
system. In either case, user privacy will be
compromised, and a single-point failure on a server
will downgrade the whole system's security level
from three-factor authentication to two-factor
authentication (since clients are likely to register the
same biometric characteristics on all servers in the
system).
3.2 ERROR TOLERANCE AND NON
TRUSTED DEVICES
One challenge in biometric authentication is that
biometric characteristics are subject to noise
during data collection, and this natural feature makes
it impossible to reproduce them precisely each time
they are measured. A practical
biometric authentication protocol cannot simply
compare the hash or the encryption of biometric
templates (which requires an exact match). Instead,
biometric authentication must tolerate failures within
a reasonable bound. Another issue in biometric
authentication is that the verification of biometrics
should be performed by the server instead of other
devices, since such devices are usually remotely
located from the server and cannot be fully trusted.
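The tension between Sections 3.1 and 3.2 can be seen in a small sketch, assuming templates are fixed-length byte strings compared by Hamming distance (a hypothetical matcher; real fingerprint or iris matchers are considerably more involved, and the function names are illustrative). An error-tolerant comparison needs the raw template on the server, which is the privacy risk of Section 3.1, while storing only a hash protects the template but rejects a reading that differs in a single bit.

```python
import hashlib


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length templates."""
    if len(a) != len(b):
        raise ValueError("templates must have equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def hash_match(stored_digest: str, reading: bytes) -> bool:
    # Exact-match check: protects the template, but any flipped bit changes the digest.
    return hashlib.sha256(reading).hexdigest() == stored_digest


def tolerant_match(stored: bytes, reading: bytes, max_bits: int) -> bool:
    # Error-tolerant check: accepts small noise, but the server must keep the raw template.
    return hamming(stored, reading) <= max_bits
```

For an enrolled template and a fresh reading that differ in one bit, tolerant_match with max_bits=2 accepts while hash_match rejects.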
CONCLUSION
Preserving security and privacy is a challenging issue
in distributed systems. This paper makes a step
forward in solving this issue by proposing a generic
framework for three-factor authentication to protect
services and resources from unauthorized use. The
authentication is based on a password, biometrics and
a random string. Our framework not only demonstrates
how to obtain secure three-factor authentication from
two-factor authentication, but also addresses several
prominent issues of biometric authentication in
distributed systems (e.g., client privacy and error
tolerance). The analysis shows that the framework
satisfies all security requirements on three-factor
authentication and has several other practice-friendly
properties (e.g., key agreement, forward security, and
mutual authentication). The future work is to fully
identify the practical threats on three-factor
authentication and develop concrete three-factor
authentication protocols with better performance.
REFERENCES
[1] D.V. Klein, "Foiling the Cracker: A Survey of, and Improvements to, Password Security," Proc. Second USENIX Workshop Security, 1990.
[2] Biometrics: Personal Identification in Networked Society, A.K. Jain, R. Bolle, and S. Pankanti, eds. Kluwer, 1999.
[3] D. Maltoni, D. Maio, A.K. Jain, and S. Prabhakar, Handbook of Fingerprint Recognition. Springer-Verlag, 2003.
[4] Ed. Dawson, J. Lopez, J.A. Montenegro, and E. Okamoto, "BAAI: Biometric Authentication and Authorization Infrastructure," Proc. IEEE Int'l Conf. Information Technology: Research and Education, 2004.
[5] J.K. Lee, S.R. Ryu, and K.Y. Yoo, "Fingerprint-Based Remote User Authentication Scheme Using Smart Cards," Electronics Letters, vol. 38, no. 12, pp. 554-555, June 2002.
[6] C.C. Chang and I.C. Lin, "Remarks on Fingerprint-Based Remote User Authentication Scheme Using Smart Cards," ACM SIGOPS Operating Systems Rev., vol. 38, no. 4, pp. 91-96, Oct. 2004.
[7] C.H. Lin and Y.Y. Lai, "A Flexible Biometrics Remote User Authentication Scheme," Computer Standards & Interfaces, vol. 27, no. 1, pp. 19-23, Nov. 2004.
[8] M.K. Khan and J. Zhang, "Improving the Security of A Flexible Biometrics Remote User Authentication Scheme," Computer Standards & Interfaces, vol. 29, no. 1, pp. 82-85, Jan. 2007.
[9] C.J. Mitchell and Q. Tang, "Security of the Lin-Lai Smart Card Based User Authentication Scheme," Technical Report, http://www.ma.rhul.ac.uk/static/techrep/2005/
[10] E.J. Yoon and K.Y. Yoo, "A New Efficient Fingerprint-Based Remote User Authentication Scheme for Multimedia Systems," Proc. Ninth Int'l Conf. Knowledge-Based Intelligent Information and Eng. Systems (KES), 2005.
Secure Electronic Data Interchange In Cloud Computing
K. Dineshkumar, Lecturer, Department of Computer Science & Engineering, Gojan School of Business & Technology, dineshkumar_k@ymail.com
G.B. Anjani Prasanna, IV CSE, Gojan School of Business & Technology, anju.anjani14@gmail.com
A. LeemaRose, IV CSE, Gojan School of Business & Technology, aleemarosec@gmail.com
Abstract- Cloud Computing has been
envisioned as the next-generation
architecture of IT Enterprise. It moves the
application software and databases to the
centralized large data centers, where the
management of the data and services may
not be fully trustworthy. This unique
paradigm brings about many new security
challenges, which have not been well
understood. This work studies the problem
of ensuring the integrity of data storage in
Cloud Computing. In particular, we consider
the task of allowing a third party auditor
(TPA), on behalf of the cloud client, to
verify the integrity of the dynamic data
stored in the cloud. The introduction of TPA
relieves the client of the task of auditing
whether his data
stored in the cloud are indeed intact, which
can be important in achieving economies of
scale for Cloud Computing. The support for
data dynamics via the most general forms of
data operation, such as block modification,
insertion, and deletion, is also a significant
step toward practicality, since services in
Cloud Computing are not limited to archive
or backup data only. While prior works on
ensuring remote data integrity often lack
the support of either public auditability or
dynamic data operations, this paper achieves
both. We first identify the difficulties and
potential security problems of direct
extensions with fully dynamic data updates
from prior works and then show how to
construct an elegant verification scheme for
the seamless integration of these two salient
features in our protocol design. In particular,
to achieve efficient data dynamics, we
improve the existing proof of storage models
by manipulating the classic Merkle Hash
Tree construction for block tag
authentication. To support efficient handling
of multiple auditing tasks, we further
explore the technique of bilinear aggregate
signature to extend our main result into a
multiuser setting, where TPA can perform
multiple auditing tasks simultaneously.
Extensive security and performance analyses
show that the proposed schemes are highly
efficient and provably secure.
Index Terms: Data storage, public
auditability, data dynamics, cloud
computing.
Introduction
Several trends are opening up the era of
Cloud Computing, which is an Internet-
based development and use of computer
technology. The ever cheaper and more
powerful processors, together with the
software as a service (SaaS) computing
architecture, are transforming data centers
into pools of computing service on a huge
scale. Meanwhile, the increasing network
bandwidth and reliable yet flexible network
connections make it possible for
clients to subscribe to high-quality
services from data and software that reside
solely on remote data centers. In view of the
key role of public auditability and data
dynamics for cloud data storage, we propose
an efficient construction for the seamless
integration of these two components in the
protocol design. Our contribution can be
summarized as follows:
1. We motivate the public auditing system of
data storage security in Cloud Computing,
and propose a protocol supporting fully
dynamic data operations, especially to
support block insertion, which is missing in
most existing schemes.
2. We extend our scheme to support scalable
and efficient public auditing in Cloud
Computing. In particular, our scheme
achieves batch auditing where multiple
delegated auditing tasks from different users
can be performed simultaneously by the
TPA.
3. We prove the security of our proposed
construction and justify the performance of
our scheme through concrete
implementation and comparisons with the
state of the art.
PROBLEM STATEMENT
System Model
A representative network architecture for
cloud data storage is illustrated in Fig. 1.
Three different network entities can be
identified as follows:
Client: an entity, which has large data files
to be stored in the cloud and relies on the
cloud for data maintenance and
computation, can be either individual
consumers or organizations;
Cloud Storage Server (CSS): an entity,
which is managed by Cloud Service
Provider (CSP), has significant storage
space and computation resource
to maintain the clients' data;
Third Party Auditor: an entity, which has
expertise and capabilities that clients do not
have, is trusted to assess and expose risk of
cloud storage services on behalf of the
clients upon request.
In the cloud paradigm, by putting the large
data files on the remote servers, the clients
can be relieved of the burden of storage and
computation. As clients no longer possess
their data locally, it is of critical importance
for the clients to ensure that their data are
being correctly stored and maintained. That
is, clients should be equipped with certain
security means so that they can periodically
verify the correctness of the remote data
even without the existence of local copies.
In case that clients do not necessarily have
the time, feasibility or resources to monitor
their data, they can delegate the monitoring
task to a trusted TPA. In this paper, we only
consider verification schemes with public
auditability: any TPA in possession of the
public key can act as a verifier. We assume
that TPA is unbiased while the server is
untrusted. For application purposes, the
clients may interact with the cloud servers
via CSP to access or retrieve their prestored
data. More importantly, in practical
scenarios, the client may frequently perform
block-level operations on the data files. The
most general forms of these operations we
consider in this paper are modification,
insertion, and deletion. Note that we do not
address the issue of data privacy in this
paper, as the topic of data privacy in Cloud
Computing is orthogonal to the problem we
study here.
Security Model
Following the security model defined in [4],
we say that the checking scheme is secure if
1) there exists no polynomial time algorithm
that can cheat the verifier with non-negligible
probability; and 2) there exists a polynomial
time extractor that can recover the original
data files by carrying out multiple
challenges-responses. The client or
TPA can periodically challenge the storage
server to ensure the correctness of the cloud
data, and the original files can be recovered
by interacting with the server. The authors in
[4] also define the correctness and
soundness of their scheme: the scheme is
correct if the verification algorithm accepts
when interacting with the valid prover (e.g.,
the server returns a valid response) and it is
sound if any cheating server that convinces
the client it is storing the data file is actually
storing that file. Note that in the game
between the adversary and the client, the
adversary has full access to the information
stored in the server, i.e., the adversary can
play the part of the prover (server). The goal
of the adversary is to cheat the verifier
successfully, i.e., trying to generate valid
responses and pass the data verification
without being detected. Our security model
has a subtle but crucial difference from that of
the existing PDP or PoR models [2], [3], [4]
in the verification process. As mentioned
above, these schemes do not consider
dynamic data operations, and the block
insertion cannot be supported at all. This is
because the construction of the signatures is
involved with the file index information i.
Therefore, once a file block is inserted, the
computation overhead is unacceptable since
the signatures of all the following file blocks
should be recomputed with the new indexes.
To deal with this limitation, we remove the
index information i in the computation of
signatures and use $H(m_i)$ as the tag for
block $m_i$ instead of $H(name \| i)$ [4] or
$h(v \| i)$ [3], so individual data operation on
any file block will not affect the others.
Recall that in existing PDP or PoR models
[2], [4], $H(name \| i)$ or $h(v \| i)$ should be
generated by the client in the verification
process. However, in our new construction
the client has no capability to calculate
$H(m_i)$ without the data information. In
order to achieve this blockless verification,
the server should take over the job of
computing $H(m_i)$ and then return it to the verifier.
The consequence of this variance will lead
to a serious problem: it will give the
adversary more opportunities to cheat the
verifier by manipulating $H(m_i)$ or $m_i$. Due
to this construction, our security model
differs from that of the PDP or PoR models
in both the verification and the data updating
process. Specifically, the tags in our scheme
should be authenticated in each protocol
execution rather than calculated or prestored
by the verifier.
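The effect of removing the index i can be illustrated with a toy count of how many tags change after a single block insertion. Here a SHA-256 digest over the file name, the index and the block content stands in for an index-bound signature, and a digest over the content alone stands in for the $H(m_i)$ tag; the function names are hypothetical and no real signature is computed.

```python
import hashlib


def _digest(*parts: bytes) -> str:
    # Length-prefix each part so distinct (name, index, block) triples cannot collide by concatenation.
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def tags(name: bytes, blocks, bind_index: bool):
    """Stand-in tags: index-bound over (name, i, m_i), or index-free over m_i only."""
    if bind_index:
        return [_digest(name, str(i).encode(), m) for i, m in enumerate(blocks, start=1)]
    return [_digest(m) for m in blocks]


def tags_recomputed_after_insert(name: bytes, blocks, pos: int,
                                 new_block: bytes, bind_index: bool) -> int:
    """Insert new_block before 0-based position pos; count tags that did not exist before."""
    before = set(tags(name, blocks, bind_index))
    updated = blocks[:pos] + [new_block] + blocks[pos:]
    return sum(1 for t in tags(name, updated, bind_index) if t not in before)
```

For eight blocks and an insertion after the second block, the index-bound variant must recompute the tags of the six shifted blocks plus the new one (seven in total), whereas the index-free variant needs a single new tag. The price is that the tag no longer fixes the block's position, so positions have to be authenticated by other means.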
The proposed scheme
In this section, we present our security
protocols for cloud data storage service with
the aforementioned research goals in mind.
We start with some basic solutions aiming to
provide integrity assurance of the cloud data
and discuss their demerits. Then, we present
our protocol which supports public
auditability and data dynamics. We also
show how to extend our main scheme to
support batch auditing for TPA upon
delegations from multiple users.
Notation and Preliminaries
Bilinear map
A bilinear map is a map $e: G \times G \rightarrow G_T$,
where G is a Gap Diffie-Hellman (GDH)
group and $G_T$ is another multiplicative
cyclic group of prime order p with the
following properties [16]: 1) Computable:
there exists an efficiently computable
algorithm for computing e; 2) Bilinear: for
all $h_1, h_2 \in G$ and $a, b \in \mathbb{Z}_p$,
$e(h_1^a, h_2^b) = e(h_1, h_2)^{ab}$; 3) Nondegenerate:
$e(g, g) \neq 1$, where g is a generator of G.
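To make the three properties concrete, the sketch below checks them on a deliberately insecure toy pairing, an assumption made purely for illustration: G is written additively as the integers modulo Q = 11 with generator 1 (so $h^a$ becomes a·h mod Q), G_T is the subgroup of integers modulo 23 generated by 2, and e(x, y) = 2^(xy) mod 23. The prime order p of the text corresponds to Q here, while 23 is only the modulus hosting G_T. Discrete logarithms in this G are trivial, so it is nothing like the GDH groups the scheme requires; practical schemes use pairings on elliptic curves.

```python
# Deliberately insecure toy parameters (assumption, for illustration only):
# G   = integers mod Q under addition, generator 1 (h^a is written a*h mod Q);
# G_T = subgroup of integers mod P_MOD generated by GT_GEN, which has order Q.
P_MOD = 23
Q = 11
GT_GEN = 2  # 2**11 % 23 == 1, so 2 has prime order 11 modulo 23


def pair(x: int, y: int) -> int:
    """Toy symmetric pairing e(x, y) = GT_GEN^(x*y) mod P_MOD."""
    return pow(GT_GEN, (x * y) % Q, P_MOD)


def power(h: int, a: int) -> int:
    """h^a in the text's multiplicative notation, i.e. a*h in the additive group."""
    return (a * h) % Q


def is_bilinear() -> bool:
    # e(h1^a, h2^b) == e(h1, h2)^(ab) for every h1, h2 in G and a, b in Z_Q
    return all(pair(power(h1, a), power(h2, b)) == pow(pair(h1, h2), a * b, P_MOD)
               for h1 in range(Q) for h2 in range(Q)
               for a in range(Q) for b in range(Q))


def is_nondegenerate(g: int = 1) -> bool:
    """e(g, g) != 1 for the generator g of G."""
    return pair(g, g) != 1
```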
Merkle hash tree. A Merkle Hash Tree
(MHT) is a well studied authentication
structure [17], which is intended to
efficiently and securely prove that a set of
elements are undamaged and unaltered. It is
constructed as a binary tree where the leaves
in the MHT are the hashes of authentic data
values. Fig. 2 depicts an example of
authentication. The verifier with the
authentic $h_r$ requests $\{x_2, x_7\}$ and
requires the authentication of the received
blocks. Whereas MHT is commonly used to
authenticate only the values, in this paper we further
employ MHT to authenticate both the
values and the positions of data blocks. We
treat the leaf nodes as the left-to-right
sequence, so any leaf node can be uniquely
determined by following this sequence and
the way of computing the root in MHT.
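The sketch below illustrates how an MHT can authenticate both value and position: the verifier recomputes the root from a block, its leaf index and the sibling hashes, and the index decides at each level whether the sibling is concatenated on the left or the right, so a genuine block presented at the wrong position fails. Assumptions not taken from the paper: SHA-256 as the hash, a power-of-two number of blocks, and no domain separation between leaf and internal-node hashes (production trees usually add such prefixes). The eight blocks x1 to x8 and all function names are hypothetical.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    """Hash for leaves and internal nodes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def build_levels(blocks: List[bytes]) -> List[List[bytes]]:
    """Return every tree level, leaf hashes first and the root last.
    Assumes len(blocks) is a power of two."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def auth_path(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes needed to recompute the root from leaf `index`."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling differs only in the lowest bit
        index //= 2
    return path


def verify(root: bytes, block: bytes, index: int, path: List[bytes]) -> bool:
    """Recompute the root; the index bits fix the left/right order, so both
    the block's value and its position are authenticated."""
    node = h(block)
    for sibling in path:
        node = h(sibling + node) if index % 2 else h(node + sibling)
        index //= 2
    return node == root


blocks = [f"x{i}".encode() for i in range(1, 9)]  # hypothetical blocks x1..x8
levels = build_levels(blocks)
root = levels[-1][0]
for i in (1, 6):  # x2 and x7 (0-based leaf indices 1 and 6)
    print(blocks[i], verify(root, blocks[i], i, auth_path(levels, i)))
```

The verifier only needs the authentic root and $O(\log n)$ sibling hashes per requested block, which is what makes the MHT attractive for checking block tags.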
Basic Solutions
Assume the outsourced data file $F$ consists
of a finite ordered set of blocks $m_1, m_2, \ldots, m_n$.
One straightforward way to ensure
data integrity is to precompute MACs for
the entire data file. Specifically, before
outsourcing the data, the data owner
precomputes MACs of $F$ with a set of secret
keys and stores them locally. During the
auditing process, each time the data owner
reveals a secret key to the cloud server and
asks for a fresh keyed MAC for verification.
This approach provides deterministic data
integrity assurance straightforwardly, as the
verification covers all the data blocks.
However, the number of verifications that
can be performed in this solution is
limited by the number of secret keys. Once
the keys are exhausted, the data owner has
to retrieve the entire file $F$ from the server
in order to compute new MACs, which is
usually impractical due to the huge
communication overhead. Moreover, public
auditability is not supported, as the private
keys are required for verification. Another
basic solution is to use signatures instead of
MACs to obtain public auditability. The data
owner precomputes the signature of each
block $m_i$ ($i \in [1, n]$) and sends both $F$ and
the signatures to the cloud server for storage.
To verify the correctness of $F$, the data
owner can adopt a spot-checking approach,
i.e., requesting a number of randomly
selected blocks and their corresponding
signatures to be returned. This basic solution
can provide probabilistic assurance of
data correctness and supports public
auditability. However, it also suffers
severely from the fact that a considerable
number of original data blocks must be
retrieved to ensure a reasonable detection
probability, which again could result in a
large communication overhead and greatly
affect system efficiency. Note that the
above solutions only support the case of
static data; none of them can deal with
dynamic data updates.
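The probabilistic assurance of spot checking can be quantified with a standard sampling argument that the text does not spell out: if $t$ of $n$ blocks are corrupted and the auditor challenges $c$ distinct blocks chosen uniformly at random, the detection probability is $P_{\text{detect}} = 1 - \prod_{i=0}^{c-1} \frac{n - t - i}{n - i}$. The sketch below computes it; the function name and the numbers in the usage line are hypothetical.

```python
def detection_probability(n: int, t: int, c: int) -> float:
    """Probability that sampling c distinct blocks (uniformly, without
    replacement) out of n hits at least one of the t corrupted blocks."""
    if not 0 <= t <= n or not 0 <= c <= n:
        raise ValueError("require 0 <= t <= n and 0 <= c <= n")
    if c > n - t:
        return 1.0  # too few intact blocks to fill the whole sample
    p_miss = 1.0
    for i in range(c):
        p_miss *= (n - t - i) / (n - i)  # i-th sampled block is also intact
    return 1.0 - p_miss


# Hypothetical file: 10,000 blocks, 1% corrupted, 460 blocks challenged.
print(round(detection_probability(10_000, 100, 460), 4))
```

For large files the result is governed mainly by the corrupted fraction $t/n$ rather than by $n$ itself, since $P_{\text{detect}} \approx 1 - (1 - t/n)^c$; this is why the number of blocks that must be retrieved stays considerable whenever only a small fraction is damaged.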
Conclusion
To ensure cloud data storage security, it is
critical to enable a TPA to evaluate the
service quality from an objective and
independent perspective. Public auditability
also allows clients to delegate integrity
verification tasks to the TPA when they
themselves may be unreliable or unable to
commit the computation resources needed
for continuous verification.
Another major concern is how to construct
verification protocols that can accommodate
dynamic data files. In this paper, we
explored the problem of providing
simultaneous public auditability and data
dynamics for remote data integrity checking
in Cloud Computing. Our construction is
deliberately designed to meet these two
important goals while keeping efficiency
closely in mind. To achieve efficient data
dynamics, we improve the existing proof of
storage models by manipulating the classic
Merkle Hash Tree construction for block tag
authentication. To support efficient handling
of multiple auditing tasks, we further
explore the technique of bilinear aggregate
signature to extend our main result into a
multiuser setting, where the TPA can perform
multiple auditing tasks simultaneously.
Extensive security and performance analysis
shows that the proposed scheme is highly
efficient and provably secure.
References
[1] Q. Wang, C. Wang, J. Li, K. Ren, and W.
Lou, "Enabling Public Verifiability and Data
Dynamics for Storage Security in Cloud
Computing," Proc. 14th European Symp.
Research in Computer Security (ESORICS '09),
pp. 355-370, 2009.
[2] G. Ateniese, R. Burns, R. Curtmola, J.
Herring, L. Kissner, Z. Peterson, and D. Song,
"Provable Data Possession at Untrusted Stores,"
Proc. 14th ACM Conf. Computer and Comm.
Security (CCS '07), pp. 598-609, 2007.
[3] A. Juels and B.S. Kaliski Jr., "PORs: Proofs of
Retrievability for Large Files," Proc. 14th ACM
Conf. Computer and Comm. Security (CCS '07),
pp. 584-597, 2007.
[4] H. Shacham and B. Waters, "Compact
Proofs of Retrievability," Proc. 14th Int'l Conf.
Theory and Application of Cryptology and
Information Security: Advances in Cryptology
(ASIACRYPT '08), pp. 90-107, 2008.
[5] K.D. Bowers, A. Juels, and A. Oprea,
"Proofs of Retrievability: Theory and
Implementation," Report 2008/175, Cryptology
ePrint Archive, 2008.
[6] M. Naor and G.N. Rothblum, "The
Complexity of Online Memory Checking," Proc.
46th Ann. IEEE Symp. Foundations of
Computer Science (FOCS '05), pp. 573-584,
2005.
[7] E.-C. Chang and J. Xu, "Remote Integrity
Check with Dishonest Storage Server," Proc.
13th European Symp. Research in Computer
Security (ESORICS '08), pp. 223-237, 2008.
[8] M.A. Shah, R. Swaminathan, and M. Baker,
"Privacy-Preserving Audit and Extraction of
Digital Contents," Report 2008/186, Cryptology
ePrint Archive, 2008.
[9] A. Oprea, M.K. Reiter, and K. Yang, "Space-
Efficient Block Storage Integrity," Proc. 12th
Ann. Network and Distributed System Security
Symp. (NDSS '05), 2005.
[10] T. Schwarz and E.L. Miller, "Store, Forget,
and Check: Using Algebraic Signatures to
Check Remotely Administered Storage," Proc.
26th IEEE Int'l Conf. Distributed Computing
Systems (ICDCS '06), p. 12, 2006.
Online Credit Card Fraudulent Detection
Using Data Mining



VIDHYAA M.
mvidhya.btech87@gmail.com
KINGS COLLEGE OF ENGG.,
TANJORE

SHANTHALAKSHMI K.
shanthakirishnaswamy@gmail.com
KINGS COLLEGE OF ENGG.,
TANJORE

Abstract- As e-commerce sales continue to grow, the associated
online fraud remains an attractive source of revenue for
fraudsters. These fraudulent activities impose considerable
financial losses on merchants, making online fraud detection a
necessity. The problem of fraud detection is concerned not only
with capturing fraudulent activities, but also with capturing them
as quickly as possible; this timeliness is crucial to reducing
financial losses. In this research, a profiling method is
proposed for credit card fraud detection. The focus is on fraud
cases that cannot be detected at the transaction level. In the
proposed method, the patterns inherent in the time series of
aggregated daily amounts spent on an individual credit card
account are extracted. These patterns are used to shorten the
time between when a fraud occurs and when it is finally
detected, which results in timelier fraud detection, an improved
detection rate and lower financial loss.

Keywords- Fraud detection; aggregation; profile; credit card;
time series

I. INTRODUCTION
Fraud detection is currently an active topic in the context
of electronic payments, mostly because of the considerable
financial losses that payment card companies incur from
fraudulent activities. According to a CyberSource study
conducted in 2010, payment fraud losses in the United States
and Canada amounted to $3.3 billion in 2009 [1].

A good fraud detection system should identify fraudulent
activities accurately and as quickly as possible. Fraud
detection approaches can be divided into two main groups:
misuse detection and anomaly detection. A misuse detection
system is trained on examples of both normal and fraudulent
transactions, so it can only recognize known types of fraud. An
anomaly detection system, in contrast, is trained only on
normal transactions and therefore has the potential to detect
novel frauds. The difficulty of obtaining labeled data and the
evolving nature of fraudulent activities have led to a greater
focus on anomaly detection techniques. In these techniques, a
cardholder's profile is constructed from his or her normal
spending habits, and any inconsistency with this normal profile
is considered a potential fraud. The problem with this approach
is the large number of false alarms caused by normal changes
in cardholder behavior.

Using anomaly detection techniques for fraud detection
involves constructing an efficient profile that considers all
aspects of cardholder behavior. A fraudster is usually
unfamiliar with the cardholder's spending habits while trying
to extract the most profit from a stolen card. Hence fraudsters
tend to perform high-value transactions, which usually differ in
character from the cardholder's normal transactions.

In this context, a transactional profile can reveal the fraud.
Many studies consider this kind of fraudulent activity and
construct a transactional profile [7], [8], [9] and [10]. More
cautious fraudsters, however, try to mimic the cardholder's
normal behavior or perform low-value transactions at short time
intervals. In this case, the frequency or volume of transactions
is a much better indicator of fraud than the characteristics of
any individual transaction; for instance, in such frauds the total
number of transactions, or the total amount spent on a credit
card, over a specific time window increases. A few studies
consider this type of fraud and construct an aggregated profile.
The problem with this approach is late detection, because
the system has to wait until the end of the aggregation period
before it can make a decision. This problem becomes more
critical when the aggregation period is long. In addition,
some useful information, such as the order of the data, is lost
during aggregation. This order is another aspect of cardholder
behavior that can be used to detect some types of fraud.

In this research, we approach the credit card fraud
detection problem with an improved aggregated profile. For
this purpose, the sequence of aggregated daily amounts spent
by an individual cardholder within a time window is
considered. The inherent patterns in these time series are then
extracted to shorten the time between when a fraud occurs and
when it is finally detected; in other words, we take advantage
of the order of the data to achieve timelier fraud detection. We
demonstrate that the proposed approach improves the detection
rate and timeliness while, in some circumstances, decreasing
the cost involved.

II. RELATED WORKS

Misuse detection and anomaly detection are the two
main approaches used for credit card fraud detection. Misuse
detection approaches usually emphasize applying
classification methods at the transaction level. For recent work
applying misuse detection techniques to credit card fraud
detection, see [2], [3], [4] and [5]. In these studies, various
classification methods such as neural networks, decision trees,
logistic regression and support vector machines have been
applied to credit card fraud detection and compared against
each other. A recent study [6] also applied various
classification methods to aggregated transactions and
demonstrated that aggregated values are a better indicator of
fraud in some circumstances.
Among the studies conducted on credit card fraud
detection, we concentrate on those that apply anomaly
detection techniques, the so-called behavioral or
profile-based techniques.





Typically, these studies construct a cardholder profile
from normal training data and then try to detect fraudulent
activities based on inconsistencies with the normal behavior.
Most of them apply data mining techniques such as clustering
and association rules to construct a transactional profile. For
instance, in [7] a self-organizing map is used to cluster
customer transactions; the density of each cluster is the basis
for distinguishing between normal and rare customer behavior,
which can be used to detect suspicious activities. In [8],
DBSCAN, a density-based clustering algorithm, is used to
create clusters of customer transactions and build a
transactional profile. An example of using association rules
can be found in [9], where the recent transactions of a
customer are dynamically profiled using association rules to
indicate how unusual a new transaction is; "recent" is defined
by a sliding window.

A few studies in this area have considered the sequence
of transactions when building customer profiles; an example
can be found in [10]. In that study, a Hidden Markov Model is
built for each customer during the training phase from a
sequence of transaction amounts. When a new transaction
arrives, a new sequence is constructed by dropping the first
member of the old sequence and appending the new
transaction at the end. If the new sequence is not accepted by
the trained model, the transaction is considered fraudulent.
Another study, which combines anomaly and misuse detection
techniques, forms normal and fraudulent sequences of
quantized transaction amounts to capture cardholder behavior,
and then uses a sequence alignment technique to measure the
similarity between a new sequence and the training model. In
a different study, for each target cardholder, sequences of daily
transaction amounts are compared against those of other
cardholders to find the k nearest ones. These similar sequences
are grouped to form the cardholder's peer group, and if the
cardholder's future sequences deviate from this peer group, a
fraud alarm is raised. This approach rests on the assumption
that when a group of cardholders behaves similarly up to a
specific time, they are very likely to continue behaving the
same way for a while.

III. PROPOSED METHOD

In this research, we explore the use of transaction
sequences for timelier credit card fraud detection. The focus of
this work is on fraud cases that cannot be detected at the
transaction level. Specifically, we propose an improved
aggregated profile that exploits the inherent patterns in time
series of transactions. Extensive modeling of real data reveals
strong weekly and monthly periodic structure in cardholder
spending behavior. Based on these observations, we believe
that instead of looking at individual transactions, it makes
more sense to look at sequences of transactions. However, it is
impractical to consider the entire series of a cardholder's
transactions because of the high dimensionality of this data.
We therefore model time with a sequence of aggregated
transactions, which reduces the dimensionality. Aggregated
transactions are also more robust to minor shifts in cardholder
behavior.

To form the time series, the total amount of
transactions on each day of the year is calculated; the ordered
series of these aggregated values forms the time series.

Like the studies [13] and [14], which use 7 days for
aggregation, we form 7-day time series. Each time series
therefore has 7 dimensions, each corresponding to the total
amount of transactions on one day.
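A minimal sketch of this preprocessing step, assuming transactions arrive as (date, amount) pairs and that each year's 7-day periods start on 1 January. The function names are hypothetical, and the one or two leftover days at the end of a year are simply dropped here, a detail the text does not specify.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def daily_totals(transactions: Iterable[Tuple[date, float]]) -> Dict[date, float]:
    """Aggregate transaction amounts per calendar day."""
    totals: Dict[date, float] = defaultdict(float)
    for day, amount in transactions:
        totals[day] += amount
    return totals


def yearly_7day_series(transactions: Iterable[Tuple[date, float]],
                       year: int) -> List[List[float]]:
    """Cut one year's daily totals into consecutive 7-day series starting on
    1 January; days without transactions contribute 0."""
    totals = daily_totals(transactions)
    start = date(year, 1, 1).toordinal()
    n_days = date(year, 12, 31).toordinal() - start + 1
    return [
        [totals.get(date.fromordinal(start + 7 * p + d), 0.0) for d in range(7)]
        for p in range(n_days // 7)  # 52 full periods; leftover days dropped
    ]


txns = [(date(2010, 1, 1), 10.0), (date(2010, 1, 1), 5.0), (date(2010, 1, 9), 7.5)]
series = yearly_7day_series(txns, 2010)
print(len(series), series[0], series[1])
```

Each yearly list then contains one 7-dimensional vector per period, so period $p$ of different years can be compared directly, as the profiling stages require.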

As mentioned before, observations of real data show
periodic structures in transactions, so we expect to find similar
trends in the yearly 7-day time series. Since the first day of
each year is taken as the starting point of that year's 7-day
periods, the periods of different years differ in terms of days
of the week; for example, one year may start on a Sunday
while the next starts on a Friday. This implies that the 7-day
time series of a cardholder who follows a stable weekly trend
must be aligned by day of the week across years.
Furthermore, a cardholder may shift his or her purchasing
days. Another pattern is occasional behavior caused by
holidays and events that recur every year, such as the
Christmas holidays. In this research, we extract these inherent
patterns from the time series of aggregated transactions and
apply them to detect fraudulent activities in a more timely and
accurate way. By exploiting these patterns, we can detect fraud
cases before the end of an aggregation period. The details of
profile construction and fraud detection are explained in the
following subsections.

A. Make Profile

To construct a cardholder profile, the cardholder's normal
transactions are needed as training data. As mentioned earlier,
a preprocessing step is performed to build the time series. The
inherent patterns in these time series are then extracted to
build an efficient profile. In this research, two possible patterns
are extracted from the training data in two stages. The first
possible pattern is that a 7-day period follows the same trend in
all years. To extract this pattern, the time series are clustered
with k-means, a widely used clustering algorithm, using the
Euclidean distance. Because the Euclidean distance is the
similarity measure, time series with almost the same trend are
placed in the same cluster. After clustering, if all yearly time
series for a specific 7-day period fall in the same cluster, the
period is labeled a stable-trend period. All time series
belonging to these periods are then excluded from the training
data, and the remaining ones are kept for further analysis in
the next stage.
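The first profiling stage can be sketched as follows. The paper does not say how k-means is initialized or how k is chosen; this sketch assumes a deterministic farthest-point initialization (so results are reproducible) and takes k as an input. The layout `series_by_year[y][p]`, holding the 7-day series of period p in training year y, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Series = Sequence[float]


def euclidean(x: Series, y: Series) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(points: List[Series], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means with Euclidean distance and farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(list(farthest))
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def stable_trend_periods(series_by_year: List[List[Series]], k: int) -> List[int]:
    """Return the periods whose yearly series all land in one cluster."""
    n_years, n_periods = len(series_by_year), len(series_by_year[0])
    points = [series_by_year[y][p] for y in range(n_years) for p in range(n_periods)]
    labels, _ = kmeans(points, k)
    return [p for p in range(n_periods)
            if len({labels[y * n_periods + p] for y in range(n_years)}) == 1]


A = [100.0, 0, 0, 0, 0, 0, 0]   # period 0: same trend every year
B1 = [0, 0, 0, 0, 0, 0, 100.0]  # period 1: trend moves between years
B2 = [0, 0, 0, 100.0, 0, 0, 0]
print(stable_trend_periods([[A, B1], [A, B2], [A, B1]], k=3))
```

Periods not returned here are the ones passed on to the second, permutation-aware stage.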
The Euclidean distance is very sensitive to small
distortions in the time axis: if two time series are identical
except that one is slightly shifted along the time axis, the
Euclidean distance may consider them very different. However,
as mentioned before, the second possible pattern in a period is
following the same trend with the time axis permuted, as can
be seen in Fig. 1. To capture the similarity between such
sequences, the time axis must be optimally aligned before the
Euclidean distance is calculated.


The time series remaining from the first stage are
clustered using this new distance, which we call the permuted
distance. For this purpose, the k-means algorithm must be
modified. Briefly, k-means selects k initial points as cluster
centers. Each point is then assigned to the closest center
according to a distance measure. When all points have been
assigned, new centers are computed by averaging the cluster
members. These steps are repeated until the centers no longer
move. The Euclidean distance is normally used as the distance
measure in k-means; this must be modified for the permuted
pattern. To find the distance between permuted time series,
every permutation of the time axis of the first series is
considered and its Euclidean distance to the second series is
calculated; the minimum value is taken as the distance
between the two time series. In addition, the standard
averaging method for finding new centers may not produce the
true average of the time series in our case, leading to incorrect
k-means clustering results. See Fig. 2 and Fig. 3 for
clarification: Fig. 2 shows the result of the usual averaging of
two time series, while the expected result is shown in Fig. 3.
The time series must therefore be aligned along the time axis
before the average time series is calculated.
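A minimal sketch of the two modifications, assuming 7-day series stored as lists of floats: the permuted distance enumerates all 7! = 5040 orderings of the first series, as described above, and the center update aligns each member to the current center before averaging (the Fig. 2 versus Fig. 3 situation). Function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def best_alignment(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[float]]:
    """Try every permutation of x's time axis; return the smallest Euclidean
    distance to y and the permuted copy of x that achieves it."""
    best_d, best_x = math.inf, list(x)
    for perm in itertools.permutations(range(len(x))):
        candidate = [x[i] for i in perm]
        d = euclidean(candidate, y)
        if d < best_d:
            best_d, best_x = d, candidate
    return best_d, best_x


def permuted_distance(x: Sequence[float], y: Sequence[float]) -> float:
    return best_alignment(x, y)[0]


def aligned_mean(members: List[Sequence[float]], center: Sequence[float]) -> List[float]:
    """Center update for the modified k-means: align each member to the
    current center first, then average position by position."""
    aligned = [best_alignment(m, center)[1] for m in members]
    return [sum(col) / len(aligned) for col in zip(*aligned)]


x = [0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0]
y = [10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(euclidean(x, y), permuted_distance(x, y))  # ~14.14 versus 0.0
print(aligned_mean([x, y], y))  # a single peak of 10.0, not two peaks of 5.0
```

A design note: by the rearrangement inequality, the minimum over all permutations is reached when both series are sorted, so `euclidean(sorted(x), sorted(y))` gives the same distance in $O(n \log n)$; the brute-force loop is kept only because it mirrors the text and also returns the alignment needed for averaging.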



Figure 2. Usual averaging of the two time series

Figure 3. Desired averaging of the two time series

The time series remaining from the first stage are
clustered with this modified version of k-means. As a result,
time series that are almost the same after alignment along the
time axis are placed in the same cluster. We label the 7-day
periods whose yearly time series all fall in the same cluster as
permuted-trend periods. Moreover, there are some yearly
occasions, such as the Christmas holidays, on which cardholder
behavior is almost the same in all years. We can improve the
profile by identifying these days in permuted-trend periods.
For this purpose, the best alignment under the permuted
distance is found for all time series of these periods; if a day
is not moved in any of the best alignments, it is flagged as a
stable day.
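Stable-day flagging can be sketched as follows, under assumptions the text does not fix: the profile series of the period serves as the reference for every year's alignment, and ties between equally good alignments are broken in favor of the lexicographically first permutation. With such tie-breaking, days whose values are interchangeable (for example several zero-spending days) can also come out as stable. Names and data are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple


def best_permutation(x: Sequence[float], ref: Sequence[float]) -> Tuple[int, ...]:
    """perm[j] = index of the day of x placed at position j in the best
    alignment to ref (minimum squared Euclidean distance; first one wins ties)."""
    best_cost, best_perm = float("inf"), tuple(range(len(x)))
    for perm in itertools.permutations(range(len(x))):
        cost = sum((x[i] - ref[j]) ** 2 for j, i in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm


def stable_days(yearly_series: List[Sequence[float]], ref: Sequence[float]) -> List[int]:
    """A day is stable if no year's best alignment moves it."""
    perms = [best_permutation(s, ref) for s in yearly_series]
    return [j for j in range(len(ref)) if all(p[j] == j for p in perms)]


# Hypothetical period: the weekly purchase moves around, while day 6
# (e.g. a recurring holiday) always carries the same large amount.
ref = [5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 80.0]
years = [[0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 80.0],
         [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 80.0]]
print(stable_days(years, ref))
```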

After these two stages, the remaining periods are
labeled unpredictable-trend. At the end of the training phase,
we therefore have a time series for each 7-day period of the
year, together with the group (stable-trend, permuted-trend or
unpredictable-trend) to which it belongs and, for permuted-trend
periods, the days that are stable days.

B. Fraud Detection
After the training phase, fraudulent activities can be
detected based on the degree of deviation from the cardholder
profile. For this purpose, new transactions are accumulated as
they arrive to build the time series of the current period.
Depending on whether the current period is a stable-trend,
permuted-trend or unpredictable-trend period in the profile,
fraud detection is performed online, at the end of each day, or
at the end of the period, respectively. For stable-trend periods,
since the cardholder's behavior on corresponding days is almost
the same across years, fraud detection can be done online: as
the transactions of a day are accumulated, the running total is
compared against the corresponding value in the profile.

Whenever the ratio of this total to the corresponding amount
in the cardholder's profile exceeds $\theta_1$, a fraud is indicated.
For permuted-trend periods, at the end of each day, the
similarity between the current time series and the
corresponding one in the profile is computed. Since in the
middle of a period the current time series is shorter than the
corresponding profile time series, we consider all subsets of
the profile time series with the same length as the current
time series and take the minimum permuted distance between
them and the current series. If this value exceeds a threshold
$\theta_2$, a fraud is indicated. When considering these subsets, the
days flagged as stable days must remain fixed in place. Thus,
at the end of each day we can say whether there is some fraud
among the days so far, without waiting until the end of the
period. One important point for this group is that, while
building the time series, whenever a fraud has been identified
on a day, that day's value should be replaced with the
corresponding value from the profile, so that the fraudulent
value does not affect the decisions for the following days of
the period.
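The end-of-day check for permuted-trend periods can be sketched as below. Assumptions not fixed by the text: amounts are normalized so that a distance threshold such as the tuned value 0.7 is meaningful; "all subsets of the profile time series with the same length" is read as all ordered choices of d distinct profile days; a stable day i may only be matched with profile day i (and profile day i with nothing else); and the value replaced after an alarm is that of the most recent day. All names and amounts are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Set


def partial_permuted_distance(current: Sequence[float], profile: Sequence[float],
                              stable: Set[int]) -> float:
    """Minimum Euclidean distance between the d days observed so far and any
    ordered choice of d distinct profile days, keeping stable days in place."""
    d = len(current)
    best = math.inf
    for choice in itertools.permutations(range(len(profile)), d):
        # choice[i] is the profile day matched with observed day i.
        if any((i in stable or choice[i] in stable) and choice[i] != i for i in range(d)):
            continue
        dist = math.sqrt(sum((current[i] - profile[choice[i]]) ** 2 for i in range(d)))
        best = min(best, dist)
    return best


def monitor_period(daily_totals: Sequence[float], profile: Sequence[float],
                   stable: Set[int], threshold: float) -> List[int]:
    """Run the check at the end of each day; return the days that raised an alarm."""
    observed: List[float] = []
    alarms: List[int] = []
    for day, amount in enumerate(daily_totals):
        observed.append(amount)
        if partial_permuted_distance(observed, profile, stable) > threshold:
            alarms.append(day)
            observed[-1] = profile[day]  # keep the suspicious value out of later days
    return alarms


profile = [0.2, 0.0, 0.5, 0.0, 0.0, 0.3, 0.0]
print(monitor_period([0.0, 0.2, 0.0, 0.5, 0.0, 0.0, 0.3], profile, set(), 0.7))  # []
print(monitor_period([0.0, 0.2, 2.0, 0.0, 0.0, 0.0, 0.3], profile, set(), 0.7))  # [2]
```

The first period is a shifted copy of the profile and raises no alarm; in the second, the 2.0 on day 2 cannot be matched to any profile day closely enough, and after it is replaced the remaining days are again consistent with the profile.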
Finally, for unpredictable-trend periods, at the end of the
7-day period we compute the distance between the current
time series and the corresponding one in the cardholder profile;
if it exceeds a threshold $\theta_3$, a fraud is indicated. For this
group, the end of the period yields only a label indicating that
there is some fraud in the period.
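The two simpler rules, the online check for stable-trend periods and the end-of-period check for unpredictable-trend periods, can be sketched as follows. Reading the stable-trend rule as "running total greater than the ratio threshold times the profile amount" is an assumption, as is using `math.dist` for the end-of-period Euclidean distance. The thresholds 1.4 and 0.2 in the usage lines are the tuned values reported in Section IV; the amounts are hypothetical. Note that a zero profile amount makes any spending on that day raise an alarm.

```python
import math
from typing import Optional, Sequence


def stable_trend_alarm(amounts_today: Sequence[float], profile_amount: float,
                       ratio: float) -> Optional[int]:
    """Online check for stable-trend periods: return the index of the first
    transaction that pushes the day's running total above ratio * profile
    amount, or None if the day stays within the limit."""
    running = 0.0
    for k, amount in enumerate(amounts_today):
        running += amount
        if running > ratio * profile_amount:
            return k
    return None


def unpredictable_trend_alarm(period: Sequence[float], profile: Sequence[float],
                              threshold: float) -> bool:
    """End-of-period check for unpredictable-trend periods."""
    return math.dist(period, profile) > threshold


print(stable_trend_alarm([20.0, 30.0, 100.0], 100.0, 1.4))  # 2: total 150 > 140
print(unpredictable_trend_alarm([0.1] * 7, [0.1] * 7, 0.2))  # False
```

The stable-trend check can fire on the very transaction that crosses the limit, which is where the timeliness gain for stable-trend periods comes from.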
The best values for the parameters $\theta_1$, $\theta_2$ and $\theta_3$ are
obtained by examining the performance of the system over a
range of values and choosing those with the best average
result over all profiles on a tuning set.
The proposed method improves the timeliness of fraud
detection most in stable-trend periods, followed by
permuted-trend periods. It does not, however, improve the
timeliness of fraud detection for unpredictable-trend periods.

IV. EXPERIMENTAL RESULTS

The performance of the proposed scheme is compared
with that of the aggregation part of the offline system
proposed in [14]. In that research, the aggregated profile is
constructed from the weekly behavior of cardholders, and fraud
detection is performed at the end of each week. We expect our
proposed method to increase the detection rate and improve the
timeliness of that method. The aggregated profile proposed in
[13] is also compared against our method.
In [13], the aggregation model consists of a set of
descriptors that quantify time series of cardholder behavior.
These time series are built from all k-day periods of normal
transactions. Periods of 1, 3 and 7 days are used for
evaluation; among them we choose the 7-day one for
comparison, since it matches our approach.

A. Dataset

To evaluate our work, we developed an application
that generates synthetic data containing genuine and
fraudulent transactions. A profile-driven method similar to the
one applied in [9] is used to generate the data. We believe that
our dataset gives a good approximation for evaluating the
proposed method, because real scenarios are used to create the
data. As mentioned before, observations of real data show
periodic structures in credit card transaction data, as well as
some occasional events.

There are also various weekly and seasonal patterns in
cardholder behavior. These real scenarios have been applied in
data generation to support the validity of the results. The
normal distribution, which is commonly observed in many
natural processes, is used to generate the number and amount
of transactions.
Five attributes are considered for each transaction: year,
month, week of month, day of week and amount. The first four
attributes indicate the time sequence of the data, and the last
one is a good descriptor for quantifying the time series. We
created four different profiles to generate different kinds of
cardholder behavior. In the first, the cardholder has almost the
same periodic behavior every year. In the second, the
cardholder has similar behavior with some shift along the time
axis. In the third, the cardholder behaves unpredictably.
Finally, in the fourth, the cardholder has a mixture of different
behaviors at different times. Three years of transactions are
created for each cardholder as training data, and a mixture of
genuine and fraudulent transactions for one year is generated
as test data. Fraudsters usually follow one of two scenarios to
avoid detection: high-value transactions with long gaps, or
low-value ones with short gaps. The first scenario can be
detected by a transactional profile and the second by an
aggregated profile. Because we want to evaluate an aggregated
profile, fraudulent activities are created according to the
second scenario.
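A sketch of the kind of generator described above, under hypothetical assumptions: per-weekday mean counts and mean amounts define a cardholder profile, daily counts and amounts are drawn from normal distributions and clipped at zero, and the fraud burst follows the second scenario (many low-value transactions over a few consecutive days). The paper's actual generator and its parameters are not given, so every function name and number here is illustrative.

```python
import random
from datetime import date, timedelta
from typing import List, Sequence, Tuple

Txn = Tuple[date, float, bool]  # (day, amount, is_fraud)


def genuine_year(year: int, weekday_profile: Sequence[Tuple[float, float]],
                 sd: float, rng: random.Random) -> List[Txn]:
    """One year of genuine transactions; weekday_profile[w] = (mean count,
    mean amount) for weekday w (Monday = 0)."""
    txns: List[Txn] = []
    day = date(year, 1, 1)
    while day.year == year:
        mean_count, mean_amount = weekday_profile[day.weekday()]
        count = max(0, round(rng.gauss(mean_count, 1.0)))
        txns.extend((day, max(0.0, rng.gauss(mean_amount, sd)), False)
                    for _ in range(count))
        day += timedelta(days=1)
    return txns


def low_value_burst(start: date, days: int, per_day: int, mean_amount: float,
                    rng: random.Random) -> List[Txn]:
    """Second fraud scenario: many small transactions at short intervals."""
    return [(start + timedelta(days=d), max(0.0, rng.gauss(mean_amount, mean_amount / 4)), True)
            for d in range(days) for _ in range(per_day)]


rng = random.Random(42)
profile = [(1.0, 30.0)] * 5 + [(3.0, 60.0)] * 2  # busier weekends (hypothetical)
test_year = genuine_year(2010, profile, 10.0, rng) + low_value_burst(date(2010, 3, 1), 3, 6, 15.0, rng)
print(len(test_year), sum(t[2] for t in test_year))
```

Each simulated transaction keeps its date, so the year, month, week-of-month and day-of-week attributes listed above can be derived from it before aggregation.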

For each profile, three datasets are created. The first,
containing only normal transactions, is the training set. The
second and third contain a mixture of normal and fraudulent
transactions: the second is a tuning set used to obtain the best
values for the system parameters, and the third is a test set
used to evaluate the proposed method. Table 1 shows the
number of transactions in each dataset of the four profiles.


B. Performance Measures
The transactions processed by a fraud detection system
comprise fraudulent and normal transactions that are classified
correctly (TP, TN) and fraudulent and normal transactions that
are classified erroneously (FN, FP). A good fraud detection
system should maximize the number of TPs and TNs and
minimize the number of FPs and FNs.
Several performance measures have been applied to
fraud detection systems; an appropriate one should take into
account the specific issues of fraud detection. A recent study
[18] proposed appropriate performance measures for plastic
card fraud detection systems. We apply two of the measures
proposed there, which are widely used in recent fraud
detection research: the timeliness ratio and the loss function.
The first measures the speed of fraud detection and is defined
as the proportion of FN to F. The second measures the cost
involved; it assigns different costs to different errors, because
FNs are more serious than FPs. We use the loss function from
[6], given in (1).





Smaller values of these two measures indicate better
performance. We also use a standard measure, TP%, the
percentage of all fraudulent transactions that are TPs, for
which higher values indicate better performance.

C. Optimization of Parameters

As discussed in Section III, the proposed method has
three parameters, $\theta_1$, $\theta_2$ and $\theta_3$. In choosing their values,
there is a tradeoff between TP% and FP%; in this work we
choose the best value for each of them using TP/FP(%). The
best value for each parameter is obtained by examining the
performance of the system over various values on the tuning
sets and choosing the one with the best average result over all
profiles. As a result, the values 1.4, 0.7 and 0.2 were obtained
experimentally for $\theta_1$, $\theta_2$ and $\theta_3$, respectively.
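The tuning procedure amounts to a grid search. In this sketch, `evaluate` is a hypothetical hook that runs the detector with one candidate parameter value on a profile's tuning set and returns its TP% and FP%; each parameter is tuned independently, as described above, and a zero FP% is treated as an infinitely good ratio, which is an assumption. The toy evaluation numbers are made up.

```python
import math
from typing import Callable, Sequence, Tuple


def tune_parameter(candidates: Sequence[float], profiles: Sequence[str],
                   evaluate: Callable[[float, str], Tuple[float, float]]) -> float:
    """Pick the candidate with the best TP%/FP% ratio averaged over all profiles."""
    def average_ratio(value: float) -> float:
        ratios = []
        for profile in profiles:
            tp_pct, fp_pct = evaluate(value, profile)
            ratios.append(tp_pct / fp_pct if fp_pct > 0 else math.inf)
        return sum(ratios) / len(ratios)

    return max(candidates, key=average_ratio)


# Toy evaluation hook (hypothetical numbers): the ratio peaks at 1.4.
def toy_evaluate(value: float, profile: str) -> Tuple[float, float]:
    return 90.0, 1.0 + 10.0 * abs(value - 1.4)


print(tune_parameter([1.0, 1.2, 1.4, 1.6, 1.8], ["p1", "p2", "p3", "p4"], toy_evaluate))
```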

D. Validation Results

First, we compare the performance of our aggregation
method with the one proposed in [14]. As Fig. 4 shows, the
TP% of our proposed method is higher than that of [14], and
Fig. 5 and 6 indicate that its cost and timeliness are better as
well. These figures clearly show that when a cardholder follows
an almost stable trend at corresponding times of the year, as
in the first profile, the performance of the system increases
significantly, because fraudulent activities can be detected in
real time. As a result, more frauds are detected, in a timelier
manner and at lower cost. In the second test case, which
represents a cardholder with permuted behavior, the
performance of the system is slightly better, because
fraudulent activities can be detected at the end of each day. If
the cardholder behaves unpredictably, as simulated in the third
case, the performance of our method is almost the same as that
of [14], because there is no pattern in the cardholder's behavior
that can be used for timelier detection, and fraudulent
activities can only be detected at the end of 7-day periods.








Next, the performance of the proposed method is
compared against the aggregation method proposed in [13]. In
that research, the procedure for detecting fraudulent activities is
run at the end of each day, considering the 7 days before the
current day. Fig. 7, 8 and 9 show almost the same results as the
previous experiment. One underlying reason for the improved
result may be that the proposed method takes seasonal
behavior into account; the reasons discussed for the previous
experiment apply here as well.











V. CONCLUSION

In this paper, we have addressed the general problem
of credit card fraud detection using anomaly detection
techniques by exploiting the sequence of transactions in
constructing cardholders' profiles, and we have investigated
how this affects detection performance. The focus is on fraud
cases that cannot be detected at the transaction level. A new
method for constructing an aggregated profile is proposed.

To this end, the patterns of cardholders' aggregated daily
purchases are extracted from the training data. Because of
cardholders' seasonal behavior, these patterns are time
dependent. The extracted patterns are then used for more
accurate and timelier fraud detection. Experimental results
show that the proposed method can improve fraud detection in
situations where cardholders follow purchasing patterns at
corresponding times of the year.

REFERENCES

[1] CyberSource, "11th Annual Online Fraud Report," 2010.
http://forms.cybersource.com/forms/FraudReport2010NACYBSwwwQ109,
last accessed on 2010/09/10.

[2] R. Brause, L. T., and M. Hepp, "Neural data mining for credit card fraud
detection," 11th IEEE International Conference on Machine Learning
and Cybernetics, vol. 7, 2008, pp. 3630-3634.

[3] R. Chen, S. Luol, X. Liang, and V.C. Lee, "Personalized approach based
on SVM and ANN for detecting credit card fraud", International
Conference on Neural Networks and Brain, 2005, pp. 810-815.

[4] A. Shen, R. Tong, and Y. Deng, "Application of classification models on
credit card fraud detection," International Conference on Service
Systems and Service Management, June 2007, pp. 1-4.

[5] M.F. Gadi, X. Wang, and A.P. Lago, "Comparison with parametric
optimization in credit card fraud detection," Seventh International
Conference on Machine Learning and Applications, 2008, pp. 279-285.

[6] C. Whitrow, D.J. Hand, P. Juszczak, D. Weston, and N.M. Adams,
"Transaction aggregation as a strategy for credit card fraud detection,"
Data Mining and Knowledge Discovery, vol. 18, no. 1, 2009, pp. 30-55.

[7] J. Quah and M. Sriganesh, "Real-time credit card fraud detection using
computational intelligence," Expert Systems with Applications, vol. 35,
no. 4, 2008, pp. 1721-1732.

[8] S. Panigrahi, A. Kundu, S. Sural, and A. Majumdar, "Credit card fraud
detection: A fusion approach using Dempster-Shafer theory and
Bayesian learning," Information Fusion, vol. 10, no. 4, 2009, p. 9.

[9] J. Xu, A.H. Sung, and Q. Liu, "Behaviour mining for fraud detection,"
Journal of Research and Practice in Information Technology, vol. 39,
no. 1, 2007, pp. 3-18.

[10] A. Srivastava, A. Kundu, S. Sural, and A. Majumdar, "Credit card fraud
detection using Hidden Markov Model," IEEE Transactions on
Dependable and Secure Computing, vol. 5, no. 1, 2008, pp. 37-48.

[11] A. Kundu, S. Sural, and A. Majumdar, "Two-stage credit card fraud
detection using sequence alignment," Information Systems Security,
Springer Berlin / Heidelberg, 2006, pp. 260-275.

[12] D.J. Weston, D.J. Hand, N.M. Adams, C. Whitrow, and P. Juszczak,
"Plastic card fraud detection using peer group analysis," Advances in
Data Analysis and Classification, vol. 2, no. 1, 2008, pp. 45-62.

[13] M. Krivko, "A hybrid model for plastic card fraud detection systems,"
Expert Systems with Applications, vol. 37, no. 8, 2010, pp. 6070-6076.

[14] L. Seyedhossein and M.R. Hashemi, "A hybrid profiling method to detect
heterogeneous credit card frauds," 7th International ISC Conference on
Information Security and Cryptology, 2010, pp. 25-32.
ONTOLOGICAL QUERY BASED SEARCH IN AN ONLINE
HEALTH LEVEL SERVICES
R.Abirami, V.Latha
GOJAN SCHOOL OF BUSINESS AND TECHNOLOGY
Email id: abi.cse09@gmail.com, lathu_07@yahoo.com
Abstract: In this paper, we present a system to support patients in searching for healthcare services
in an e-health scenario. The proposed system is HL7-aware in that it represents both patient and
service information according to the directives of HL7, the information management standard
adopted in the medical context. Our system builds a profile for each patient and uses it, together
with an ontological web language, to detect Healthcare Service Providers delivering e-health
services potentially capable of satisfying the patient's needs. To handle this search it can exploit
three different algorithms: the first, called PPB, uses only information stored in the patient
profile; the second, called DS-PPB, considers both information stored in the patient profile and
similarities among the e-health services delivered by the involved providers; the third, called AB,
relies on A*, a popular search algorithm in Artificial Intelligence. Our system also builds a social
network of patients; once a patient submits a query and retrieves a set of services relevant to him,
our system applies a spreading activation technique on this social network to find other patients
who may benefit from these services.
Index Terms: Intelligent agents, ontological web language, human-centered computing,
knowledge personalization and customization, HL7, personalized search of services.
1 INTRODUCTION
Most industrialized countries are shifting
toward a knowledge-based economy in
which knowledge and technology play a key
role in supporting both productivity and
economic growth. This transition is
characterized by deep changes that affect
individual quality of life, and requires that
economic development keep pace with
social progress. In this scenario, it is
possible to foresee a rising demand for ad hoc
social services shaped around citizen needs.
The application of Information and
Communication Technologies to the whole
range of health sector activities (also known
as e-health) can simplify access to
healthcare services and can boost both their
quality and their effectiveness. E-health
tools allow the construction of patient-centric
Healthcare Service Providers
(hereafter, HCSPs), which aim to help
patients access health-related information,
prevent possible diseases, and
monitor their health status. These
considerations explain the large amount of
health-related information disseminated over
the web; as an example, the European
Commission has recently launched the EU-health
portal, which supplies information on 47
health topics and gives access to more
than 40,000 trustworthy data sources.
Despite the abundance of available
proposals, the retrieval of interesting
services is not always easy. In fact, existing
HCSPs often use proprietary formats to
represent data and services; as a
consequence, their interoperability and
comparison are generally difficult. In
addition, the vocabulary used by a patient
for composing his queries is often limited
and consists of quite generic terms; on the
contrary, medical resources and services are
often described by means of specialized
terms.
Our system is HL7-aware; in fact, it
uses the Health Level Seven (HL7) standard
to effectively handle both service and patient
information. HL7 provides several
functionalities for the exchange, the
management and the integration of data
regarding both patients and healthcare
services (see below). It is a widely accepted
standard in the marketplace; specifically: 1)
a large number of commercial products
implement and support it; 2) several
research projects adopt it as the reference
format for representing clinical documents;
and 3) various industrial and academic
efforts have been made to harmonize it
with other standards for the electronic
representation of health-related data. HL7
plays a key role in our system since it allows
interoperability and comparison problems to
be successfully addressed.
Our system consists of five main
components, namely:
1. a Patient Agent (PA), which allows a
patient to submit queries for detecting
services of interest to him;
2. a Healthcare Service Provider Agent
(SPA), which supports an HCSP manager in keeping
the corresponding service database
up-to-date;
3. a Coordinator Agent (CA), which
cooperates with PAs and SPAs to detect
the services that appear closest to
patients' queries and profiles;
4. a Healthcare Service Provider
Database (SD), which is associated with an
HCSP and manages information about
the services it delivers; and
5. a Patient Profile Database (PD), which
stores and handles patient profiles; a patient
profile registers patient needs and
preferences.
Each time a patient submits a query,
our system intelligently forwards it only to
the most promising
HCSPs, i.e., to those HCSPs that provide
services likely to best match both
the submitted query and the patient profile.
To guarantee this important feature,
it implements three ad hoc strategies,
tailored to the main characteristics of our
reference context. As will become clear in the
following, these strategies spare a patient from having to
manually contact and query a large number of
HCSPs in order to retrieve services of
interest to him (this last activity is usually
called Brute-Force search in the literature),
and allow those HCSPs that will more
likely provide useful results to be identified
in advance. As a consequence, patient
queries are evaluated against a small number
of HCSPs. This allows more precise and
complete results to be achieved, as well as
query execution time to be reduced, HCSP
resource management to be improved and,
finally, network performance to be
increased.
These considerations indicate that Intelligent
Agent technology is particularly well suited
in this context for allowing patients to
access services tailored to their needs and
desires. Even if our system deals with some
aspects specific to the e-health scenario,
many of the ideas underlying it can be
generalized to support users of any online
community (e.g., users of e-learning or
e-recruitment applications) in searching for
information of interest to them. With further
effort, some of the ideas expressed in this
paper can be generalized to deal with more
general research problems. In particular, the
following generalizations can be envisaged:

This would supply answers to some
relevant research questions, such as: How effective
is social interaction analysis in defining
user preferences? Do the acceptance and
the rejection of a recommendation have the same
relevance in learning the profile of the
corresponding user? How do other users'
suggestions generate new interests in a user?
Our three (quite sophisticated and complex)
strategies for HCSP detection could be
generalized to define three
query processing strategies capable of
improving traditional ranking functions,
which are generally based on simple metrics like
TF-IDF. We could also extend the three strategies
for HCSP detection to solve the problem of
allocating queries among different providers
by considering not only objective
parameters (like throughput and response
time) but also subjective parameters, which
reflect the user's perception of the
utility of the information received from each
provider. It is worth pointing out that the
query allocation problem has been
extensively studied in the past but, to the
best of our knowledge, all proposed
approaches considered only objective
parameters.
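For reference, a minimal sketch of the kind of TF-IDF ranking function mentioned above, which the proposed strategies are meant to improve upon. It assumes whitespace tokenization, term frequency normalized by document length, and IDF = log(N / df); these variants, the function name and the service descriptions are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter

def tf_idf_scores(query, documents):
    """Score each document against a query with a plain TF-IDF sum.

    TF = term count / document length; IDF = log(N / df). Whitespace
    tokenization and this TF/IDF variant are assumptions of the sketch.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # df[t]: number of documents that contain term t at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term]:
                score += (tf[term] / len(tokens)) * math.log(n_docs / df[term])
        scores.append(score)
    return scores

# Hypothetical service descriptions of three providers
services = [
    "cardiology visit booking",
    "dermatology visit",
    "cardiology cardiology exam",
]
print(tf_idf_scores("cardiology", services))
```

Such a score depends only on query and description terms; a term that appears in every description gets IDF log(1) = 0, and nothing in the score reflects the patient profile, which is the information the PPB, DS-PPB and AB strategies add.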
2 HEALTH LEVEL SEVEN
The Health Level Seven (hereafter, HL7) project
[1] began in 1987; it was started with the
purpose of simplifying interoperability
among heterogeneous healthcare
applications, and it focuses on
standardizing the formats for exchanging
groups of data generally present in every
healthcare system.
HL7 was conceived:
1. to support information exchange among
systems characterized by heterogeneous
technologies;
2. to allow each local organization to
introduce some variations while maintaining a
high level of standardization;
3. to evolve gradually, adapting itself
to changes in the healthcare context
and covering every aspect of the
healthcare scenario;
4. to operate without assuming any specific
architecture for the underlying information
systems; and
5. to comply with the other standards
defined in the healthcare context (such as
ACR/NEMA DICOM, ASC X12, ASTM,
IEEE/MEDIX, NCPDP, etc.).
Presently, HL7 handles the following
activities concerning data exchange among
possibly heterogeneous healthcare systems:
1. admission, discharge, and transfer of
patients;
2. transmission of orders;
3. management of cost charges;
4. transmission of healthcare data;
5. management of Master Files;
6. management of medical reports;
7. management of bookings;
8. exchange of patient records; and
9. management of hospital treatments.
HL7 V3 also extends the set of formats
for data exchange. In fact, before HL7 V3,
all HL7 messages had to be encoded in a
single ASCII-based format; since the
third version, HL7 also supports
XML, ActiveX, and CORBA technologies.
HL7 V3 is closely tied to XML; in
fact, XML is the reference coding language
for HL7 messages. Two committees have
been set up to handle the
interaction between XML and HL7;
specifically: 1) the XML Special Interest
Group provides recommendations about the
usage of XML in all HL7-based platforms,
independently of the characteristics of the
involved healthcare providers; and 2) the
Structured Documents Technical Committee
aims to define standards for the structured
documents produced in the healthcare
context. After the core of HL7 was defined,
other HL7-based standards were constructed
in order to handle important aspects of the
healthcare context.
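To make the pre-V3 "ASCII-based format" concrete, here is a minimal sketch that splits a pipe-delimited HL7 v2.x message into segments, fields and components. It assumes the default delimiters (carriage return between segments, "|" between fields, "^" between components) and deliberately ignores repetitions, sub-components and escape sequences; the function name and the sample message are hypothetical, and this is not how the described system processes HL7 V3 XML.

```python
def parse_hl7_v2(message):
    """Split a pipe-delimited HL7 v2.x message into segments, fields and components.

    Simplifications (assumptions of this sketch): default delimiters only,
    '\\r' between segments, '|' between fields, '^' between components;
    repetitions ('~'), sub-components ('&') and escapes are not handled.
    For ordinary segments fields[j] holds field j + 1 (PID-5 is fields[4]).
    For MSH the '|' itself is MSH-1, so fields[j] holds MSH-(j + 2).
    """
    segments = {}
    for raw in filter(None, message.split("\r")):
        name, *fields = raw.split("|")
        segments.setdefault(name, []).append([f.split("^") for f in fields])
    return segments

# Hypothetical two-segment message: header plus patient identification
msg = "MSH|^~\\&|SENDER|HOSP\rPID|1||12345||DOE^JOHN"
parsed = parse_hl7_v2(msg)
print(parsed["PID"][0][4])  # PID-5, patient name -> ['DOE', 'JOHN']
```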
In HL7, a crucial role is played by
vocabularies. A vocabulary provides a well-defined,
unambiguous specification of the
semantics of transferred data. This feature is
important when data are exchanged among
heterogeneous healthcare systems; in fact, in
this context, a structured vocabulary allows
quicker communication and marked
independence from natural language.
3 DESCRIPTION OF THE PROPOSED
SYSTEM
3.1 General Overview
The general architecture of our system is
illustrated in the accompanying figure. Three kinds of
agent operate in our system, namely: 1) a
Patient Agent (hereafter, PA), which supports
a patient in searching for services of interest to
him; 2) a Healthcare Service Provider Agent
(hereafter, SPA), which supports an HCSP
manager in keeping the corresponding
database up-to-date; and 3) a Coordinator
Agent (hereafter, CA), which cooperates with
PAs and SPAs to detect the services
that appear closest to patients' needs
and queries. In our system, each HCSP is
provided with a suitable database (hereafter,
SD) that stores and manages information
about the services it delivers. Our system is
also provided with a Patient Profile
Database (hereafter, PD), which stores and
handles patient profiles. In the following, we
call SDSet (resp., SPASet) the set of SDs
(resp., SPAs) associated with all involved
HCSPs.
3.2 Healthcare Service Provider Agent
(SPA)
SPA is an Interface Agent, analogous to
that described in […]. An SPA is associated with
an HCSP, which uses it to add, modify,
or remove information about the services
it provides. SPA allows our system to
uniformly manage possibly highly
heterogeneous services. It is also in charge
of processing queries submitted by CA; in
this case, it retrieves the necessary
information from its SD and processes it
to construct the corresponding
answers.
3.2.1 Patient Profile Based
PPB relies on the general observation
that combining query processing techniques
with user profile information in a single
framework allows the most adequate
answers to user queries to be found. The
steps performed by CA to implement
PPB are the following:
Step 1: CA splits SDSet into smaller
fragments, each constructed by considering
the Affinities between PPi and each SD of
SDSet (recall that CA retrieves all
information about a given SD by requesting it
from the corresponding SPA, which is the only
agent that can directly access an SD). More
specifically, fragment construction is
performed as follows:
Step 2: CA selects the best fragment and
orders its SDs on the basis of their Goodness
against Qi.
Step 3: CA asks the SPA associated with
the SD of the best fragment that has the
highest Goodness against Qi to process Qi.
Step 4: When the selected SPA returns its
answer, CA verifies whether the stop condition is
satisfied. If it is, CA executes
Step 5. Otherwise, it considers the
next SD of the fragment and activates the
corresponding SPA. If all SDs of the best
fragment have been considered and the stop
condition is not yet satisfied, CA considers
the next fragment.
Step 6: CA merges the answers returned by
the selected SPAs to construct the final
answer to Qi.
Step 7: CA sends the final answer to Pi via
PAi.
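The PPB loop above can be sketched as follows. The text does not spell out how fragments are built or what the stop condition is, so the sketch assumes fixed affinity bands for fragments and "enough distinct services collected" as the stop condition. The names ppb, ask_spa, bands and min_results are hypothetical, and sending the final answer to Pi via PAi (Step 7) is left to the caller.

```python
def ppb(sd_ids, affinity, goodness, ask_spa, min_results=10, bands=(0.7, 0.4, 0.0)):
    """Hypothetical sketch of the PPB loop run by CA for query Q_i of patient P_i.

    affinity[sd] -- precomputed Affinity between the profile PP_i and sd
    goodness[sd] -- Goodness of sd against the query Q_i
    ask_spa(sd)  -- asks the SPA owning sd to process Q_i; returns service ids
    bands        -- assumed affinity thresholds used to form fragments
    min_results  -- assumed stop condition: enough distinct services collected
    """
    # Step 1: split SDSet into fragments by affinity band, best band first
    fragments, upper = [], float("inf")
    for low in sorted(bands, reverse=True):
        fragment = [sd for sd in sd_ids if low <= affinity[sd] < upper]
        if fragment:
            fragments.append(fragment)
        upper = low
    merged = {}  # insertion-ordered set of distinct answers (merge step)
    for fragment in fragments:
        # Step 2: order the fragment's SDs by Goodness against Q_i
        for sd in sorted(fragment, key=lambda s: goodness[s], reverse=True):
            for service in ask_spa(sd):  # Step 3: activate the SPA
                merged.setdefault(service, None)
            if len(merged) >= min_results:  # Step 4: stop condition
                return list(merged)
        # stop condition not met: continue with the next fragment
    return list(merged)

affinity = {"sd1": 0.9, "sd2": 0.8, "sd3": 0.5, "sd4": 0.1}
goodness = {"sd1": 0.2, "sd2": 0.9, "sd3": 0.99, "sd4": 1.0}
catalog = {"sd1": ["s1", "s2"], "sd2": ["s2", "s3"], "sd3": ["s4"], "sd4": ["s5"]}
calls = []

def ask_spa(sd):
    calls.append(sd)
    return catalog[sd]

print(ppb(list(affinity), affinity, goodness, ask_spa, min_results=3))  # -> ['s2', 's3', 's1']
print(calls)  # -> ['sd2', 'sd1']
```

Note that sd3 is never contacted although its Goodness is high: fragment order (profile affinity) takes precedence over Goodness, which is what keeps the number of queried HCSPs small.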
3.2.2 Database Similarity and Patient
Profile Based
DS-PPB considers not only
information stored in patient profiles but
also semantic similarities possibly existing
among SDs. In our opinion, knowledge
of these similarities allows query processing
activities to be enhanced and,
consequently, the number of
relevant services that the system can suggest
to a patient to be increased. To better clarify this concept,
consider a patient P, whose Medical
Profile […]. This reasoning led us to define the
DS-PPB strategy. It consists of the
following steps:
Step 1: For each pair ⟨SDl, SDm⟩ of SDs,
such that SDl and SDm belong to SDSet,
CA computes their similarity Slm.
Step 2: CA applies a clustering algorithm
(such as Expectation Maximization) to
cluster the SDs of SDSet into a set FrSet of
semantically homogeneous fragments.
Step 3: CA computes the Average Affinity
Ahi of each fragment Frh of FrSet w.r.t. PPi
as
$$A_{hi} = \frac{1}{card(Fr_h)} \sum_{SD_l \in Fr_h} Affinity(PP_i, SD_l)$$
where card(Frh) denotes the number of
SDs in Frh.
Step 4: CA selects the fragment of FrSet
having the highest Average Affinity with
PPi and orders its SDs on the basis of their
Goodness against Qi.
Step 5: CA asks the SPA associated with
the SD of the best fragment that has the
highest Goodness against Qi to process
Qi.
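Steps 1 to 4 of DS-PPB can be sketched as below. Several assumptions stand in for details the text does not give: each SD is described by a set of service terms, Slm is the Jaccard similarity of those sets, and, in place of Expectation Maximization, a simple single-linkage grouping joins SDs whose similarity reaches a threshold tau. All names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Similarity of two SDs as overlap of their service-term sets (assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def ds_ppb_order(sd_terms, affinity, goodness, tau=0.3):
    """Hypothetical sketch of DS-PPB Steps 1-4; returns the SDs to query, in order.

    sd_terms[sd] -- set of terms describing the services stored in sd (assumed)
    affinity[sd] -- Affinity between PP_i and sd; goodness[sd] -- Goodness vs Q_i
    tau          -- assumed similarity threshold for the clustering stand-in
    """
    sds = list(sd_terms)
    # Step 1: similarity S_lm for every pair <SD_l, SD_m>
    sim = {(l, m): jaccard(sd_terms[l], sd_terms[m]) for l, m in combinations(sds, 2)}
    # Step 2 (stand-in for EM): single-linkage grouping of pairs with
    # S_lm >= tau, implemented with a small union-find
    parent = {sd: sd for sd in sds}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (l, m), s in sim.items():
        if s >= tau:
            parent[find(l)] = find(m)
    fr_set = {}
    for sd in sds:
        fr_set.setdefault(find(sd), []).append(sd)

    # Step 3: Average Affinity of each fragment w.r.t. PP_i
    def average_affinity(fragment):
        return sum(affinity[sd] for sd in fragment) / len(fragment)

    # Step 4: best fragment, its SDs ordered by Goodness against Q_i
    best = max(fr_set.values(), key=average_affinity)
    return sorted(best, key=lambda sd: goodness[sd], reverse=True)

sd_terms = {
    "sdA": {"cardiology", "ecg", "holter"},
    "sdB": {"cardiology", "ecg"},
    "sdC": {"dermatology", "allergy"},
    "sdD": {"dermatology", "laser"},
}
affinity = {"sdA": 0.8, "sdB": 0.6, "sdC": 0.9, "sdD": 0.1}
goodness = {"sdA": 0.3, "sdB": 0.7, "sdC": 0.95, "sdD": 0.2}
print(ds_ppb_order(sd_terms, affinity, goodness))  # -> ['sdB', 'sdA']
```

Here sdC has the single highest affinity, but its fragment {sdC, sdD} averages only 0.5 against 0.7 for {sdA, sdB}, so the similarity-based grouping changes which providers are queried first.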
4 EXPERIMENTAL RESULTS
4.1 Prototype and Data Sets
From the analysis of this figure, we observe
that the Average Relevance (resp.,
Unexpectedness) increases (resp., decreases)
as _act increases. To explain
this trend, we recall that P_i^k is inserted in
the Activation List AL_i of P_i if the energy
flowing from P_i to P_i^k is greater than _act;
in addition, the energy received by P_i^k is
proportional to the closeness coefficient
cl_i^k (see Section 3.4). If _act is high, then
AL_i contains few patients whose profiles are
very similar to that of P_i, i.e., whose
coefficient cl_i^k is very high. Now, since P_i
and P_i^k share many interests, the services
relevant to P_i are presumably relevant also
to P_i^k; this explains the high values of the
Average Relevance.
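The admission rule described above can be illustrated with a one-hop sketch: each neighbour P_i^k receives energy proportional to its closeness coefficient and enters AL_i only if that energy exceeds the activation threshold. The function name, the constant of proportionality (energy) and the coefficient values are hypothetical, and real spreading activation propagates over several hops with decay, which this sketch omits.

```python
def activation_list(closeness, threshold, energy=1.0):
    """One-hop sketch of building AL_i for patient P_i.

    closeness[p] -- closeness coefficient cl_i^k between P_i and neighbour p
    energy       -- energy emitted by P_i (assumed proportionality constant)
    A neighbour enters AL_i when the energy it receives exceeds the threshold.
    """
    received = {p: energy * cl for p, cl in closeness.items()}
    admitted = [p for p, e in received.items() if e > threshold]
    return sorted(admitted, key=lambda p: received[p], reverse=True)

closeness = {"P1": 0.9, "P2": 0.6, "P3": 0.2}  # hypothetical coefficients
print(activation_list(closeness, threshold=0.5))  # -> ['P1', 'P2']
print(activation_list(closeness, threshold=0.8))  # -> ['P1']
```

Raising the threshold leaves only the closest patients in AL_i, which matches the observed rise in Average Relevance and drop in Unexpectedness.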
4.2 Ontological Querying Strategy
From the previous experiments, it is possible
to derive the following criteria, which may
help users choose the most
adequate querying strategy. More
specifically:
• Expert users, i.e., users with
good background knowledge, should
prefer the AB or PPB strategies (with _ _ 0.015).
In fact, these users are fully aware of
their needs and desires and, therefore, are able to
formulate precise queries to satisfy their
information needs. As a
consequence, they want to receive, in
answer to their queries, a small set of results
perfectly matching their needs and
desires. In addition, they would like to filter
out all irrelevant (or, often, weakly relevant)
results; this requires prioritizing Precision
over Recall. For all these reasons, the AB or
PPB strategies (with _ _ 0.015) appear the
most adequate for this kind of user.
• Novice users, i.e., users with poor
background knowledge and
limited terminology, should use DS-PPB or,
alternatively, PPB (with _ _
0.025). In fact, these users have vague
domain knowledge and, therefore, they are often
incapable of clearly specifying which kind
of information is actually relevant to them.
Finally, AB (resp., DS-PPB) shows the
best performance in terms of Precision
(resp., Recall). Clearly, if many users
simultaneously submit a large number of
queries or if the number of SPAs to explore
is large, then the time required to process
queries may become high. This would
cancel out the good performance of the AB or DS-PPB
strategies; in these cases, users should
be advised to fall back on the PPB
strategy, which provides lower values of
Precision or Recall but can ensure fast
answers.
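The Precision/Recall trade-off behind these recommendations can be made concrete with the standard definitions (Precision = relevant retrieved / retrieved, Recall = relevant retrieved / relevant). The service sets below are hypothetical and are not results from the experiments.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / |retrieved|; Recall = hits / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"s1", "s2", "s3", "s4"}  # hypothetical ground truth
print(precision_recall(["s1", "s2"], relevant))                          # -> (1.0, 0.5)
print(precision_recall(["s1", "s2", "s3", "s5", "s6", "s7"], relevant))  # -> (0.5, 0.75)
```

A small, exact answer set (what an expert user wants) maximizes Precision. A broader answer set (more helpful to a novice) raises Recall at the cost of Precision.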
CONCLUSIONS
In this paper, we have presented an HL7-
aware multiagent system that supports
patients in their access to services delivered
by HCSPs. The proposed system combines
submitted queries with the corresponding
patient profiles to identify those services that
are likely to satisfy patient needs and
desires. In this task, it fruitfully exploits all
features characterizing Intelligent Agents
(e.g., proactiveness, autonomy, and
sociality). Various extensions and
improvements could be envisaged for our
system in the future. Among them, we
consider it particularly challenging to define
an approach that allows a user to provide
feedback about the Quality of Service
he perceived while accessing an
HCSP. The Quality of Service perceived by the
users who access a given HCSP could be
exploited to compute an overall Quality
of Service for the HCSP itself. In turn,
this parameter could be taken into account in
the selection of services answering a user
query.
REFERENCES
[1] Health Level Seven (HL7),
http://www.hl7.org, 2011.
[2] Logical Observation Identifiers Names
and Codes (LOINC),
http://www.regenstrief.org/loinc/, 2011.
[3] Systematized Nomenclature of Medicine
Clinical Terms (SNOMED CT),
http://www.snomed.org, 2011.
[4] O. Baujard, V. Baujard, S. Aurel, C.
Boyer, and R.D. Appel, "MARVIN, Multi-
Agent Softbot to Retrieve Multilingual
Medical Information on the Web," Medical
Informatics, vol. 23, no. 3, pp. 187-191,
1998.
[5] N.J. Belkin, D. Kelly, G. Kim, J.Y. Kim,
H.J. Lee, G. Muresan, M.C. Tang, X.J.
Yuan, and C. Cool, "Query Length in
Interactive Information Retrieval," Proc.
ACM SIGIR, pp. 205-212, 2003.
[6] L. Braun, F. Wiesman, H.J. van den
Herik, A. Hasman, and E. Korsten,
"Towards Patient-Related Information
Needs," Int'l J. Medical Informatics, vol. 76,
nos. 2/3, pp. 246-251, 2007.
PROVABLE DATA INTEGRITY IN CLOUD
OUTSOURCED DATABASE
C.S.Subha, Lecturer
Gojan School Of Business &
Technology
cs_subha@hotmail.com
Esther Grace.M
Gojan School Of Business &
Technology
chinnu.grace18@gmail.com
Divya.K
Gojan School Of Business &
Technology
div.kd.kd@gmail.com
ABSTRACT:
Cloud computing has been envisioned as the
de facto solution to the rising storage costs
of IT enterprises. With the high cost of
data storage devices and the rapid rate
at which data is being generated, it proves
costly for enterprises or individual users to
frequently update their hardware. Apart
from reducing storage costs, outsourcing
data to the cloud also helps reduce
maintenance. Cloud storage moves the
user's data to large, remotely located data
centers over which the user has no control.
However, this unique feature of the cloud
poses many new security challenges that
need to be clearly understood and resolved.
We provide a scheme that gives a proof of
data integrity in the cloud, which the
customer can employ to check the
correctness of his data in the cloud.
This proof can be agreed upon by
both the cloud and the customer and
can be incorporated in the Service
Level Agreement (SLA).
Cloud computing can thus provide
efficient storage of data.
KEYWORDS: SLA, PDA, POR, PDP,
SSKE, SPKE, CSP, TPA, SS
INTRODUCTION
Cloud computing provides unlimited
infrastructure to store and execute customer
data and programs. Customers do not
need to own the infrastructure; since they
merely access or rent it, they can forgo
capital expenditure and consume resources
as a service, paying only for what they
use.
Benefits of Cloud Computing:
Minimized capital expenditure
Location and device independence
Improved utilization and efficiency
Very high scalability
High computing power
Data outsourcing to cloud storage servers is
a rising trend among many firms and users
owing to its economic advantages. This
essentially means that the owner (client) of
the data moves its data to a third-party cloud
storage server, which is supposed to
(presumably for a fee) faithfully store the
data and provide it back to the owner
whenever required. As data generation far
outpaces data storage, it proves costly for
small firms to frequently update their
hardware whenever additional data is
created. Maintaining the storage can also
be a difficult task. Outsourcing data
to cloud storage helps such firms by
reducing the costs of storage, maintenance
and personnel. It can also assure reliable
storage of important data by keeping
multiple copies of the data, thereby reducing
the chance of losing data through hardware
failures. Storing user data in the cloud,
despite its advantages, raises many interesting
security concerns that need to be
extensively investigated to make it a
reliable solution to the problem of avoiding
local storage of data. In this paper we deal
with the problem of implementing a protocol
for obtaining a proof of data possession in
the cloud, sometimes referred to as Proof of
Retrievability (POR). This problem tries to
obtain and verify a proof that the data
stored by a user at a remote data storage in
the cloud (called cloud storage archives or
simply archives) has not been modified by the
archive, thereby assuring the integrity of the data.
Through frequent checks on the storage archives,
such verification systems prevent the cloud
storage archives from misrepresenting or
modifying the data stored at them without the
consent of the data owner. Such checks must
allow the data owner to efficiently, frequently,
quickly and securely verify that the cloud
archive is not cheating the owner. Cheating, in
this context, means that the storage archive
might delete or modify some of the data.
RELATED WORKS:
The simplest proof of retrievability (POR)
scheme can be built using a keyed hash
function hk(F). In this scheme the verifier,
before archiving the data file F in the cloud
storage, pre-computes the cryptographic
hash of F using hk(F) and stores the
hash as well as the secret key k. To check whether
the integrity of the file F has been lost, the verifier
releases the secret key k to the cloud archive
and asks it to compute and return the value of
hk(F). By storing multiple hash values for
different keys, the verifier can check the
integrity of the file F multiple times,
each check being an independent proof.
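A minimal sketch of the keyed-hash scheme just described, assuming HMAC-SHA256 as the keyed hash hk(F) and 32-byte random keys; the function names are hypothetical. It also shows the drawbacks discussed next: the verifier stores one (key, digest) pair per planned check, and the archive must hash the whole file for every proof.

```python
import hashlib
import hmac
import secrets

def h(key, data):
    """Keyed hash h_k(F); HMAC-SHA256 is an assumed instantiation."""
    return hmac.new(key, data, hashlib.sha256).digest()

def verifier_setup(data, checks):
    """Before archiving F: draw one fresh key per planned check, keep (k, h_k(F))."""
    keys = [secrets.token_bytes(32) for _ in range(checks)]
    return [(k, h(k, data)) for k in keys]

def archive_prove(stored, key):
    """Archive side: must read and hash the entire stored file for every proof."""
    return h(key, stored)

def verifier_check(pairs, stored):
    """Spend one (key, digest) pair; once released, a key cannot be reused."""
    key, expected = pairs.pop()
    return hmac.compare_digest(archive_prove(stored, key), expected)

original = b"block-" * 1000
pairs = verifier_setup(original, checks=3)
print(verifier_check(pairs, original))              # intact file -> True
print(verifier_check(pairs, original[:-1] + b"X"))  # one byte modified -> False
print(len(pairs))                                   # checks left -> 1
```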
Though the scheme is simple and easily
implementable, its main drawback is the high
resource cost it requires for implementation.
At the verifier side, this involves storing as
many keys as the number of checks it wants
to perform, as well as the hash value of the
data file F with each hash key. Also, computing
the hash value for even a moderately large data
file can be computationally burdensome for some
clients (PDAs, mobile phones, etc.). At the
archive side, each invocation of the protocol
requires the archive to process the entire
file F. This can be computationally
burdensome for the archive even for a
lightweight operation like hashing.
Furthermore, each proof requires the prover
to read the entire file F, a significant
overhead for an archive whose intended load
is only an occasional read per file, were
every file to be tested frequently.
Juels and Kaliski [3] proposed an alternative
in which special check blocks, called sentinels,
are hidden among the data blocks of F. To
check integrity, the verifier challenges the
prover (archive) by specifying the positions of
a collection of sentinels and asking the prover
to return the associated sentinel values. If the
prover has modified or deleted a substantial
portion of F, then with high probability it will
also have suppressed a number of sentinels. It
is therefore unlikely to respond correctly to the
verifier. To make the sentinels
indistinguishable from the data blocks, the
whole modified file is encrypted and stored
at the archive. This scheme is best suited
for storing encrypted files. As this scheme
involves encrypting the file F using a
secret key, it becomes computationally
cumbersome, especially when the data to be
encrypted is large. Hence, this scheme is
disadvantageous for small users with
limited computational power (PDAs, mobile
phones, etc.). There will also be a storage
overhead at the server, partly due to the
newly inserted sentinels and partly due to the
error-correcting codes that are inserted. Also,
the client needs to store all the sentinels with
it, which may be a storage overhead for thin
clients (PDAs, low-power devices, etc.).
Moreover, transmitting the file across the
network to the client can consume heavy
bandwidth. The problem is further
complicated by the fact that the owner of
the data may be a small device, like a PDA
(personal digital assistant) or a mobile phone,
which has limited CPU power, battery power
and communication bandwidth.
DISADVANTAGES:
The main drawback of the keyed-hash scheme is the high resource
cost it requires for implementation.
Computing the hash value for even a moderately large data file can
be computationally burdensome for some clients (PDAs, mobile phones, etc.).
Encrypting large data files is costly, which disadvantages small users
with limited computational power (PDAs, mobile phones, etc.).
The schemes are time-consuming.
IMPLEMENTATION
Cloud Storage
Data outsourcing to cloud storage servers is
a rising trend among many firms and users
owing to its economic advantages. This
essentially means that the owner (client) of
the data moves its data to a third-party cloud
storage server, which is supposed to
(presumably for a fee) faithfully store the
data and provide it back to the owner
whenever required. The owner's data is
stored in the cloud once the owner has registered.
Simply Archives
The problem here is to obtain and verify a
proof that the data stored by a user at
remote data storage in the cloud (called
cloud storage archives or simply archives)
has not been modified by the archive, thereby
assuring the integrity of the data. The check
must establish that the cloud archive is not
cheating the owner, where cheating, in this
context, means that the storage archive might
delete or modify some of the data. When
developing proofs of data possession at
untrusted cloud storage servers, we are
often limited by the resources at the cloud
server as well as at the client.
Sentinels
In this scheme, unlike the keyed-hash
scheme, only a single key can be
used irrespective of the size of the file or the
number of files whose retrievability the verifier
wants to check. Also, the archive needs to access
only a small portion of the file F, unlike
the keyed-hash scheme, which requires the
archive to process the entire file F for each
protocol verification. If the prover has
modified or deleted a substantial portion of
F, then with high probability it will also
have suppressed a number of sentinels.
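A simplified sketch of the sentinel idea: random sentinel blocks are hidden at random positions among the data blocks, the verifier keeps the {position: sentinel} map, and a challenge asks for a few sentinel positions. The block size, function names and the use of Python's secrets.SystemRandom are assumptions. The whole-file encryption that makes sentinels indistinguishable from data, and the error-correcting codes, are omitted; in the full scheme each revealed sentinel should also not be reused.

```python
import secrets

BLOCK = 16  # bytes per block (assumed)
_rng = secrets.SystemRandom()

def embed_sentinels(blocks, n_sentinels):
    """Setup: hide random sentinel blocks at random positions among data blocks.

    Returns the block list to store and the verifier's secret {position: sentinel}.
    File encryption and error-correcting codes are omitted in this sketch.
    """
    total = len(blocks) + n_sentinels
    positions = _rng.sample(range(total), n_sentinels)
    sentinels = {p: secrets.token_bytes(BLOCK) for p in positions}
    data = iter(blocks)
    stored = [sentinels[i] if i in sentinels else next(data) for i in range(total)]
    return stored, sentinels

def challenge(stored, sentinels, q):
    """Verification: request q randomly chosen sentinel positions and compare."""
    asked = _rng.sample(sorted(sentinels), q)
    return all(p < len(stored) and stored[p] == sentinels[p] for p in asked)

blocks = [bytes([i]) * BLOCK for i in range(100)]
stored, sentinels = embed_sentinels(blocks, 20)
print(challenge(stored, sentinels, 5))  # intact archive -> True
wiped = [bytes(BLOCK)] * len(stored)    # archive overwrote every block
print(challenge(wiped, sentinels, 5))   # False with overwhelming probability
```

The archive only returns q blocks per challenge, independent of the file size, which is the advantage over the keyed-hash scheme; the cost is that the client must keep the sentinels.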
Verification Phase
Before storing the file at the archive, the
verifier preprocesses it, appends some
metadata to it, and stores the result at the
archive. At the time of verification, the
verifier uses this metadata to verify the
integrity of the data. It is important to note
that our proof of data integrity protocol only
checks the integrity of the data, i.e., whether the data
has been illegally modified or deleted. It
does not prevent the archive from modifying
the data.
Architecture
Algorithm:
Meta-Data Generation:
Suppose the verifier V wishes to store the
file F with the archive. Let this file F
consist of n file blocks. We initially
preprocess the file and create metadata
to be appended to the file. Let each of
the n data blocks have m bits in them.
[Figure: a typical data file F which the client
wishes to store in the cloud.]
The metadata m_i taken from each data block i
is encrypted using a suitable algorithm
to give a new modified metadata block M_i.
Without loss of generality, we show this
process using a simple XOR operation.
The encryption method can be improved to
provide still stronger protection for the verifier's
data. All the metadata blocks
generated using the above procedure are
concatenated together. This concatenated
metadata is appended to the file F
before storing it at the cloud server. The file
F along with the appended metadata, denoted F̃, is
archived with the cloud.
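A minimal sketch of metadata generation and verification under explicit assumptions, not the authors' exact construction. The metadata m_i of block i is taken to be K bytes at secret positions derived from the verifier's key, the XOR pad is derived from the key with HMAC-SHA256, and the last block is zero-padded; BLOCK_SIZE, K and all function names are hypothetical. The verifier keeps only the key and n. The archive returns the requested bytes of a block together with the stored M_i, and the verifier decrypts M_i and compares.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 64  # bytes per data block (assumed)
K = 4            # metadata bytes taken from each block (assumed)

def _prf(key, label, i):
    """Keyed pseudo-random bytes for block i (HMAC-SHA256; an assumption)."""
    return hmac.new(key, label + i.to_bytes(8, "big"), hashlib.sha256).digest()

def positions(key, i):
    """Secret byte positions inside block i that form its metadata m_i."""
    return [b % BLOCK_SIZE for b in _prf(key, b"pos", i)[:K]]

def prepare(data, key):
    """Verifier: build F~ = F || M_1 || ... || M_n with M_i = m_i XOR pad_i."""
    data += b"\x00" * (-len(data) % BLOCK_SIZE)  # pad the last block
    n = len(data) // BLOCK_SIZE
    meta = bytearray()
    for i in range(n):
        block = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        m_i = bytes(block[p] for p in positions(key, i))
        pad = _prf(key, b"xor", i)[:K]
        meta += bytes(a ^ b for a, b in zip(m_i, pad))  # M_i
    return data + bytes(meta), n

def archive_answer(stored, n, i, pos):
    """Archive: return the requested bytes of block i and the stored M_i."""
    block = stored[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
    start = n * BLOCK_SIZE + i * K
    return bytes(block[p] for p in pos), stored[start:start + K]

def verify_block(stored, n, i, key):
    """Verifier: decrypt M_i and compare it with the bytes the archive returned."""
    answer, M_i = archive_answer(stored, n, i, positions(key, i))
    expected = bytes(a ^ b for a, b in zip(M_i, _prf(key, b"xor", i)[:K]))
    return hmac.compare_digest(answer, expected)

key = secrets.token_bytes(32)
stored, n = prepare(bytes(range(256)) * 2, key)  # 512 bytes -> 8 blocks
print(all(verify_block(stored, n, i, key) for i in range(n)))  # -> True
tampered = bytearray(stored)
for j in range(3 * BLOCK_SIZE, 4 * BLOCK_SIZE):
    tampered[j] ^= 0xFF  # archive corrupts block 3
print(verify_block(bytes(tampered), n, 3, key))  # -> False
```

Each check transfers only 2K bytes, in line with the goal of a small proof. Because only K sampled bytes per block are compared, a modification that misses those positions is not caught by a single check, so detection is probabilistic.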
CONCLUSION
In this paper we have worked to facilitate
the client in getting a proof of integrity of
the data which he wishes to store in the
cloud storage servers with bare minimum
costs and efforts. Our scheme was
developed to reduce the computational and
storage overhead of the client as well as to
minimize the computational overhead of the
cloud storage server. We also minimized the
size of the proof of data integrity so as to
reduce the network bandwidth consumption.
Many of the schemes proposed earlier
require the archive to perform tasks that
need a lot of computational power to
generate the proof of data integrity. In
our scheme, by contrast, the archive only needs to fetch
and send a few bits of data to the client.
FUTURE ENHANCEMENT
In the future, this project, PROVABLE DATA
INTEGRITY IN CLOUD OUTSOURCED DATABASE,
can be used in many large organizations, such as
hospitals, banks, schools and colleges, research
institutes, finance management firms and global
markets, to store and retrieve data with
high security so that the data is not mishandled
or modified. This efficient security mechanism
provides a secured path for storing data
in the cloud.
REFERENCES
1. Sravan Kumar and Ashutosh Saxena, "Data
integrity proofs in cloud storage," in
Proceedings of 2011 IEEE.
2. G. Ateniese, R. Burns, R. Curtmola,
J. Herring, L. Kissner, Z. Peterson, and D.
Song, "Provable data possession at
untrusted stores," in CCS '07:
Proceedings of the 14th ACM
Conference on Computer and
Communications Security. New York,
NY, USA: ACM, 2007, pp. 598-609.
3. A. Juels and B. S. Kaliski, Jr., "PORs:
Proofs of retrievability for large files," in
CCS '07: Proceedings of the 14th ACM
Conference on Computer and
Communications Security. New York,
NY, USA: ACM, 2007, pp. 584-597.
4. E. Mykletun, M. Narasimha, and G.
Tsudik, "Authentication and integrity in
outsourced databases," Trans. Storage,
vol. 2, no. 2, pp. 107-138, 2006.
5. D. X. Song, D. Wagner, and A. Perrig,
"Practical techniques for searches on
encrypted data," in SP '00: Proceedings
of the 2000
Service Oriented Multi-agent System
K.Thanga Priya
Department of Computer Science and Engineering
SSN College of engineering, Chennai, India
k.thangapriya2288@gmail.com
Abstract- Agent communication is used for
solving standard multi-agent problems, like
coordination or negotiation. Coordination and
communication among the agents have a critical
role for the success of the dynamic composition of
new services. Integrating Web services and
software agents brings about an obvious benefit:
connecting application domains by enabling a
Web service to invoke an agent service and vice
versa. However, this interconnection is more than
simply cross-domain discovery and invocation; it
will also allow complex compositions of agent
services and Web services to be created. JADE, an
agent development framework, is used for
creating agent services. The social model is a
promising approach to agent communication, so
commitment- or institution-based agent
communication built on the social model of agents is
adopted here. A social
model for communication is built as an extension
of JADE.
Keywords - Multi-agent System, Social Model,
JADE, Web service, Agent services,
Commitments, Agent Communication Language.
I. INTRODUCTION
Agents are entities that perceive their
environment through sensors and act on that
environment through actuators. Such agents are
autonomous, heterogeneous, proactive, etc. A multi-agent
system is one in which agents are present and
communicate or interact with each other to reach a
consensus. Communication therefore plays a vital role in enabling such a
system to function appropriately. The Agent
Communication Language (ACL) used must be
simple, understandable, extensible and reliable.
Due to the complexity associated with the
development of multi-agent systems, which typically
involves thread control, message exchange across the
network, cognitive ability, and discovery of agents
and their services, several architectures and platforms
have been proposed. Several agent platforms, namely
Jason, JACK, Jadex, and the 3APL Platform, are
available. All four platforms are based on the Java
language.
However, even though the underlying
language is a general-purpose programming
language, agents in these platforms are implemented
in dedicated languages: AgentSpeak(L),
the JACK Agent Language, a domain-specific language
(DSL) written in XML, and 3APL, respectively.
Source code written in these languages is either
precompiled or processed at runtime by the agent
platform. The adoption of this approach prevents
developers from using advanced features of the Java
language and it makes it complicated to integrate the
implementation of a multi-agent system with existing
technologies.
JADE is an agent framework that
facilitates the development of multi-agent systems (MAS). JADE is largely
an implementation of the FIPA (Foundation for
Intelligent Physical Agents) specifications. It provides
a runtime environment where JADE agents can
"live", a library of classes that programmers can use
to develop their agents, and graphical tools that allow
administering and monitoring the activity of running
agents. We built a social model for agent
communication as an extension of the existing JADE
platform.
This paper is organized as follows. Section
II describes agent communication languages.
Section III briefly describes the social commitment model used in this work,
Section IV discusses the integration of agent
services and Web services, and the conclusions
and remarks are presented in Section V.
II. AGENT COMMUNICATION LANGUAGE
There are a number of Agent Communication
Languages (ACLs), such as KQML and FIPA-ACL.
The message format in JADE uses FIPA-ACL, and it
has parameters such as sender, receiver(s),
performative (communicative act), protocol,
ontology, content, language, conversation-id, etc. Of
all these, the performative (communicative act, CA) is the only mandatory
field. It indicates the action to be taken. Examples
of performatives are REQUEST, INFORM, QUERY_IF,
AGREE, PROPOSE, CFP, ACCEPT_PROPOSAL,
etc.
FIPA defines 22 CAs. The parameters of a
FIPA-ACL message are shown in Table 1.
III. RELATED WORK
A social model for agent communication is
based on exchanging messages. The meaning of the
messages passed is based on social concepts like
commitments and conventions. A social semantics
naturally lends itself to observation and verification,
whereas in the mentalist approach the internal states of
the agents are not observable and hence not
verifiable.
A. SOCIAL COMMITMENT
A (social) commitment is an elementary social
relation between two agents. A debtor commits to a
creditor to bring about a specified consequent if a
specified antecedent obtains. For example, in the
common purchase setting, one can specify the
meaning of the offer message as creating a
commitment from the merchant to the customer for
the delivery of goods in return for payment.
Commitments are distinct from arbitrary obligations:
commitments may be created, discharged, delegated,
or otherwise manipulated only by explicit
communication among agents. Commitments (1) are public and (2)
can be used as the basis for compliance checking.
These key properties make them a useful computational abstraction for
service-oriented architectures.
Commitments can be written using a
predicate C. A commitment has the form
C(x, y, p, G), where x is its debtor, y its creditor, p
the condition the debtor will bring about, and G a
multi-agent system which serves as the
organizational context for the given commitment.


Parameter          Description
Performative       Type of the communicative act of the message
Sender             Identity of the sender of the message
Receiver           Identity of the intended recipients of the message
Content            Content of the message
Language           Language in which the content parameter is expressed
Protocol           Interaction protocol used to structure a conversation
Conversation-id    Unique identity of a conversation thread
Table 1: Parameters of ACL Message.
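To make Table 1 concrete, the sketch below models a FIPA-ACL message as a plain Python dataclass. It is an illustrative stand-in, not JADE's Java class `jade.lang.acl.ACLMessage`; the class name `AclMessage` and its field names are assumptions chosen to mirror Table 1, and the performative set lists the 22 communicative acts of the FIPA Communicative Act Library.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# The 22 communicative acts of the FIPA Communicative Act Library.
FIPA_PERFORMATIVES = frozenset({
    "ACCEPT_PROPOSAL", "AGREE", "CANCEL", "CFP", "CONFIRM", "DISCONFIRM",
    "FAILURE", "INFORM", "INFORM_IF", "INFORM_REF", "NOT_UNDERSTOOD",
    "PROPAGATE", "PROPOSE", "PROXY", "QUERY_IF", "QUERY_REF", "REFUSE",
    "REJECT_PROPOSAL", "REQUEST", "REQUEST_WHEN", "REQUEST_WHENEVER",
    "SUBSCRIBE",
})


@dataclass
class AclMessage:
    """Plain-Python stand-in for a FIPA-ACL message (fields follow Table 1)."""
    performative: str                           # the only mandatory parameter
    sender: Optional[str] = None
    receivers: list = field(default_factory=list)
    content: Optional[str] = None
    language: Optional[str] = None
    protocol: Optional[str] = None
    conversation_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject anything outside the standard FIPA communicative acts.
        if self.performative not in FIPA_PERFORMATIVES:
            raise ValueError(f"not a FIPA performative: {self.performative}")
```

Note that ACK, which the example in Section III.C relies on, is not one of the 22 FIPA acts, so this validator rejects it; this is consistent with the conclusion's remark that additional performatives had to be added to JADE.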
B. POLICY
The performatives in agent communication
acts (messages) are translated (by a set of policies)
to a set of social commitment operators, which either add or
delete a specific class of social commitments. We
model a social commitment as the promise by a
debtor agent to a creditor agent(s) to do some action:
(debtor; creditor; action)
and we model a social commitment operator as either
an add or a delete of a social commitment:
(add/delete; socialCommitment)
We have defined several policies (e.g., propose,
accept, reject, counter, and inform) which can be
applied to an agent's outgoing and incoming
messages to produce a set of social commitment operators.
The description of the various policies is shown in Table
2.
The policy selects a performative to be sent
back by the receiver. For example, if a sender sends an
INFORM message, the receiver(s) is/are committed
to send back an ACK message to the sender
according to policy P_Inform. But according to
policy P_Agree, if the receiver sends an AGREE
message, then the commitment to send an ACK message
no longer exists.
Policy            Description
P-Inform          commits the addressee to acknowledge
P-Ack             releases informed agents of the commitment to acknowledge
P-Request         commits the proposed agents to reply
P-Counteroffer    commits addressees to reply
P-Reply           releases proposed agents of the commitment to reply and releases counter-offered agents of the commitment to reply
P-Agree           an acceptance realizes the shared uptake of proposed/counter-offered commitments
P-Done            releases accepted agents of the commitment from an earlier agree
Table 2: Informal Description of Policies.
C. AN EXAMPLE
Let us consider a simple example where
two agents A and B communicate, and observe the resulting
commitments between them.
Message 1: A sends a message with a REQUEST
performative to B. The content of the message
describes A's call for a meeting.
The resulting commitments between A and B are:
1. According to policy P_Request, the commitment
(B, A, reply) exists.
2. According to policy P_Inform, the commitment
(B, A, ack) exists.
i.e., B (debtor) is committed to send REPLY and
ACK performative to A (creditor). The commitments
are added.
1. (add, (B, A, reply))
2. (add, (B, A, ack)).
Message 2: Now B sends back a message with an
AGREE performative. Since it has replied using the
AGREE performative, it need not acknowledge
separately. The existing commitments are therefore deleted,
and a new commitment (A must acknowledge B's reply) is
added.
1. (delete, (B, A, reply)).
2. (delete, (B, A, ack)).
3. (add, (A, B, ack)).
Message 3: A then sends a message with an ACK
performative to B to acknowledge the reply sent by
B, and the last remaining commitment is deleted:
(delete, (A, B, ack)).
Now all the commitments have been deleted.
Figure 1: A simple interaction among agents in JADE
After the communication has taken place, if there are
any pending commitments, we conclude that the
commitments are violated. Hence this social
commitment model serves as a basis for compliance
checking.
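The three-message exchange and the compliance check above can be replayed with the Python sketch below. It is a hypothetical encoding of the Table 2 policies, chosen only so that it reproduces the commitment operators listed in the example; in particular, it assumes that P_Inform applies to every non-ACK message and that AGREE counts both as a reply and as an acknowledgement (P_Agree). It is not the JADE extension built in this work.

```python
REPLIES = {"AGREE", "REFUSE", "ACCEPT_PROPOSAL", "REJECT_PROPOSAL"}


def apply_policies(sender, receiver, performative):
    """Commitment operators triggered by one message: (op, (debtor, creditor, action))."""
    if performative == "ACK":                                # P_Ack
        return [("delete", (sender, receiver, "ack"))]
    ops = []
    if performative in REPLIES:                              # P_Reply
        ops.append(("delete", (sender, receiver, "reply")))
    if performative == "AGREE":                              # P_Agree: agreeing also acknowledges
        ops.append(("delete", (sender, receiver, "ack")))
    if performative in {"REQUEST", "PROPOSE"}:               # P_Request
        ops.append(("add", (receiver, sender, "reply")))
    ops.append(("add", (receiver, sender, "ack")))           # P_Inform
    return ops


def run_conversation(messages):
    """Apply the policies to each (sender, receiver, performative) in order."""
    pending, trace = set(), []
    for sender, receiver, performative in messages:
        for op, commitment in apply_policies(sender, receiver, performative):
            trace.append((op, commitment))
            if op == "add":
                pending.add(commitment)
            else:
                pending.discard(commitment)
    return pending, trace


def is_compliant(pending):
    """Any commitment still pending after the conversation counts as a violation."""
    return not pending


example = [("A", "B", "REQUEST"), ("B", "A", "AGREE"), ("A", "B", "ACK")]
pending, trace = run_conversation(example)
for step in trace:
    print(step)
print("compliant:", is_compliant(pending))  # prints: compliant: True
```

If the conversation stopped after Message 1, the two commitments owed by B would remain pending and the check would report a violation.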
D. OPERATIONS
The commitments can be created, cancelled,
discharged, and delegated. For example, a buyer is
committed to pay money after the arrival of goods.
But if the arrived goods are damaged, then the
commitment can be cancelled. A debtor can also
delegate its commitment to some other agent;
i.e., the buyer need not pay itself, but can
delegate its commitment to pay to some other agent.
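A minimal Python sketch of the commitment C(x, y, p, G) from Section III.A together with the four operations above follows. The class names and the in-memory store are illustrative assumptions, not part of JADE or of the authors' implementation; discharging and cancelling both remove the commitment, and the history log records which operation ended it.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Commitment:
    """C(x, y, p, G): debtor x commits to creditor y to bring about p in context G."""
    debtor: str      # x
    creditor: str    # y
    condition: str   # p
    context: str     # G, the organizational multi-agent system


class CommitmentStore:
    """Tracks active commitments and the history of operations applied to them."""

    def __init__(self) -> None:
        self.active: set[Commitment] = set()
        self.history: list[tuple[str, Commitment]] = []

    def create(self, c: Commitment) -> None:
        self.active.add(c)
        self.history.append(("create", c))

    def discharge(self, c: Commitment) -> None:
        # The debtor brought about p, so the commitment is fulfilled.
        self.active.remove(c)
        self.history.append(("discharge", c))

    def cancel(self, c: Commitment) -> None:
        # The commitment is withdrawn, e.g. because the goods arrived damaged.
        self.active.remove(c)
        self.history.append(("cancel", c))

    def delegate(self, c: Commitment, new_debtor: str) -> Commitment:
        # Same creditor, condition, and context; a different agent now owes p.
        self.active.remove(c)
        d = replace(c, debtor=new_debtor)
        self.active.add(d)
        self.history.append(("delegate", d))
        return d


store = CommitmentStore()
pay = Commitment("buyer", "seller", "pay after goods arrive", "marketplace")
store.create(pay)
by_bank = store.delegate(pay, "bank")   # the buyer hands the payment over
store.cancel(by_bank)                   # the goods arrived damaged
```

Making `Commitment` frozen lets instances live in a set, and `dataclasses.replace` builds the delegated copy without mutating the original.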
IV. INTEGRATION OF AGENTS
SERVICES AND WEB SERVICES
A.JADE
The JADE architecture is shown in Figure
2. WSIG is provided by JADE as an add-on,
through which a Web service can call an agent's
service and vice versa.
The default JADE agents are the AMS (Agent Management System) and the DF (Directory Facilitator).
The DF takes care of yellow pages
services, which is similar to the UDDI registry of Web
services.

Figure 2: JADE architecture.
B. WSIG
The Web Service Integration Gateway (WSIG) is shown in
Figure 3. The JADE agent gateway is a specialized
JADE agent that manages the entire WSIG system.
Its operations are:
1. Receive and translate agent service registrations
from the JADE DF into corresponding WSDL
descriptions and register these with the UDDI
repository as tModels. This also applies to
deregistration and modifications.
2. Receive and translate Web service operation
registrations from the UDDI repository into
corresponding ACL descriptions and register these
with the JADE DF. This also applies to deregistration
and modifications.
3. Receive and process Web service invocation
requests received from JADE agents. Processing
includes retrieving the appropriate tModel from the
UDDI repository, translating the invocation message
into SOAP and sending it to the specified Web
service. Any response from the Web service will be
translated back into ACL and sent to the originating
JADE agent.
4. Receive and process agent service invocation
requests received from Web service clients.
Processing includes retrieving the appropriate tModel
from the UDDI repository, translating the invocation
message into ACL and sending it to the specified
agent. Any response from the agent will be translated
back into SOAP and sent to the originating Web
service.

Figure 3: Architecture of WSIG
It has separate modules to convert
1. The ACL messages to SOAP Messages and vice
versa.
2. The FIPA-ACL service descriptions to WSDL
service descriptions.
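The two translation directions of items 3 and 4 above can be illustrated with the Python sketch below, which uses only the standard `xml.etree.ElementTree` module. The message layout (the operation name as the single child of the SOAP 1.1 Body, parameters as child elements, the response wrapped as an INFORM) is a hypothetical mapping chosen for illustration; it is not WSIG's actual translation, and the tModel lookup in UDDI is omitted.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_NS)


def acl_to_soap(operation: str, params: dict[str, str]) -> str:
    """Agent -> Web service: wrap a requested operation and its arguments in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(op, name).text = value
    return ET.tostring(envelope, encoding="unicode")


def soap_to_acl(xml_text: str, sender: str, receiver: str) -> dict:
    """Web service -> agent: turn a SOAP response into an INFORM-style ACL message."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    response = body[0]  # the single operation element inside the Body
    return {
        "performative": "INFORM",
        "sender": sender,
        "receiver": receiver,
        "content": {response.tag: {child.tag: child.text for child in response}},
    }
```

For example, `acl_to_soap("getQuote", {"symbol": "ABC"})` yields an envelope whose Body contains `<getQuote><symbol>ABC</symbol></getQuote>`; the operation and parameter names here are made up.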
C. COMPLIANCE CHECKING
Using the social commitment model, we can
check whether the communication is taking place in
the right manner for the dynamic composition of
services. The social model supports compliance
checking, whereas other agent communication
techniques do not.

V. CONCLUSION
FIPA performatives are not enough to
express social commitments. Additional
performatives are added to JADE by extending
existing classes, and a social model for agent
communication is to be built on top of JADE.
Services are composed and compliance checking is
performed.
ACKNOWLEDGEMENT
The author wishes to thank the
management of SSN College of Engineering, Chennai,
for providing all the facilities to carry out this work.
REFERENCES
[1] Service-oriented computing - semantics,
processes, agents. Munindar P. Singh and Michael N.
Huhns. Pearson Education Ltd., Harlow, England,
2005.
[2] Research direction in agent communication. Amit
K. Chopra, Alexander Artikis, Jamal Bentahar,
Marco Colombetti, Frank Dignum, Nicoletta Fornara,
Andrew J. I. Jones, Munindar P. Singh, and Pinar
Yolum. ACM Transactions on Intelligent Systems
and Technology, 2(4), 2011.
[3] Research directions for service-oriented
multiagent systems. Michael N. Huhns, Munindar P.
Singh, Mark Burstein, Keith Decker, Ed Durfee, Tim
Finin, Les Gasser, Hrishikesh Goradia, Nick
Jennings, Kiran Lakkaraju, Hideyuki Nakashima, Van
Parunak, Jeffrey S. Rosenschein, Alicia Ruvinsky,
Gita Sukthankar, Samarth Swarup, Katia Sycara,
Milind Tambe, Tom Wagner, and Laura Zavala.
IEEE Internet Computing, 9:65-70, 2005.
[4] BDI4JADE. Ingrid Nunes, Carlos J.P. de Lucena
and Michael Luck. ProMAS 2011, Ninth
International Workshop on Programming Multi-
Agent Systems, May 2011.
[5] Using a Performative Subsumption Lattice to
Support Commitment-based Conversations. Rob
Kremer, Roberto Flores. AAMAS'05, July 25-29,
2005, Utrecht, Netherlands.
[6] Agent-Based Web Service Composition with
JADE and JXTA. Shenghua Liu, Peep Küngas and
Mihhail Matskin. Norwegian Research Foundation,
ACM Transactions, 2005.
[7] KQML as Agent Communication Language. Tim
Finin, Yannis Labrou, and James Mayfield. ACM
Transactions on Intelligent Systems and Technology,
USA. September, 1995.
[8] A Social Semantics for Agent Communication
Languages. Munindar P. Singh. ACM Transactions
on Intelligent Systems and Technology, 2000.
DATA MINING
INTERACTIVE DATA EXPLORATION IN C-TREND
Md. Sanaullah Baig, AP
Department Of Computer Science and Engineering
Gojan School Of Business & Technology
sanubaig@hotmail.com

ABSTRACT: Organizations and firms are
capturing increasingly more data about their
customers, suppliers, competitors, and business
environment. Most of this data is multiattribute
(multidimensional) and temporal in nature. Data
mining and business intelligence techniques are
often used to discover patterns in such data;
however, mining temporal relationships typically is a
complex task. We propose a new data analysis and
visualization technique for representing trends in
multiattribute temporal data using a clustering based
approach. We introduce Cluster-based Temporal
Representation of EveNt Data (C-TREND), a system
that implements the temporal cluster graph
construct, which maps multiattribute temporal data
to a two-dimensional directed graph that identifies
trends in dominant data types over time. In this
paper, we present our temporal clustering-based
technique, discuss its algorithmic implementation
and performance, demonstrate applications of the
technique by analyzing data on wireless networking
technologies and baseball batting statistics, and
introduce a set of metrics for further analysis of
discovered trends.
Keywords: Clustering, data and
knowledge visualization, data mining,
interactive data exploration and discovery,
temporal data mining, trend analysis.
1. INTRODUCTION
Organizations and firms are capturing increasingly
more data about their customers, suppliers,
competitors, and business environment. Most of this
data is multiattribute (multidimensional) and
temporal in nature. Data mining and business
intelligence techniques are often used to discover
patterns in such data; however, mining temporal
relationships typically is a complex task. I propose a
new data analysis and visualization technique for
representing trends in multiattribute temporal data
using a clustering based approach. I introduce
Cluster-based Temporal Representation of Event
Data (C-TREND), a system that implements the
temporal cluster graph construct, which maps
multiattribute temporal data to a two-dimensional
directed graph that identifies trends in dominant data
types over time.
In this paper, I present a temporal clustering-based
technique, discuss its algorithmic
implementation and performance, demonstrate
applications of the technique by analyzing data on
wireless networking technologies and baseball
batting statistics, and introduce a set of metrics for
further analysis of discovered trends. Specifically, I
develop a new data analysis and visualization
technique that presents complex multiattribute
temporal data in a cohesive graphical manner by
building on well-established data mining methods.
Business intelligence tools gain their strength by
supporting decision-makers, and our technique
helps users leverage their domain expertise to
generate knowledge visualization diagrams from
complex data and further customize them.
Organizations and firms are capturing
increasingly more data, and this data is often
transactional in nature, containing multiple attributes
and some measure of time. For example, through
their websites, e-commerce firms capture the clickstream
and purchasing behavior of their customers,
and manufacturing companies capture logistics data
(e.g., on the status of orders in production or
shipping information). One of the common analysis
tasks for firms is to determine whether trends exist
in their transactional data. For example, a retailer
may wish to know if the types of its regular
customers are changing over time, a financial
institution may wish to determine if the major types
of credit card fraud transactions change over time,
and a website administrator may wish to model
changes in website visitors' behavior over time.
Visualizing and analyzing this type of data can be
extremely difficult because it can have numerous
attributes (dimensions).
Additionally, it is often desired to aggregate
over the temporal dimension (e.g., by day, month,
quarter, year, etc.) to match corporate reporting
standards. The approach that I take in this paper for
addressing these types of issues is to mine the data
according to specific time periods and then compare
the data mining results across time periods to
discover similarities.
Consider the plot of a retailer's customers
by age and income over three months in Figure 1.
The letter X represents customers in the first month,
triangles represent customers in the second month,
and circles represent customers in the third month.
An analyst may be tasked with discovering
trends in customer type over these three months. In
Figure 1.1, patterns in the data and relationships
over time are difficult to identify. However, in Figure
1.2, partitioning the data by time leads to the
identification of clusters within each period.
Clusters encapsulate similar data points and
identify common types of customers. Note that in
this example, we used only two dimensions (age
and income) for more intuitive visualization. In many
real-life applications, the number of dimensions
could be much higher, which further emphasizes the
need for more advanced trend visualization
capabilities.
Figure 1.3 is a mapping of the
multidimensional temporal data into an intuitive
analytical construct that we call a temporal cluster
graph. These graphs contain important information
about the relative proportion of common transaction
types within each time period, relationships and
similarities between common transaction types
across time periods, and trends in common
transaction types over time.
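The construct can be illustrated with the small Python sketch below. It assumes that the data in each period has already been clustered (for example with the dendrogram approach described later in this paper), that nodes are sized by cluster membership, and that a directed edge joins clusters in consecutive periods whose centroids lie within a user-chosen Euclidean distance threshold. These choices, and the function name `temporal_cluster_graph`, are illustrative assumptions rather than the exact C-TREND rules.

```python
from collections import defaultdict
from math import dist


def temporal_cluster_graph(records, threshold):
    """Build the nodes and directed edges of a temporal cluster graph.

    records: iterable of (period, cluster_id, point), periods numbered 0, 1, 2, ...
    Returns (nodes, edges): nodes maps (period, cluster_id) -> (size, centroid);
    edges maps ((p, a), (p + 1, b)) -> centroid distance, kept only if <= threshold.
    """
    members = defaultdict(list)
    for period, cluster_id, point in records:
        members[(period, cluster_id)].append(point)

    nodes = {}
    for key, points in members.items():
        dims = len(points[0])
        centroid = tuple(sum(p[d] for p in points) / len(points) for d in range(dims))
        nodes[key] = (len(points), centroid)   # node size = cluster membership

    edges = {}
    for (p, a), (_, ca) in nodes.items():
        for (q, b), (_, cb) in nodes.items():
            if q == p + 1:                      # only consecutive periods are linked
                d = dist(ca, cb)
                if d <= threshold:
                    edges[((p, a), (q, b))] = d
    return nodes, edges
```

With (age, income) points for the customers of Figure 1, each node would be one customer type in one month, and an edge would indicate that a customer type persists, with a similar profile, into the following month.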
Figure 1.2: Partitioning the data into clusters.
Figure 1.3: Temporal cluster graph.
Figure 1: Reducing multiattribute temporal complexity by partitioning data into time periods and producing a temporal cluster graph.
In summary, the main contribution of this
paper is the development of a novel and useful
approach for visualization and analysis of
multiattribute transactional data based on a new temporal
cluster graph construct, as well as the
implementation of this approach as the Cluster-based
Temporal Representation of Event Data (C-TREND) system.
The rest of the paper is organized as
follows. It first provides an overview of related work in
the temporal data mining and visualization research
streams. It then introduces the temporal cluster graph
construct and describes the technique for mapping
multiattribute temporal data to these graphs. Next, it
discusses the algorithmic implementation of the
proposed technique as the C-TREND system,
including performance analyses, and presents an
evaluation of the technique using real-world data on
wireless networking technology certifications. This is followed by a
discussion of possible applications, trend metrics,
and limitations associated with the proposed
technique, and a brief discussion of future work. The
conclusions are presented in terms of the graph.
2. KEY INGREDIENTS OF THE PROPOSED
SYSTEM
Here I propose a new data analysis and
visualization technique for representing trends in
multiattribute temporal data using a clustering-based
approach. I introduce Cluster-based Temporal
Representation of EveNt Data (C-TREND), a system
that implements the temporal cluster graph
construct, which maps multiattribute temporal data
to a two-dimensional directed graph that identifies
trends in dominant data types over time, and I discuss its
algorithmic implementation and performance. In this
project I use a dendrogram data structure for
storing and extracting the cluster solutions generated by
hierarchical clustering algorithms. Calculations are
made using this tree structure.
DATAFLOW DIAGRAM:
Figure: Dataflow diagram of C-TREND
C-TREND is the system implementation of
the temporal cluster-graph-based trend identification
and visualization technique; it provides an end user
with the ability to generate graphs from data and
adjust the graph parameters. C-TREND consists of
two main phases:
1. Offline preprocessing of the data.
2. Online interactive analysis and graph rendering (see Figure 2.2).
Figure 2.2: The C-TREND Process

DATA PREPROCESSING
Data Clustering
C-TREND utilizes optimized dendrogram
data structures for storing and extracting cluster
solutions generated by hierarchical clustering
algorithms (see Figure 2.3 for a dendrogram
example). While C-TREND can be extended to
support partition-based clustering methods (e.g.,
k-means), it currently relies on hierarchical clustering.
2.1 Partition Of Dataset
The data in the dataset are separated
according to company name, and the values for each
company are stored in separate tables.
Given input: Result set (collection of
unrelated data)
Expected output: Clustered data
(relevant data separated and grouped)
2.2 Dendrogram Sorting
The dendrogram extraction algorithm is applied to sort
the values present in each partition.
DENDRO_EXTRACT starts at the root of the
dendrogram and traverses the dendrogram by
splitting the highest numbered node (where the
nodes are numbered according to how close they
are to the root, as numbered in Figure 2.3 ) in the
current set of clusters until k clusters are included in
the set.
Dendrogram Data Structure
A dendrogram data structure allows for
quick extraction of any specific clustering solution for
each data partition when the user changes the partition
zoom level k_i. To obtain a specific clustering solution
from the data structure for data partition D_i,
C-TREND uses the DENDRO_EXTRACT algorithm
(Algorithm 1), which takes the desired number of
clusters in the solution, k_i, as an input and returns the
set CurrCl containing the clusters corresponding to
the k_i-sized solution. Cluster attributes such as
center and size are then accessible from the
corresponding dendrogram data structure by
referencing the clusters in CurrCl.
Algorithm 1: DENDRO_EXTRACT
INPUT:  k_i   desired number of clusters
        i     data partition indicator
OUTPUT: CurrCl, the set of clusters in the k_i-sized solution
begin
  if k_i <= N then
    CurrCl = { DendrogramRoot_i }
    while |CurrCl| < k_i
      MaxCl = DendrogramRoot_i - |CurrCl| + 1
      CurrCl = (CurrCl \ {MaxCl}) U {MaxCl.Left} U {MaxCl.Right}
    return CurrCl
  else request new k_i
end
MaxCl represents the highest-numbered element in the current
cluster set CurrCl. Because of the specific dendrogram
structure (every parent is numbered higher than its children,
and the root has the highest number), it is always the
case that
MaxCl = DendrogramRoot_i - |CurrCl| + 1.
Furthermore, the dendrogram data array maintains
the successive levels of the hierarchical solution in
order; therefore, replacing MaxCl by its children
MaxCl.Left (left child) and MaxCl.Right (right child) is
sufficient for identifying the next solution level in the
dendrogram.
DENDRO_EXTRACT runs in time linear in k_i, i.e.,
O(k_i), which allows cluster solutions to be extracted in real time.
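A runnable Python sketch of Algorithm 1 follows. To keep it self-contained it builds its own array-based dendrogram with a naive centroid-linkage agglomerative clustering (an assumption; C-TREND's actual clustering method and data layout may differ). Leaves take indices 0 to n - 1 and every merge appends a new node, so a parent always has a higher index than its children and the root is the last element, which is exactly the property that makes MaxCl = DendrogramRoot_i - |CurrCl| + 1 hold. Here N is simply the number of data points.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import dist
from typing import Optional


@dataclass
class Node:
    size: int
    center: tuple
    left: Optional[int] = None    # index of the left child (None for a leaf)
    right: Optional[int] = None   # index of the right child (None for a leaf)


def build_dendrogram(points):
    """Naive O(n^3) centroid-linkage clustering that stores the tree in a list.
    Leaves occupy indices 0..n-1; each merge appends a parent node."""
    nodes = [Node(1, tuple(p)) for p in points]
    active = set(range(len(points)))
    while len(active) > 1:
        # Closest pair of active clusters by Euclidean distance between centers.
        a, b = min(
            ((i, j) for i in active for j in active if i < j),
            key=lambda ij: dist(nodes[ij[0]].center, nodes[ij[1]].center),
        )
        na, nb = nodes[a], nodes[b]
        size = na.size + nb.size
        center = tuple((na.size * x + nb.size * y) / size
                       for x, y in zip(na.center, nb.center))
        nodes.append(Node(size, center, a, b))
        active -= {a, b}
        active.add(len(nodes) - 1)
    return nodes


def dendro_extract(nodes, k):
    """Algorithm 1: return the set of node indices forming the k-cluster solution."""
    root = len(nodes) - 1
    n_leaves = (len(nodes) + 1) // 2          # N for this dendrogram
    if not 1 <= k <= n_leaves:
        raise ValueError("request a new k")   # the 'else request new k_i' branch
    curr = {root}
    while len(curr) < k:
        max_cl = root - len(curr) + 1         # highest-numbered cluster in curr
        curr.remove(max_cl)
        curr.update((nodes[max_cl].left, nodes[max_cl].right))
    return curr
```

Each loop iteration does a constant amount of work, so extraction takes O(k_i) steps, matching the complexity stated above; the cubic clustering cost is paid once, in the offline preprocessing phase.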
Given input: clustered column values
Expected output: Sorted sequence for the
given column
Using the above algorithm, we sort all the values
present in each column. Here
we measure the time taken for each table using the current time
of the system, and the measured times are displayed in a
separate table.
2.3 Calculate And Display The Time Of Sorting:
The last step in preprocessing is the
generation of the node list, which contains all
possible nodes and their sizes, and the edge list,
which contains all possible edges and their weights,
for the entire data set.
Creating these lists in the preprocessing
phase allows for more effective (real-time)
visualization updates of the C-TREND graphs. Each
data partition possesses an array-based
dendrogram data structure containing all its possible
clustering solutions. The node list is simply an
aggregate list of all dendrogram data structures
indexed for optimal node lookup. It should be noted
that the results reported in Table 1 were calculated
holding the number of attributes in the data constant
at 10. Since this process requires the calculation of
a distance metric for each edge, the time it takes to
generate the edge list should increase linearly with
the number of attributes in the data.
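Table 1: Edge List Creation Times
------------------------------------------------------------------------------------------
Data Partitions (t)   Max Clusters (N)   # of Possible Edges   Edge List Creation Time (sec)
------------------------------------------------------------------------------------------
10                    10                 3249                  0.006
100                   10                 35739                 0.062
10                    100                356409                0.630
1000                  10                 360639                0.650
100                   100                3920499               6.891
10                    500                8982009               15.760
------------------------------------------------------------------------------------------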
To demonstrate that this is indeed the case,
Figure 4.5 contains a plot of the increase in edge list
generation time as the number of attributes is
increased from 10 to 100, holding N and t constant.
Figure 4.5: Times to produce an edge list based on number of attributes.
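The "# of Possible Edges" column in Table 1 follows a simple pattern that can be checked directly: every row equals (t - 1)(2N - 1)^2. This is the count one obtains if each partition's dendrogram holds 2N - 1 nodes (one root plus two new clusters for each of the N - 1 splits) and a candidate edge is allowed between every pair of nodes in adjacent partitions. This reading is inferred from the numbers in the table rather than stated in the text; the Python sketch below encodes it and compares it with each row.

```python
def possible_edges(t: int, n_max: int) -> int:
    """Candidate edges between adjacent partitions, assuming 2N - 1 dendrogram
    nodes per partition and one edge per node pair in consecutive partitions."""
    nodes_per_partition = 2 * n_max - 1   # root + 2 new clusters per split
    return (t - 1) * nodes_per_partition ** 2


# (t, N, # of possible edges) rows copied from Table 1
TABLE_1 = [
    (10, 10, 3249),
    (100, 10, 35739),
    (10, 100, 356409),
    (1000, 10, 360639),
    (100, 100, 3920499),
    (10, 500, 8982009),
]

for t, n_max, edges in TABLE_1:
    print(t, n_max, possible_edges(t, n_max), possible_edges(t, n_max) == edges)
```

The quadratic dependence on N explains why rows with very different (t, N) but similar edge counts, such as t = 10, N = 100 and t = 1000, N = 10, also show similar creation times (0.630 s and 0.650 s).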
2.4 Extracting values according to value of N
Given input : User defined value.
Expected output : Extracted value
according to user input.
According to the user input extract the
value present in the sorted table.
2.5 Comparison of current trend
Given input: calculated total value.
Expected output: graphical output.
All the values extracted based on N are put in the same table so that the values of all the companies can be compared with each other. The final output is shown in the form of a graph.
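Putting the extracted values side by side can be sketched as below, again with hypothetical names and data. The function produces a plain text table with one column per company and one row per rank; drawing the final graph would be a separate step with whatever charting tool the implementation uses.

```python
from typing import Dict, List


def comparison_table(extracted: Dict[str, List[float]]) -> str:
    """Tab-separated table with one column per company and one row per rank."""
    companies = sorted(extracted)
    depth = max((len(v) for v in extracted.values()), default=0)
    lines = ["rank\t" + "\t".join(companies)]
    for r in range(depth):
        # A company with fewer extracted values gets "-" in the missing rows
        cells = [str(extracted[c][r]) if r < len(extracted[c]) else "-" for c in companies]
        lines.append(f"{r + 1}\t" + "\t".join(cells))
    return "\n".join(lines)


print(comparison_table({"CompanyB": [120.0], "CompanyA": [98.5, 97.0]}))
# prints a header row and one row per rank; CompanyB's missing second value shows as "-"
```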
Advantages
The proposed method is shown to outperform the state-of-the-art methods in terms of accuracy and efficiency.
N is user defined.

3. ALTERNATIVE METHODS FOR IDENTIFYING TRENDS IN MULTIATTRIBUTE DATA
3.1 A Probabilistic Approach to Fast Pattern
Matching in Time Series Databases
Temporal data mining approaches depend on the nature of the event sequence being studied. Probably the most common form of temporal data mining, time series analysis [1], is used to mine a sequence of continuous real-valued elements and is often regression based, relying on the prespecified definition of a model. Moreover, standard time series analysis techniques typically are examples of supervised learning; in other words, they estimate the effects of a set of independent variables on a dependent variable.
Berndt and Clifford [2] use a dynamic programming technique to match time series with predefined templates. Keogh and Smyth [3] use a probabilistic approach to quickly identify patterns in time series by matching known templates to the data. Povinelli and Feng [4] use concepts from data mining and dynamical systems to develop a new framework and method for identifying patterns in time series that are significant for characterizing and predicting events.
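Template matching by dynamic programming is commonly done with dynamic time warping (DTW), which aligns two series that have the same shape but different timing. The following is a generic, minimal DTW sketch with an absolute-difference local cost, not the cited authors' code; the template names are illustrative.

```python
import math
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def best_template(series: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Name of the template whose DTW distance to the series is smallest."""
    return min(templates, key=lambda name: dtw_distance(series, templates[name]))


templates = {"rising": [1, 2, 3, 4], "falling": [4, 3, 2, 1]}
print(dtw_distance([1, 2, 3], [1, 1, 2, 3]))      # 0.0: warping absorbs the repeat
print(best_template([1, 1, 2, 3, 4], templates))  # rising
```

The table costs O(nm) time and memory; practical systems usually restrict the warping window to keep this manageable on long series.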
3.2 Temporal data mining research in sequence
analysis
Another common area of temporal data mining research is sequence analysis [5]. Sequence analysis is often used when the sequence is composed of a series of nominal symbols [6]; examples of sequences include genetic codes and the click patterns of website users. Sequence analysis is designed to look for the recurrence of patterns of specific events and typically does not account for events described with multiattribute data [6]. The identification of patterns in sequences of events is an important problem that frequently occurs in many disciplines, such as molecular biology and telecommunications.
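The simplest form of this kind of analysis counts how often each short run of nominal symbols recurs. The sketch below does that for a hypothetical click stream; it only finds contiguous patterns, whereas sequential pattern miners such as PrefixSpan [5] also find patterns whose events are separated by gaps.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def recurring_patterns(events: Sequence[str], length: int,
                       min_count: int = 2) -> Dict[Tuple[str, ...], int]:
    """Contiguous event patterns of the given length that recur at least min_count times."""
    windows = (tuple(events[i:i + length]) for i in range(len(events) - length + 1))
    counts = Counter(windows)
    return {pattern: c for pattern, c in counts.items() if c >= min_count}


clicks = ["home", "search", "product", "home", "search", "cart"]
print(recurring_patterns(clicks, 2))  # {('home', 'search'): 2}
```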
3.3 Multiple Temporal Cluster Detection,
Biometrics
In the business intelligence context, trend discovery may be better addressed using unsupervised learning techniques, because models of trends and specific relationships between variables may not be known. Clustering is the unsupervised discovery of groups in a data set [7]. The basic clustering strategies can be separated into hierarchical and partitional, and all use some form of a distance or similarity measure to determine cluster membership and boundaries. Thorough bibliographies exist of recent work on both the discovery of temporal patterns and temporal clustering.
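Because C-TREND stores each partition's clustering solutions as a dendrogram, a hierarchical strategy is the relevant one here. The sketch below is a naive single-linkage agglomerative clustering, chosen for brevity and not necessarily the linkage the system uses; it records every merge, so one run yields every level of the dendrogram, and n points produce n - 1 merges (n leaves plus n - 1 merge nodes in total).

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

Merge = Tuple[FrozenSet[int], FrozenSet[int], float]


def single_linkage(points: List[Sequence[float]]) -> List[Merge]:
    """Naive agglomerative clustering; returns merges from first to last.

    Each merge is (cluster_a, cluster_b, distance). Stopping after n - k merges
    leaves the k-cluster solution.
    """
    clusters: List[FrozenSet[int]] = [frozenset([i]) for i in range(len(points))]
    merges: List[Merge] = []

    def linkage(a: FrozenSet[int], b: FrozenSet[int]) -> float:
        # Single linkage: distance between the closest pair of members
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(a | b)
    return merges


pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
for a, b, d in single_linkage(pts):
    print(sorted(a), sorted(b), round(d, 3))
# [0] [1] 1.0
# [2] [3] 1.0
# [0, 1] [2, 3] 6.403
```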
4. ISSUES
4.1 Mining temporal relationships
This work addresses the problem of representing trends in multiattribute temporal data using a clustering-based approach. Most data is multiattribute (multidimensional) and temporal in nature. Data mining and business intelligence techniques are often used to discover patterns in such data; however, mining temporal relationships typically is a complex task. I propose a new data analysis and visualization technique for representing trends in multiattribute temporal data using a clustering-based approach.
In the case of the share market, each company's values are calculated and displayed individually. There is no comparison of one company with the others, a limitation that our project overcomes. Here we use a new algorithm called the dendrogram algorithm.
The technique is implemented as Cluster-based Temporal Representation of EveNt Data (C-TREND), a system that implements the temporal cluster graph construct, which maps multiattribute temporal data to a two-dimensional directed graph that identifies trends in dominant data types over time.
5. EXPERIMENTAL RESULTS
The database for the study comprises the stock market results of 4 companies over a period of time. Any number of companies can be chosen, since the number of companies, N, is user defined, and each company's results are compared with the results of the other companies. The user can therefore easily follow the trend of each company against the others over a period of time, which gives insight for future decisions. Here I use a clustering approach so that similarities can be easily identified and efficiency is achieved.
6. CONCLUSION
By harnessing computational techniques of data mining, we have developed a new temporal clustering technique for discovering, analyzing, and visualizing trends in multiattribute temporal data. The proposed technique is versatile, and the implementation of the technique as the C-TREND system gives significant data representation power to the user: domain experts have the ability to adjust parameters and clustering mechanisms to fine-tune trend graphs.
REFERENCES
[1] P. Brockwell and R. Davis, Time Series: Theory and Methods. Springer, 2001.
[2] D.J. Berndt and J. Clifford, "Finding Patterns in Time Series: A Dynamic Programming Approach," Advances in Knowledge Discovery and Data Mining, pp. 229-248, 1995.
[3] E. Keogh and P. Smyth, "A Probabilistic Approach to Fast Pattern Matching in Time Series Databases," Proc. ACM SIGKDD, 1997.
[4] R.J. Povinelli and X. Feng, "A New Temporal Pattern Identification Method for Characterization and Prediction of Complex Time Series Events," IEEE Trans. Knowledge and Data Eng., vol. 15, no. 2, pp. 339-352, Mar./Apr. 2003.
[5] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, "Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 10, pp. 1-17, Oct. 2004.
[6] C.M. Antunes and A.L. Oliveira, "Temporal Data Mining: An Overview," Proc. ACM SIGKDD Workshop Data Mining, pp. 1-13, Aug. 2001.
[7] A. Jain, M. Murty, and P. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
EMBEDDED SYSTEM BASED IMPLEMENTATION OF DRIP
IRRIGATION
D.Supriya
Department of ECE, KCG College of Technology, Karapakkam, Chennai-600097
e-mail: supriyadp@gmail.com
ABSTRACT:-
Greenhouse-based modern agriculture is a recent requirement in every part of agriculture in India. In this technology, the humidity and temperature around the plants are precisely controlled. Because of variable atmospheric circumstances, these conditions may vary from place to place in a large farmhouse, which makes it very difficult to maintain uniformity at all places in the farmhouse manually. There is therefore an intense need to develop a microcontroller-based embedded system that can keep the physical parameters uniform and also keep records for analytical studies. In this paper we present an auto-control network for the agriculture industry that can maintain uniform environmental conditions. The second part of the paper explains the concepts of irrigation systems, the third part explains the design methodology and construction, and the fourth part concludes the paper.
INTRODUCTION:-
The continuously increasing demand for food requires rapid improvement in food production technology. In a country like India, where the economy is mainly based on agriculture and the climatic conditions are isotropic, we are still not able to make full use of agricultural resources. The main reason is the lack of rain and the scarcity of reservoir water. The continuous extraction of water from the earth is lowering the water level, due to which a lot of land is slowly coming into the zone of un-irrigated land. Another very important reason is the unplanned use of water, due to which a significant amount of water goes to waste. In modern drip irrigation systems, the most significant advantage is that water is supplied near the root zone of the plants drip by drip, due to which a large quantity of water is saved. At present, farmers in India irrigate through manual control, irrigating the land at regular intervals. This process sometimes consumes more water, and sometimes the water arrives late, so that the crops dry out. Water deficiency can be detrimental to plants before visible wilting occurs: even slight water deficiency is followed by a slowed growth rate and lighter-weight fruit. This problem can be rectified with an automatic microcontroller-based drip irrigation system, in which irrigation takes place only when water is actually required.
Irrigation systems use valves to turn irrigation on and off. These valves may be easily automated by using controllers and solenoids. Automating farm or nursery irrigation allows farmers to apply the right amount of water at the right time, regardless of the availability of labor to turn valves on and off. In addition, farmers using automation equipment are able to reduce runoff from overwatering saturated soils and avoid irrigating at the wrong time of day, which improves crop performance by ensuring adequate water and nutrients when needed. Automatic drip irrigation is a valuable tool for accurate soil moisture control in highly specialized greenhouse vegetable production, and it is a simple, precise method for irrigation. It also saves time, removes human error in adjusting available soil moisture levels, and helps farmers maximize their net profits. The entire automation work can be divided into two sections: first, studying the basic components of the irrigation system thoroughly, and then designing and implementing the control circuitry. So we will first look at the basic platform of the drip irrigation system.
CONCEPT OF MODERN IRRIGATION SYSTEM:-
The conventional irrigation methods, like overhead sprinklers and flood-type feeding systems, usually wet the lower leaves and stems of the plants. The entire soil surface is saturated and often stays wet long after irrigation is completed. Such conditions promote infections by leaf mold fungi. The flood-type methods consume a large amount of water, and the area between crop rows remains dry and receives moisture only from incidental rainfall.
In contrast, drip or trickle irrigation is a modern irrigation technique that slowly applies small amounts of water to part of the plant root zone. Modern drip irrigation was developed in Israel in the 1960s and spread widely in the 1970s.
Water is supplied frequently, often daily, to maintain favorable soil moisture conditions and prevent moisture stress in the plant, with proper use of water resources. Drip irrigation requires about half of the water needed by sprinkler or surface irrigation.
FIGURE-1
Lower operating pressures and flow rates result in reduced energy costs. A higher degree of water control is attainable, and plants can be supplied with more precise amounts of water. Disease and insect damage is reduced because plant foliage stays dry. Operating cost is usually reduced.
Field operations may continue during the irrigation process because rows between plants remain dry. Fertilizers can be applied through this type of system, which can reduce fertilizer use and cost. Compared with overhead sprinkler systems, drip irrigation leads to less soil and wind erosion.
Drip irrigation can be applied under a wide range of field conditions. A typical drip irrigation assembly is shown in Figure (2) below.
A wetted profile develops in the plant's root zone, as shown in Figure (1); its shape depends on soil characteristics.
Drip irrigation saves water because only the plant's root zone receives moisture. Little water is lost to deep percolation if the proper amount is applied. Drip irrigation is popular because it can increase yields and decrease both water requirements and labour.
FIGURE-2
DESIGN OF MICROCONTROLLER BASED DRIP IRRIGATION SYSTEM:-
The key elements that should be considered while designing a mechanical model are:
a) Flow: - You can measure the output of your water supply with a one- or five-gallon bucket and a stopwatch. Time how long it takes to fill the bucket and use that number to calculate how much water is available per hour: gallons per minute x 60 = gallons per hour.
b) Pressure (the force pushing the flow): - Most products operate best between 20 and 40 psi (pounds per square inch). Normal household pressure is 40-50 psi.
c) Water Supply & Quality: - City and well water are easy to filter for drip irrigation systems. Pond, ditch and some well water have special filtering needs. The quality and source of water will dictate the type of filter necessary for your system.
d) Soil Type and Root Structure: - The soil type will dictate how a regular drip of water on one spot will spread. Sandy soil requires closer emitter spacing, as water percolates vertically at a fast rate and more slowly horizontally. In clay soil, water tends to spread horizontally, giving a wide distribution pattern, so emitters can be spaced further apart. A loamy soil will produce a more even percolation dispersion of water. Deep-rooted plants can handle a wider spacing of emitters, while shallow-rooted plants are most efficiently watered slowly (low gph, i.e. gallons per hour, emitters) with emitters spaced close together. On clay soil or on a hillside, short cycles repeated frequently work best. On sandy soil, applying water with higher gph emitters lets the water spread out horizontally better than a low gph emitter.
e) Elevation: - Variations in elevation can cause a change in water pressure within the system. Pressure changes by about one psi for every 2.3 feet of change in elevation. Pressure-compensating emitters are designed to work in areas with large changes in elevation.
f) Timing: - Watering in a regular scheduled
cycle is essential. On clay soil or hillsides,
short cycles repeated frequently work best
to prevent runoff, erosion and wasted
water. In sandy soils, slow watering using
low output emitters is recommended.
Timers help prevent the too-dry/too-wet
cycles that stress plants and retard their
growth. They also allow for watering at
optimum times such as early morning or
late evening.
g) Watering Needs: - Plants with different
water needs may require their own
watering circuits. For example, orchards
that get watered weekly need a different
circuit than a garden that gets watered
daily. Plants that are drought tolerant will
need to be watered differently than plants
requiring a lot of water.
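The two calculations in items a) and e) can be written out as a short sketch. It uses only the figures given above (gallons per minute x 60, about 1 psi per 2.3 ft of elevation, and the 20 to 40 psi operating range); the function names are hypothetical.

```python
FT_PER_PSI = 2.3   # rule of thumb from item e): about 1 psi per 2.3 ft of elevation


def gallons_per_hour(bucket_gallons: float, fill_seconds: float) -> float:
    """Supply flow from the bucket test in item a): gallons per minute x 60."""
    if fill_seconds <= 0:
        raise ValueError("fill time must be positive")
    gallons_per_minute = bucket_gallons / (fill_seconds / 60.0)
    return gallons_per_minute * 60.0


def pressure_after_elevation_change(source_psi: float, drop_ft: float) -> float:
    """Pressure after the line descends drop_ft feet (use a negative value for uphill)."""
    return source_psi + drop_ft / FT_PER_PSI


def in_recommended_range(psi: float, low: float = 20.0, high: float = 40.0) -> bool:
    """Item b): most drip products operate best between 20 and 40 psi."""
    return low <= psi <= high


print(gallons_per_hour(5, 30))                # 600.0: a 5-gallon bucket filled in 30 s
p = pressure_after_elevation_change(40, -23)  # 23 ft uphill from a 40 psi source
print(round(p, 1), in_recommended_range(p))   # 30.0 True
```

The same 40 psi source 23 ft downhill would give about 50 psi, outside the recommended range, which is the situation item e) suggests handling with pressure-compensating emitters.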
The components of the microcontroller-based drip irrigation system are as follows:
I) Pump
II) Water Filter
III) Flow Meter
IV) Control Valve
V) Chemical Injection Unit
VI) Drip Lines with Emitters
VII) Moisture and Temperature Sensors
VIII) Microcontroller Unit (the brain of the system)
The microcontroller unit is now explained in detail.
The automated control system consists of moisture sensors, temperature sensors, a signal conditioning circuit, an analog-to-digital converter, an LCD module, a relay driver, solenoid control valves, etc. The unit is shown in Figure (3) below.
The important parameters to be measured for automation of the irrigation system are soil moisture and temperature.
FIGURE-3: Controller Unit
The entire field is first divided into small sections such that each section contains one moisture sensor and one temperature sensor. An RTD such as the PT100 can be used as the temperature sensor, while a tensiometer can be used as the moisture sensor to detect the moisture content of the soil. These sensors are buried in the ground at the required depth.
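For the PT100 suggested as the temperature sensor, the controller has to turn a measured resistance into a temperature. A common first approximation, used here as an assumption rather than the authors' stated method, is the linear model R = R0(1 + αT) with R0 = 100 Ω and the standard coefficient α = 0.00385 per °C. Between 0 and 100 °C its error stays within about half a degree; the full Callendar-Van Dusen equation can replace it where more accuracy is needed.

```python
R0_OHMS = 100.0    # PT100 resistance at 0 degC
ALPHA = 0.00385    # mean temperature coefficient (1/degC) of a standard PT100


def pt100_celsius(resistance_ohms: float) -> float:
    """Invert the linear model R = R0 * (1 + ALPHA * T) to get T in degC."""
    return (resistance_ohms / R0_OHMS - 1.0) / ALPHA


print(pt100_celsius(100.0))             # 0.0
print(round(pt100_celsius(119.25), 2))  # 50.0
```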
Once the soil has reached the desired moisture level, the sensors send a signal to the microcontroller to turn off the relays, which control the valves.
FIGURE-4: Application to field
The signal sent by the sensor is boosted up to the required level by the corresponding amplifier stages. The amplified signal is then fed to A/D converters of the desired resolution to obtain a digital form of the sensed input for microcontroller use.
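The text leaves the A/D resolution open. As an illustration only, the sketch assumes a 10-bit converter with a 5 V reference, scaled so that the full-scale code maps to the reference voltage (some converters scale by 2^bits instead), followed by a linear two-point calibration. The resolution, reference, and calibration points are hypothetical and would come from the actual hardware and the sensor's datasheet.

```python
def adc_to_volts(counts: int, bits: int = 10, vref: float = 5.0) -> float:
    """Convert a raw A/D reading to volts (full-scale code maps to vref)."""
    full_scale = (1 << bits) - 1
    if not 0 <= counts <= full_scale:
        raise ValueError("reading outside the converter's range")
    return counts * vref / full_scale


def two_point_calibrate(volts: float, v_lo: float, v_hi: float,
                        value_lo: float, value_hi: float) -> float:
    """Linearly map a voltage onto the sensor's physical range."""
    fraction = (volts - v_lo) / (v_hi - v_lo)
    return value_lo + fraction * (value_hi - value_lo)


v = adc_to_volts(512)
print(round(v, 3))                                    # 2.502
print(two_point_calibrate(2.5, 1.0, 4.0, 0.0, 90.0))  # 45.0
```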
A 16x1 LCD module can be used in the system to monitor the current readings of all the sensors and the current status of the respective valves. The solenoid valves are controlled by the microcontroller through relays. A chemical injection unit is used to mix the required amount of fertilizers, pesticides, and nutrients with water whenever required.
The water pressure can be controlled by varying the speed of the pump motor, which can be done with the PWM output of the microcontroller unit. A flow meter is attached for analysis of the total water consumed. The required readings can be transferred to a centralized computer for further analytical studies through the serial port present on the microcontroller unit. When applying the automation to large fields, more than one such microcontroller unit can be interfaced to the centralized computer.
The microcontroller unit has a built-in timer that operates in parallel with the sensor system. In case of sensor failure, the timer turns off the valves after a threshold time, which can prevent further damage. With the help of the flow meter, the microcontroller unit can also warn of pump failure or insufficient water input.
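The feedback behaviour described above (open the valve when the soil is dry, close it when the desired moisture is reached, with a timer as backup against sensor failure) can be sketched as a small state machine for one field section. The thresholds, time limit, and class name are hypothetical. The gap between the open and close thresholds (hysteresis) and latching the valve shut after a timeout are our design choices, the latter so that a failed sensor stuck at "dry" cannot keep reopening the valve.

```python
from dataclasses import dataclass


@dataclass
class ZoneController:
    """Valve logic for one field section (one moisture sensor, one relay-driven valve)."""
    open_below: float    # open the valve when moisture drops below this (hypothetical units)
    close_at: float      # close the valve once moisture reaches this
    max_open_s: float    # failsafe: close after this long even if the sensor never reads "wet"
    valve_open: bool = False
    opened_at_s: float = 0.0
    fault: bool = False

    def step(self, now_s: float, moisture: float) -> bool:
        """Process one sensor sample; return True if the valve should be open."""
        if self.fault:
            return False                      # stay shut until an operator resets
        if self.valve_open:
            if now_s - self.opened_at_s >= self.max_open_s:
                self.valve_open = False       # timer expired first: suspect the sensor
                self.fault = True
            elif moisture >= self.close_at:
                self.valve_open = False       # desired moisture level reached
        elif moisture < self.open_below:
            self.valve_open = True
            self.opened_at_s = now_s
        return self.valve_open

    def reset(self) -> None:
        self.fault = False


zone = ZoneController(open_below=30.0, close_at=45.0, max_open_s=600.0)
print(zone.step(0, 25.0), zone.step(60, 40.0), zone.step(120, 46.0))  # True True False
```

On the real unit, `step` would run on each new A/D sample and its result would drive the relay for that section's solenoid valve.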
ADVANTAGES:-
1. Relatively simple to design and install.
2. Useful in all climatic conditions, and economical.
3. Increases productivity and reduces water consumption.
4. Because control is handled by a microcontroller, operation is free of manual errors.
5. Safe, and no manpower is required.
6. Permits other yard and garden work to continue while irrigation is taking place, as only the immediate plant areas are wet.
7. Reduces soil erosion and nutrient leaching.
8. Reduces the chance of plant disease by keeping foliage dry.
9. May be concealed to maintain the beauty of the landscape, and to reduce vandalism and liability when installed in public areas.
10. Requires smaller water sources, for example less than half of the water needed for a sprinkler system.
DISADVANTAGES:-
1. Only applicable for large farms.
2. The equipment is costly.
3. Requires frequent maintenance for efficient operation.
4. Has a limited life after installation, due to deterioration of the plastic components in a hot, arid climate when exposed to ultraviolet light.
5. Installations are temporary and must be expanded or adjusted to the drip line as plants grow.
CONCLUSION:-
The microcontroller-based drip irrigation system proves to be a real-time feedback control system that monitors and controls all the activities of the drip irrigation system efficiently. The present proposal is a model to modernize the agriculture industry at a mass scale with optimum expenditure. Using this system, one can save manpower and water, improving production and ultimately profit.
Page 93 of 165
REFERENCES:-
1. Clemmens, A.J. 1990.Feedback Control for
Surface Irrigation Management in: Visions of
the Future. ASAE Publication 04-90. American
Society of Agricultural Engineers, St. Joseph,
Michigan, pp. 255-260.
2. Fangmeier, D.D., Garrot, D.J., Mancino, F.
and S.H. Husman. 1990. Automated Irrigation
Systems Using Plant and Soil Sensors. In:
Visions of the Future. ASAE Publication 04-90.
American Society of Agricultural Engineers, St.
Joseph, Michigan, pp. 533-537.
3. Gonzalez, R.A., Struve, D.K. and L.C.
Brown. 1992. A computer-controlled drip
Irrigation system for container plant production.
Hort Technology. 2(3):402-407.
Page 94 of 165
HUMAN EYE ENHANCEMENT USING BLUE EYES TECHNOLOGY
RAJESH N, RAJANIGANDHA JADHAV, T.S.NIVEDH
Department of Computer Science & Engineering
Gojan School Of Business And Technology
ABSTRACT
Human error is still one of the most frequent causes of catastrophes and ecological disasters. The main reason is that monitoring systems concern only the state of the processes, whereas the human contribution to the overall performance of the system is left unsupervised. Since the control instruments are automated to a large extent, a human operator becomes a passive observer of the supervised system, which results in weariness and a drop in vigilance. Thus, he may not notice important changes of indications, causing financial or ecological consequences and a threat to human life. It is therefore crucial to ensure that the operator's conscious brain is involved in active system supervision over the whole work period. BlueEyes, the system developed here, is intended to be a complex solution for monitoring and recording the operator's conscious brain involvement as well as his physiological condition. This required designing a Personal Area Network linking all the operators and the supervising system. While the operator uses his sight and hearing to sense the state of the controlled system, the supervising system looks after his physiological condition.
INTRODUCTION
The BlueEyes system provides technical means for monitoring and recording the operator's basic physiological parameters. The most important parameter is saccadic activity, which enables the system to monitor the status of the operator's visual attention, along with head acceleration, which accompanies large displacement of the visual axis. A complex industrial environment can create a danger of exposing the operator to toxic substances, which can affect his cardiac, circulatory and pulmonary systems. Thus, on the basis of the plethysmographic signal taken from the forehead skin surface, the system computes heart beat rate and blood oxygenation. The BlueEyes system checks the above parameters for abnormal oxygenation or other undesirable values and triggers user-defined alarms when necessary. This paper describes the hardware, the software, the benefits and the interconnection of the various parts involved in BlueEyes technology.
What Is Blue Eyes?: BLUE EYES is a technology that aims at creating computational machines with perceptual and sensory abilities like those of human beings. The basic idea behind this technology is to give computers human-like perceptual abilities.
IMPLEMENTATION
The major parts of the BlueEyes system are the Data Acquisition Unit and the Central System Unit. The tasks of the mobile Data Acquisition Unit are to maintain Bluetooth connections, to get information from the sensor and send it over the wireless connection, to deliver the alarm messages sent from the Central System Unit to the operator, and to handle personalized ID cards. The Central System Unit maintains the other side of the Bluetooth connection, buffers incoming sensor data, performs on-line data analysis, records the conclusions for further exploration and provides a visualization interface.
Figure 1. Overall system diagram
PARTS OF BLUE EYES TECHNOLOGY
The main parts of the BlueEyes system are:
1. Data Acquisition Unit
2. Central System Unit
Data Acquisition Unit (DAU): The Data Acquisition Unit is the mobile part of the BlueEyes system. Its main task is to fetch the physiological data from the sensor and send it to the central system to be processed. To accomplish this task the device must manage wireless Bluetooth connections. Personal ID cards and PIN codes provide operator authorization. Communication with the operator is carried on using a simple 5-key keyboard, a small LCD display and a beeper. When an exceptional situation is detected, the device uses them to notify the operator. Voice data is transferred using a small headset, interfaced to the DAU with standard mini-jack plugs.
1. Physiological data sensor
To provide the Data Acquisition Unit with the necessary physiological data, we decided to purchase an off-the-shelf eye movement sensor, the Jazz Multisensor. It supplies raw digital data regarding eye position, the level of blood oxygenation, acceleration along the horizontal and vertical axes, and ambient light intensity. Eye movement is measured using direct infrared oculographic transducers. The eye movement is sampled at 1 kHz and the other parameters at 250 Hz. The sensor sends approximately 5.2 kB of data per second.
Figure 2. Jazz Multisensor
2. Hardware specification
We have chosen the Atmel 8952 microcontroller to be the core of the Data Acquisition Unit since it is a well-established industrial standard and provides the necessary functionality (i.e. a high-speed serial port) at a low price. Figure 3 shows the DAU components.
Since the Bluetooth module we received supports synchronous voice data transmission (SCO link), we decided to use a hardware PCM codec to transmit the operator's voice and the central system's sound feedback. The codec we employed reduces the microcontroller's tasks and lessens the amount of data sent over the UART.
Figure 3. DAU hardware diagram
Additionally, the Bluetooth module performs voice data compression, which results in smaller bandwidth utilization and better sound quality. Communication between the Bluetooth module and the microcontroller is carried on using a standard UART interface.
The alphanumeric LCD display gives more information about incoming events and helps the operator enter the PIN code. The LED indicators show the results of the built-in self-test, the power level and the state of the wireless connection. The simple keyboard is used to react to incoming events (e.g. to silence the alarm sound) and to enter the PIN code while performing the authorization procedure. The ID card interface helps connect the operator's personal identification card to the DAU. After the card is inserted, the authorization procedure starts. In the prototype the ID cards are I2C EEPROMs; in the commercial release a cryptographic processor should be used instead. Each ID card is programmed to contain: the operator's unique identifier, the device access PIN code the operator enters on inserting his ID card, and the system access PIN code that is used for connection authentication.
The operator's unique identifier
3. Microcontroller software specification
All the DAU software is written in 8051 assembler code, which assures the highest program efficiency and the lowest resource utilization.
Central System Unit (CSU): The Central System Unit hardware is the second peer of the wireless connection. The box contains a Bluetooth module and a PCM codec for voice data transmission. The module is interfaced to a PC using a parallel, serial or USB cable. The audio data is accessible through standard mini-jack sockets. To program operators' personal ID cards we developed a simple programming device. The programmer is interfaced to a PC using serial and PS/2 ports (the PS/2 port serves as a power source). Inside, there is an Atmel 89C2051 microcontroller, which handles UART transmission and I2C EEPROM (ID card) programming. In this section we describe the four main CSU modules (see Fig. 1): Connection Manager, Data Analysis, Data Logger and Visualization.
1. Connection Manager: It is responsible for managing the wireless communication between the mobile Data Acquisition Units and the central system. The Connection Manager handles:
- communication with the CSU hardware
- searching for new devices in the covered range
- establishing Bluetooth connections
- connection authentication
- incoming data buffering
- sending alerts
2. Data Analysis Module: The module performs the analysis of the raw sensor data in order to obtain information about the operator's physiological condition. A separately running Data Analysis Module supervises each of the working operators. The module consists of a number of smaller analyzers extracting different types of information. Each analyzer registers at the appropriate Operator Manager or at another analyzer as a data consumer and, acting as a producer, provides the results of its analysis. An analyzer can be a simple signal filter, a generic data extractor (e.g. signal variance) or a custom detector module. As we are not able to predict all the supervisors' needs, the custom modules are created by applying a supervised machine learning algorithm to a set of earlier recorded examples containing the characteristic features to be recognized. In the prototype we used an improved C4.5 decision tree induction algorithm. The computed features can be, e.g., the operator's position (standing, walking or lying) or whether his eyes are closed or open. As built-in analyzer modules we implemented a saccade detector and visual attention level, blood oxygenation and pulse rate analyzers.
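The producer/consumer arrangement of analyzers can be sketched in a few lines. The class names are hypothetical, and the variance analyzer stands in for the "generic data extractor" mentioned above. It uses Welford's online algorithm, so each incoming sample updates the running variance in constant time, and it publishes its result on its own stream so that another analyzer (for example a saccade detector) could register as its consumer.

```python
from typing import Callable, List


class DataStream:
    """Hypothetical producer side: consumers register callbacks for new samples."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._consumers: List[Callable[[float], None]] = []

    def register(self, consumer: Callable[[float], None]) -> None:
        self._consumers.append(consumer)

    def publish(self, sample: float) -> None:
        for consumer in self._consumers:
            consumer(sample)


class VarianceAnalyzer:
    """Consumer of a raw stream and producer of its running (sample) variance."""

    def __init__(self, source: DataStream) -> None:
        self.output = DataStream(source.name + ".variance")
        self._n = 0
        self._mean = 0.0
        self._m2 = 0.0
        source.register(self.on_sample)   # register as a data consumer

    def on_sample(self, x: float) -> None:
        # Welford's online update: O(1) work per incoming sample
        self._n += 1
        delta = x - self._mean
        self._mean += delta / self._n
        self._m2 += delta * (x - self._mean)
        if self._n > 1:
            self.output.publish(self._m2 / (self._n - 1))  # act as a producer


accel = DataStream("accelerometer")
variance = VarianceAnalyzer(accel)
results: List[float] = []
variance.output.register(results.append)   # a downstream consumer
for sample in [1.0, 2.0, 3.0, 4.0]:
    accel.publish(sample)
print(results)  # [0.5, 1.0, 1.6666666666666667]
```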

The saccade detector registers as an eye movement and accelerometer signal variance data consumer and uses the data to signal saccade occurrence. Since saccades are the fastest eye movements, the algorithm calculates eye movement velocity and checks physiological constraints. The visual attention level analyzer uses as input the results produced by the saccade detector. Low saccadic activity (large delays between subsequent saccades) suggests a lowered visual attention level (e.g. caused by thoughtfulness). Thus, we propose a simple algorithm that calculates the visual attention level $L_{va}$:

$$L_{va} = \frac{100}{t_{s10}}$$

where $t_{s10}$ denotes the time occupied by the last ten saccades. Scientific research has shown that during normal visual information intake the time between consecutive saccades should vary from 180 up to 350 ms. This gives $L_{va}$ values of 28 up to 58 units.
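The saccade detector and the visual attention formula can be combined in a short sketch. Several details here are assumptions, not taken from the text: eye position is taken to be already converted to degrees, the detector is a plain velocity threshold (30 degrees per second is a hypothetical value, and the physiological-constraint checks are omitted), and $t_{s10}$ is read as the time spanned by the last ten inter-saccade intervals, in seconds. Under that reading, intervals of 180 ms give $L_{va} \approx 55.6$ and intervals of 350 ms give $L_{va} \approx 28.6$, close to the 28 to 58 range quoted above.

```python
from typing import List, Optional, Sequence


def saccade_onsets(position_deg: Sequence[float], fs_hz: float = 1000.0,
                   threshold_deg_s: float = 30.0) -> List[float]:
    """Velocity-threshold detector: onset times (s) of eye movements faster than the threshold."""
    onsets: List[float] = []
    in_saccade = False
    for i in range(1, len(position_deg)):
        velocity = abs(position_deg[i] - position_deg[i - 1]) * fs_hz  # deg/s
        fast = velocity > threshold_deg_s
        if fast and not in_saccade:
            onsets.append(i / fs_hz)          # rising edge = new saccade
        in_saccade = fast
    return onsets


def visual_attention_level(onsets_s: Sequence[float]) -> Optional[float]:
    """L_va = 100 / t_s10, with t_s10 = span of the last ten inter-saccade intervals (s)."""
    if len(onsets_s) < 11:
        return None                           # not enough saccades yet
    t_s10 = onsets_s[-1] - onsets_s[-11]
    return 100.0 / t_s10


# Synthetic 1 kHz trace: a 5-degree jump every 200 ms (11 saccades in 2.4 s)
trace = [5.0 * (i // 200) for i in range(2400)]
onsets = saccade_onsets(trace)
print(len(onsets), round(visual_attention_level(onsets), 1))  # 11 50.0
```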
Figure 4. Saccade occurrences and visual
attention level
Values of $L_{va}$ lower than 25 for a longer period of time should cause a warning condition. Figure 4 shows a situation where the visual attention level drops for a few seconds. The pulse rate analyzer registers for the oxyhemoglobin and deoxyhemoglobin level data streams. Since both signals contain a strong sinusoidal component related to the heartbeat, the pulse rate can be calculated by measuring the time delay between subsequent extremes of one of the signals.
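The pulse rate rule (time between successive extremes of one of the hemoglobin signals) can be sketched as follows. It assumes a clean, already filtered signal sampled at 250 Hz, the rate given above for the non-eye parameters, and uses maxima only; a real plethysmographic signal would need filtering and outlier rejection first. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def local_maxima(signal: Sequence[float]) -> List[int]:
    """Indices where the signal rises into a peak."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def pulse_rate_bpm(signal: Sequence[float], fs_hz: float = 250.0) -> float:
    """Beats per minute from the mean spacing of successive maxima."""
    peaks = local_maxima(signal)
    if len(peaks) < 2:
        raise ValueError("need at least two heartbeats in the window")
    mean_interval_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / fs_hz
    return 60.0 / mean_interval_s


# Synthetic 10 s test signal: a 1.2 Hz sinusoid, i.e. 72 beats per minute
fs = 250.0
demo = [math.sin(2 * math.pi * 1.2 * n / fs) for n in range(2500)]
print(round(pulse_rate_bpm(demo, fs)))  # 72
```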
3. Data Logger Module
The module provides support for storing the monitored data in order to enable the supervisor to reconstruct and analyze the course of the operator's duty. The module registers as a consumer of the data to be stored in the database. Each working operator's data is recorded by a separate instance of the Data Logger.
4. Visualization Module
The module provides a user interface for the supervisors. It enables them to watch each working operator's physiological condition along with a preview of a selected video source and the related sound stream. All incoming alarm messages are instantly signaled to the supervisor. Moreover, the visualization module can be set to off-line mode, where all the data is fetched from the database. The physiological data is presented using a set of custom-built GUI controls: a pie chart used to present the percentage of time the operator was actively acquiring visual information.
TOOLS USED: During the implementation of the DAU there was a need for a piece of software to establish and test Bluetooth connections. Therefore a tool called BlueDentist was created. The tool provides support for controlling the currently connected Bluetooth device. Its functions are local device management and connection management. To test the possibilities and performance of the remaining parts of the Project Kit (computer, camera and database software), BlueCapture was created (Fig. 6). The tool supports capturing video data from various sources (USB web-cam, industrial camera) and storing the data in an MS SQL Server database. Additionally, the application performs sound recording. After filtering and removing insignificant fragments (i.e. silence), the audio data is stored in the database. Finally, the program plays the recorded audiovisual stream. The software was used to measure database system performance and to optimize some of the SQL queries. Since all the components of the application had been tested thoroughly, they were reused in the final software, which further reduced testing time. A simple tool was created for recording Jazz Multisensor measurements. The program reads the data using a parallel port and writes it to a file.
Figure 5. BlueDentist
Figure 6. BlueCapture
To program the operator's personal ID card, a standard parallel port is used, as the EEPROMs and the port are both TTL-compliant. A simple dialog-based application helps to accomplish the task.
ADVANTAGES
visual attention monitoring
physiological condition monitoring
operator's position detection
DISADVANTAGES
Does not predict or interfere with the operator's thoughts.
Cannot directly force the operator to work.
APPLICATIONS
1. It can be used in the field of security and control, where the contribution of a human operator is required at all times.
2. Engineers at IBM's Almaden Research Center in San Jose, CA, report that a number of large retailers have implemented surveillance systems that record and interpret customer movements, using software developed at Almaden.
CONCLUSION
BlueEyes addresses the need for a real-time monitoring system for a human operator. The approach is innovative since it helps supervise the operator, not the process, as is the case in presently available solutions. In its commercial release, this system will help avoid potential threats resulting from human errors, such as weariness, oversight, tiredness or temporary indisposition. In the future, it may be possible to create a computer that can interact with us as we interact with each other, using BlueEyes technology. It may seem like fiction, but it could be the life led with BLUE EYES in the very near future. Ordinary household devices, such as televisions, refrigerators and ovens, may be able to do their jobs when we look at them and speak to them.
REFERENCES
1. www.cs.put.poznan.pl/achieve/load/CSIDC01Poznan.ppt
2. ndt-equipment-supply.com/.../blue-eyes-monitoring-human-operator-system
3. www.cs.put.poznan.pl/csidc/2001/images/html/en_jazz.html
4. www.slideshare.net/jainshef/blue-eye-technology
5. www.almaden.ibm.com
6. www.scribd.com
7. www.ibmresearchcenter.com
DESIGNING AND IMPLEMENTING AN INTELLIGENT ZIGBEE-ENABLED ROBOT CAR
C.S.Subha, Shaziya Naseem S.A, Sowmiya.R
Department of Computer Science & Engineering
Gojan School of Business and Technology
ABSTRACT
The past several years have witnessed rapid development in the wireless networking area. So far, wireless networking has focused on high-speed and long-range applications. However, there are many wireless monitoring and control applications for industrial and home environments that require longer battery life, lower data rates and less complexity than existing standards offer. ZigBee is a wireless technology for creating personal area networks operating in the 2.4 GHz unlicensed band. Networks are usually formed ad hoc from inexpensive, low-power devices such as sensors and controls. Unlike Wi-Fi, the other popular wireless technology, ZigBee offers higher-level service profiles.
KEYWORDS: ZigBee, microcontroller, sensors, DC motor, embedded C, LCD display
1. INTRODUCTION
Robots can roughly be divided into three groups: industrial robots, mobile service robots and personal robots. According to the International Federation of Robotics (IFR), there are over one million industrial robots, over 50,000 mobile service robots and over four million personal robots in use worldwide [1]. In addition, according to the Japan Robot Association (JARA), there will be over 100 million personal robots in use worldwide by the year 2025 [2]. Moreover, according to ABI Research, the worldwide revenue from personal robots was expected to exceed $15 billion by 2015 [3]. Thus, it is evident that robotics, especially personal robotics, is a rapidly growing field.
2. EXISTING SYSTEM
Earlier robot cars were remote-controlled, i.e., the robot was controlled through low-level RF communication or some other medium. Such robots also did not use sensors to analyze the different situations around them. Bluetooth communication is likewise limited in range and power consumption. There are many wireless monitoring and control applications for industrial and home environments that require longer battery life, lower data rates and less complexity than existing standards offer. What the market needs is a globally defined standard that meets the requirements for reliability, security, low power and low cost.
3. PROPOSED SYSTEM (OUR IMPLEMENTATION)
In the proposed system, the robot is controlled using ZigBee technology, and the environmental conditions surrounding it are sensed and reported. The robot car is controlled through a keypad interfaced to the microcontroller unit. The control input entered on the keypad is transferred through the ZigBee modem. At the other side, these commands are received and the robot connected to the receiver is controlled accordingly. All these actions are performed by the microcontroller unit according to the predefined program loaded into it. While the robot car is moving, the humidity level and temperature of the environment are measured. These readings are also transferred through the ZigBee modem and reported through the LCD and alarm unit in the control unit. The robot is driven through relays, which are electromagnetic switches that activate its motors according to the instructions given by the microcontroller.
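The keypad-to-relay control path described above can be modelled as a lookup from a received command to relay states. The following is a minimal Python sketch, not the authors' embedded C firmware. The single-character command codes, the two-relays-per-motor wiring and the function name `relay_outputs` are hypothetical.

```python
# Hypothetical command codes sent from the keypad over the ZigBee link.
# Each motor is assumed to be driven by two relays (forward, reverse);
# a motor with both relays off is stopped.
FORWARD, REVERSE, STOP = (1, 0), (0, 1), (0, 0)

COMMANDS = {
    "F": (FORWARD, FORWARD),   # both motors forward
    "B": (REVERSE, REVERSE),   # both motors reverse
    "L": (REVERSE, FORWARD),   # spin left: left motor back, right motor forward
    "R": (FORWARD, REVERSE),   # spin right: left motor forward, right motor back
    "S": (STOP, STOP),         # stop both motors
}


def relay_outputs(command: str) -> tuple:
    """Return (left_fwd, left_rev, right_fwd, right_rev) relay states.

    Unknown or corrupted commands stop the robot as a fail-safe default.
    """
    left, right = COMMANDS.get(command, (STOP, STOP))
    return left + right


if __name__ == "__main__":
    for cmd in "FLRSX":
        print(cmd, relay_outputs(cmd))
```

Treating an unrecognized byte as STOP is one reasonable design choice for a radio link that may occasionally deliver corrupted data.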
OBJECTIVE
Moving forward/reverse
Moving left/right
Embedded interface
ZigBee interface
Gas detection
Person identification
Obstacle detection
Image restoration, processing and transfer
Sensing environmental conditions
5. EMBEDDED SYSTEM
Embedded systems play important roles in our lives every day, even though they might not necessarily be visible. Many of the devices we use daily are controlled by embedded systems. The Keil C compiler is the software used in this work, as it plays a vital role in microcontroller applications.
Figure 2. Hardware Model
COMPONENTS IN THE ROBOT CAR
LCD Character 2 x 16
The Liquid Crystal Display (LCD) was first developed at RCA in the 1960s. LCDs are optically passive displays (they do not produce light). The LCD Character standard requires 3 control lines as well as either 4 or 8 I/O lines for the data bus. The user may select whether the LCD operates with a 4-bit or an 8-bit data bus.
Figure 1. LCD Display
If a 4-bit data bus is used, the LCD will require a total of 7 data lines (3 control lines plus the 4 lines for the data bus). If an 8-bit data bus is used, the LCD will require a total of 11 data lines. The three control lines are referred to as EN, RS and RW.
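In 4-bit mode, each byte is sent as two nibbles on the four data lines, with one EN pulse per nibble. The sketch below assumes an HD44780-compatible controller, which the paper does not name. It computes only the sequence of bus writes and does not model pin timing. The function names are hypothetical.

```python
def four_bit_writes(value: int, is_data: bool) -> list:
    """Split one 8-bit LCD transfer into two 4-bit bus writes.

    Returns (rs, rw, nibble) tuples, high nibble first; each tuple
    corresponds to one EN pulse on the D4-D7 lines.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("LCD transfers are 8 bits wide")
    rs = 1 if is_data else 0   # RS=1 selects the data register, RS=0 the command register
    rw = 0                     # RW=0 means write to the LCD
    return [(rs, rw, (value >> 4) & 0x0F), (rs, rw, value & 0x0F)]


def text_writes(text: str) -> list:
    """All 4-bit bus writes needed to display an ASCII string."""
    writes = []
    for ch in text:
        writes.extend(four_bit_writes(ord(ch), is_data=True))
    return writes


if __name__ == "__main__":
    print(four_bit_writes(0x01, is_data=False))  # clear-display command
    print(text_writes("OK"))
```

The trade-off is the one stated above: 4-bit mode saves four I/O lines at the cost of two EN pulses per byte instead of one.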
BUZZER
A buzzer is connected to port pin P2.5 of the microcontroller through a driver transistor. The buzzer requires 12 V at a current of around 50 mA, which the microcontroller cannot provide, so the driver transistor is added. The buzzer gives an audible indication for a valid user, for error situations and in alarm mode. As soon as the microcontroller pin goes high, the buzzer sounds.
PIR SENSORS
A Passive Infrared sensor (PIR sensor) is an electronic device that measures infrared (IR) light radiating from objects in its field of view. PIR sensors are often used in the construction of PIR-based motion detectors.
ULTRASONIC SENSOR (SRF05)
Ultrasonic sensors generate high-frequency sound waves and evaluate the echo that is received back by the sensor. Other applications of ultrasonic sensors include humidifiers, sonar, medical ultrasonography, burglar alarms and non-destructive testing.
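The echo time measured by the microcontroller can be converted into an obstacle distance, since the pulse travels to the obstacle and back. The minimal sketch below assumes the echo pulse width is available in microseconds and that the speed of sound is about 343 m/s. The 20 cm stop distance and the function names are hypothetical.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # approx. 343 m/s in air at about 20 degrees C


def echo_to_distance_cm(echo_us: float) -> float:
    """Convert an echo pulse width (microseconds) into target distance in cm.

    The sound travels to the obstacle and back, so the round trip is halved.
    """
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2


def obstacle_ahead(echo_us: float, stop_distance_cm: float = 20.0) -> bool:
    """True when the measured distance is inside a (hypothetical) stop zone."""
    return echo_to_distance_cm(echo_us) < stop_distance_cm


if __name__ == "__main__":
    print(round(echo_to_distance_cm(583), 2))  # about 10 cm
```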

[Block diagram. Robot unit: 8051 microcontroller, power supply, relay driver and relays, robot, temperature sensor with signal conditioning circuit, ultrasonic (UL) sensor, PIR sensor, gas sensor, wireless camera. Control unit: 8051 microcontroller, power supply, LCD, keypad, alarm, wireless video receiver.]
TEMPERATURE SENSOR
The LM35 series are precision
integrated-circuit temperature sensors,
whose output voltage is linearly
proportional to the Celsius (Centigrade)
temperature. The LM35 thus has an
advantage over linear temperature
sensors calibrated in Kelvin, as the user
is not required to subtract a large constant
voltage from its output to obtain
convenient Centigrade scaling.
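Because the LM35 output is 10 mV per degree Celsius with no offset, converting a reading needs only one scale factor. The 8051 has no on-chip ADC, so the sketch below assumes an external 8-bit ADC with a 5 V reference. The paper does not specify the converter, and `adc_to_celsius` is a hypothetical name.

```python
LM35_VOLTS_PER_DEG_C = 0.010  # LM35 scale factor: 10 mV per degree Celsius


def adc_to_celsius(code: int, vref: float = 5.0, bits: int = 8) -> float:
    """Convert a raw ADC reading of the LM35 output into degrees Celsius.

    Assumes the ADC maps 0..vref linearly onto codes 0..(2**bits - 1).
    No offset is subtracted, because the LM35 output is already Celsius-scaled.
    """
    full_scale = (1 << bits) - 1
    if not 0 <= code <= full_scale:
        raise ValueError("ADC code out of range")
    volts = code * vref / full_scale
    return volts / LM35_VOLTS_PER_DEG_C


if __name__ == "__main__":
    print(adc_to_celsius(15))  # about 29.4 degrees C
```

Under these assumptions, one ADC step is about 19.6 mV, or roughly 2 °C. A lower reference voltage or a wider ADC gives finer resolution.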

GAS SENSORS (MQ5)
Gas sensors interact with a gas to initiate the measurement of its concentration. The gas sensor then provides output to a gas instrument to display the measurements. Common gases measured by gas sensors include ammonia, aerosols, arsine, bromine, carbon dioxide, carbon monoxide, chlorine, chlorine dioxide, diborane, dust, fluorine, germane, halocarbons or refrigerants, hydrocarbons, hydrogen, hydrogen chloride, hydrogen cyanide, hydrogen fluoride, hydrogen selenide, hydrogen sulfide, mercury vapor, nitrogen dioxide, nitrogen oxides, nitric oxide, organic solvents, oxygen, ozone, phosphine, silane, sulfur dioxide and water vapor.
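For the gas-detection objective, the sensor reading has to be turned into an on/off alarm for the buzzer. A common technique, not described in the paper, is a threshold with hysteresis, so that noise near the threshold does not make the buzzer chatter. The following is a minimal sketch. The class name and both threshold levels are hypothetical and would need calibration for the MQ5 and the target gas.

```python
class GasAlarm:
    """Threshold alarm with hysteresis for an analog gas-sensor reading.

    The alarm turns on when the reading rises to `on_level` and turns off
    only once it falls back to `off_level`. Both levels are hypothetical ADC
    codes that would have to be calibrated for the actual sensor and gas.
    """

    def __init__(self, on_level: int, off_level: int):
        if off_level >= on_level:
            raise ValueError("off_level must be below on_level")
        self.on_level = on_level
        self.off_level = off_level
        self.active = False  # current buzzer state

    def update(self, reading: int) -> bool:
        """Feed one new reading and return whether the buzzer should sound."""
        if not self.active and reading >= self.on_level:
            self.active = True
        elif self.active and reading <= self.off_level:
            self.active = False
        return self.active


if __name__ == "__main__":
    alarm = GasAlarm(on_level=150, off_level=120)
    print([alarm.update(r) for r in [100, 149, 150, 140, 125, 120, 130]])
```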
WIRELESS TRANSMISSION
ZIGBEE
What is ZigBee?
ZigBee is an ad hoc networking technology for low-rate wireless personal area networks (LR-WPAN). It is based on the IEEE 802.15.4 standard, which defines the PHY and MAC layers for ZigBee. It is intended for the 2.4 GHz, 868 MHz and 915 MHz bands. It is low in cost, complexity and power consumption compared to competing technologies, and it is intended to network inexpensive devices. Data rates reach 250 kbps in the 2.4 GHz band, 40 kbps in the 915 MHz band and 20 kbps in the 868 MHz band. ZigBee is an established set of specifications for wireless personal area networking (WPAN), i.e., digital radio connections between computers and related devices. This kind of network eliminates the use of physical data buses such as USB and Ethernet cables. The devices could include telephones, hand-held digital assistants, sensors and controls located within a few meters of each other. The underlying IEEE 802.15.4 standard was formulated by a task group under the IEEE 802.15 working group, while the ZigBee layers built on top of it are specified by the ZigBee Alliance. WPAN Low Rate (IEEE 802.15.4), the fourth in the 802.15 series, provides specifications for devices that have low data rates and consume very low power, and which therefore have long battery life. Other standards, such as Bluetooth and IrDA, address high-data-rate applications such as voice, video and LAN communications.
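The data rates above set a lower bound on how long a frame occupies the channel. The rough worked sketch below assumes the IEEE 802.15.4 PHY framing: a 4-byte preamble, a 1-byte start-of-frame delimiter, a 1-byte length field and at most 127 bytes of payload. It ignores CSMA-CA backoff, acknowledgments and higher-layer overhead. The function name is hypothetical.

```python
# Raw bit rates of the IEEE 802.15.4 PHYs listed above, in bits per second.
BIT_RATES = {"2.4GHz": 250_000, "915MHz": 40_000, "868MHz": 20_000}

PHY_OVERHEAD_BYTES = 6   # preamble (4) + start-of-frame delimiter (1) + length (1)
MAX_PSDU_BYTES = 127     # largest PHY payload allowed by IEEE 802.15.4


def frame_airtime_ms(psdu_bytes: int, band: str) -> float:
    """Time on air for one PHY frame in ms, ignoring backoff and ACKs."""
    if not 0 < psdu_bytes <= MAX_PSDU_BYTES:
        raise ValueError("PSDU must be 1..127 bytes")
    bits = (psdu_bytes + PHY_OVERHEAD_BYTES) * 8
    return 1000.0 * bits / BIT_RATES[band]


if __name__ == "__main__":
    for band in BIT_RATES:
        print(band, round(frame_airtime_ms(MAX_PSDU_BYTES, band), 3), "ms")
```

At 250 kbps, a maximum-size frame occupies the channel for about 4.3 ms. The same frame takes about 53 ms at 20 kbps in the 868 MHz band, which is still ample for short keypad commands and sensor readings.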
Figure 4. XBee Pro10
DC MOTOR
A DC motor is an electric motor that runs on direct current (DC) electricity. A DC motor consists of a stator, an armature, a rotor and a commutator with brushes. Opposite polarity between the two magnetic fields inside the motor causes it to turn. DC motors are among the simplest types of motor and are used in household appliances such as electric razors.
Figure 3. DC MOTOR
ATMEL MICROCONTROLLER
The AT89C51 is a low-power, high-performance CMOS 8-bit microcomputer with 4K bytes of Flash programmable and erasable read-only memory (PEROM). The device is manufactured using Atmel's high-density nonvolatile memory technology and is compatible with the industry-standard MCS-51 instruction set and pinout.
Figure 5. Microcontroller
The on-chip Flash allows the program memory to be reprogrammed in-system or by a conventional nonvolatile memory programmer. By combining a versatile 8-bit CPU with Flash on a monolithic chip, the Atmel AT89C51 is a powerful microcomputer that provides a highly flexible and cost-effective solution to many embedded control applications.
Advantages
The functionality of this ZigBee-enabled robot can be enhanced in the future to address various critical data-collection tasks, as follows:
1. The robot can be enhanced with GSM and GPS technology for long-distance communication.
2. The robot can also be used in rescue operations where it is difficult to involve humans.
3. The robot can be used in mines to identify inert gases and alcohol vapors in order to avoid environmental hazards.
4. The robot can be used to store images during image transmission, and this capability can be enhanced by implementing rotating HD cameras.
CONCLUSION
The system presented in this paper is designed with integrated intelligence for network setup and message routing. Information, instructions and commands are transferred using ZigBee technology, with a ZigBee transceiver used for data transmission and reception. Freedom of the robot's movement angle is achieved in both software and hardware. Future enhancements that the system can accommodate without major modifications have also been specified.
REFERENCES
1. Industrial Robotics by Groover, Weiss, Nagel, Odrey. McGraw-Hill Publications.
2. Robotics by K.S. Fu, R.C. Gonzalez, C.S.G. Lee. McGraw-Hill Publications.
3. Robotics Engineering by Richard D. Klafter, Thomas A. Chmielewski, Michael Negin. PHI Pvt Ltd.
WORLD WIDE WEB PAGES:
1. http://www.robotics.com
2. http://www.robocup.com
3. http://www.roboprojects.com
4. http://www.bbc.co.uk/science/robots/
5. http://www.howstuffworks.com
Mining the Shirt sizes for Indian men by Clustered Classification
M.Martin Jeyasingh¹, Kumaravel Appavoo²
¹Associate Professor, National Institute of Fashion Technology, Chennai.
²Dean & Professor, Bharath Institute of Higher Education and Research, Chennai-73, Tamilnadu, India.
¹mmjsingh@rediffmail.com, ²drkumaravel@gmail.com

Abstract-- In garment production engineering, the sizing system plays an important role in the manufacture of clothing. This research work introduces a robust approach for developing sizing systems by applying data mining techniques to Indian anthropometric data. Using a new two-stage data mining procedure, the shirt size types of Indian men were determined. The approach has two phases: first, cluster analysis is performed; then, the cases are sorted according to the cluster results to identify the most significant classification algorithms for shirt size, based on the results of the cluster and classification analyses. A sizing system was developed for Indian men aged between 25 and 66 years, based on the chest size determined in the data mining procedure. In a sizing system, the definition of the size label is a critical issue, because the label is the customer's interface for quickly locating the right garment size. In this paper, we obtain classifications of men's shirt attributes based on clustering techniques.
Keywords-- Data mining, Clustering, Classifiers, IBK, KNN, Logitboost, Clothing industry, Anthropometric data.
I. INTRODUCTION
Garment sizing systems were originally based on those developed by tailors in the late 18th century. Professional dressmakers and craftsmen developed various sizing methods over the years, using their own techniques for measuring and fitting their customers. In the 1920s, the demand for mass production of garments created the need for a standard sizing system, and many researchers started developing sizing systems using different methods and data collection approaches. It has been shown that garment manufacturing is the highest value-added industry in the textile manufacturing cycle [1]. Since machine-based mass production has replaced manual manufacturing in this industry, the planning and quality control of production and inventory are very important for manufacturers. Moreover, this type of manufacturing demands certain standards and specifications. Furthermore, each country has its own standard sizing systems for manufacturers to follow, fitted to the figure types of the local population.
A sizing system classifies a specific population into homogeneous subgroups based on some key body dimensions [2]. Persons in the same subgroup have the same garment size. Standard sizing systems can correctly predict manufacturing quantities and the proportion of production, resulting in more accurate production planning and control of materials [3, 4]. Standard sizing systems have also been used as a communication tool among manufacturers, retailers and consumers.
A standard sizing system can provide manufacturers with size specifications, design development, pattern grading and market analysis. Basing their judgments on this information, manufacturers can produce different types of garments with various allowances for specific market segments. Thus, establishing standard sizing systems is necessary and important. Many researchers have worked on developing sizing systems using many approaches, and extensive work has been carried out using anthropometric data [2]. People's body shapes have changed over time. Workman [5] demonstrated that ageing contributes to the observed changes in body shape and size more than any other single factor, such as improved diet and longer life expectancy [6]. Sizing concerns will grow as the number of ageing consumers is expected to double by the year 2030. This presents a marketing challenge for the clothing industry, since poor sizing is the number one reason for returns and markdowns, resulting in substantial losses. Therefore, sizing systems have to be updated from time to time to ensure the correct fit of ready-to-wear apparel.
Many countries have been undertaking sizing surveys in recent years. Since sizing practices vary from country to country, Sweden made the first official approach to the International Organization for Standardization (ISO) on the subject of clothing sizes in 1968, since it was in the interest of the general public that an international system be created. After lengthy discussions and many proposals, members of technical committee TC133 submitted documents relating to secondary body dimensions, their definitions and methods of measurement. This eventually resulted in the publication of ISO 8559, Garment Construction and Anthropometric Surveys - Body Dimensions, which is currently used as an international standard for all types of size surveys [7].
Figure type plays a decisive role in a sizing system and contributes to fit. To build a sizing system, different body types are first separated from the population based on dimensions such as height or ratios between body measurements. A set of size categories is then developed, each containing a range of sizes from small to large. In most countries, the size range is generally evenly distributed from the smallest to the largest size. For men's wear, body length and drop value (the difference between chest and waist girth) are the two main measurements characterizing the definition of figure type. The Bureau of Indian Standards (BIS) identified three body heights: short (166 cm), regular (174 cm) and tall (182 cm). Reference [20] recommended using differences in figure type as the basis for classifying ready-to-wear garments and developed a set of procedures to formulate standard sizes for all figure types. In early times, the classification of figure types was based on body weight and stature; later, anthropometric dimensions were applied for classification. This type of sizing system has the advantages of easy grading and size labeling, but the structural constraints of the linear system may result in a loose fit. Thus, some optimization methods have been proposed to generate a better-fitting sizing system, such as an integer programming approach [10] and a nonlinear programming approach [11]. When sizing systems are developed with optimization methods, the structure of the resulting sizing system depends on the predefined constraints and objectives.
Tryfos [10] indicated that the probability of purchase depends on the distance between the garment's size and the actual body size of an individual. To optimize the number of sizes so as to minimize this distance, an integer programming approach was applied to choose the optimal sizes. Later, McCulloch et al. [11] constructed a sizing system using a nonlinear optimization approach to maximize the quality of fit. More recently, Gupta et al. [12] used a linear programming approach to classify the size groups. Optimization methods have the advantage of generating a sizing system with an optimal fit, but the irregular distribution of the optimal sizes may increase the complexity of grading and the cost of production. On the other hand, data mining has in recent years been widely used in science and engineering, with a broad range of applications in bioinformatics, genetics, medicine, education, electrical power engineering, marketing, production, human resource management, risk prediction, biomedical technology and health insurance. In the field of clothing sizing systems, data mining techniques such as neural networks [13], cluster analysis [14], the decision tree approach [15] and two-stage cluster analysis [16] have been used. Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait. Cluster analysis has been used as an exploratory data analysis tool for classification. In clothing, people can be grouped by the similarity of their shirt sizes using K-means cluster analysis to classify an upper-garment sizing system. The pitfall of such methods is that they require the number of clusters to be pre-assigned to initialize the algorithm, and this number is usually determined subjectively by experts. To overcome these disadvantages, a two-stage data mining procedure comprising cluster analysis and classification algorithms is proposed here. It aims to eliminate the need for subjective judgment and to improve the effectiveness of size classification [8].
II. DATA MINING TECHNIQUES
A. Data Preparation
After the industry problem has been defined, the first stage of data mining is data preparation, which increases the efficiency and ensures the accuracy of the analysis through processing and transformation of the data. Before mining, the data must be examined and any missing data and outliers handled. By examining the data before applying a multivariate technique, the researcher gains several critical insights into its characteristics. In this research work, we used an anthropometric database collected from BIS and from clothing industrialists, from which anthropometric data of 620 Indian men aged 25 to 66 years were obtained. The data mining process is shown in Fig. 1.
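Fig. 1 Data mining process (stages shown include raw data, data cleaning, data transformation, cluster analysis, classification analysis, validation, accuracy and prediction)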
B. Cluster Analysis
As the first step of the data mining approach, X-Means clustering was used for the cluster analysis. X-Means is K-Means extended by an improve-structure part, in which the algorithm attempts to split each center within its region. The decision between the children of each center and the center itself is made by comparing the BIC (Bayesian Information Criterion) values of the two structures. Based on the differences between age and the other attributes, we determined the number of clusters. The K-means method was then applied to determine the final cluster categorization.
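The improve-structure decision can be illustrated on a single attribute. The sketch below is a simplified one-dimensional version in the spirit of X-Means, not Weka's XMeans implementation. It splits a region with two-means and keeps the split only if the BIC of a hard-assignment Gaussian model with shared variance increases. The function names and the sample chest-girth values are hypothetical.

```python
import math


def assign(xs, centers):
    """Group 1-D points by their nearest center."""
    groups = [[] for _ in centers]
    for x in xs:
        j = min(range(len(centers)), key=lambda c: abs(x - centers[c]))
        groups[j].append(x)
    return groups


def kmeans_1d(xs, centers, iters=100):
    """Plain Lloyd iterations of K-means on 1-D data."""
    for _ in range(iters):
        groups = assign(xs, centers)
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers, assign(xs, centers)


def bic(groups, centers):
    """BIC (higher is better) of a hard-assignment spherical Gaussian model."""
    n = sum(len(g) for g in groups)
    k = len(centers)
    sse = sum((x - c) ** 2 for g, c in zip(groups, centers) for x in g)
    var = max(sse / n, 1e-12)  # shared maximum-likelihood variance
    loglik = sum(len(g) * math.log(len(g) / n) for g in groups if g)  # mixing weights
    loglik += -n / 2 * math.log(2 * math.pi * var) - sse / (2 * var)  # Gaussian terms
    n_params = (k - 1) + k + 1  # mixing weights, 1-D means, shared variance
    return loglik - n_params / 2 * math.log(n)


def should_split(xs):
    """X-Means-style test: keep two children only if they raise the BIC."""
    parent_center = sum(xs) / len(xs)
    parent_bic = bic([list(xs)], [parent_center])
    child_centers, child_groups = kmeans_1d(xs, [min(xs), max(xs)])
    return bic(child_groups, child_centers) > parent_bic


if __name__ == "__main__":
    print(should_split([88, 89, 90, 91, 92, 108, 109, 110, 111, 112]))  # two groups
    print(should_split(list(range(95, 106))))                           # evenly spread
```

Two well-separated groups of chest girths favour a split, while evenly spread values do not. This is how X-Means arrives at a number of clusters without it being fixed in advance.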
C. Classification Techniques
C.1. K-nearest neighbour
The K-nearest neighbour algorithm (K-nn) is a supervised learning algorithm that has been used in many applications in data mining, statistical pattern recognition, image processing and other fields. K-nn classifies objects based on the closest training examples in the feature space. The k-neighbourhood parameter is determined in the initialization stage of K-nn. The k samples closest to a new sample are found among the training data, and the class of the new sample is determined from these k closest samples by majority voting [9]. Distance measures such as Euclidean, Hamming and Manhattan distance are used to calculate the distances between samples.
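The majority-vote rule can be written in a few lines. The following is a minimal sketch using Euclidean distance, not Weka's IBk. The (chest, waist) training pairs and size labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    An odd k avoids ties in two-class votes.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical (chest girth, waist girth) measurements in cm.
    train = [((86, 74), "S"), ((88, 76), "S"), ((90, 78), "S"),
             ((96, 84), "M"), ((98, 86), "M"), ((100, 88), "M"),
             ((106, 94), "L"), ((108, 96), "L"), ((110, 98), "L")]
    print(knn_predict(train, (97, 85)))
```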
C.2. RandomTree
This classifier constructs a tree that considers K randomly chosen attributes at each node and performs no pruning. It also has an option to estimate class probabilities from a hold-out set (backfitting). The number of randomly chosen attributes is set by KValue. Several other settings can also be parameterised: whether unclassified instances are allowed, the maximum depth of the tree, the minimum total weight of the instances in a leaf and the random number seed used for selecting attributes. numFolds determines the amount of data used for backfitting: one fold is used for backfitting and the rest for growing the tree (default: 0, no backfitting).
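What distinguishes RandomTree from an ordinary decision tree is the node split over K randomly chosen attributes. The sketch below shows only that step, using information gain on numeric attributes. Recursion, the other parameters and backfitting are omitted. The function name and the sample measurements are hypothetical and do not come from Weka's code.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def random_attribute_split(rows, labels, k_value, rng):
    """Pick the best binary split among k_value randomly chosen attributes.

    Returns (information_gain, attribute_index, threshold), or None when
    none of the sampled attributes has two distinct values.
    """
    candidates = rng.sample(range(len(rows[0])), k_value)
    base = entropy(labels)
    best = None
    for a in candidates:
        values = sorted(set(r[a] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint threshold between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[a] <= t]
            right = [y for r, y in zip(rows, labels) if r[a] > t]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if best is None or gain > best[0]:
                best = (gain, a, t)
    return best


if __name__ == "__main__":
    rows = [(86, 30), (88, 10), (100, 25), (102, 12)]  # hypothetical (chest, other)
    print(random_attribute_split(rows, ["S", "S", "L", "L"], 2, random.Random(0)))
```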
C.3. Logitboost
This classifier performs additive logistic regression. It performs classification using a regression scheme as the base learner, can handle multi-class problems, and can perform efficient internal cross-validation to determine the appropriate number of iterations. Its parameters include the following: whether the classifier outputs additional information to the console, the threshold on improvement in likelihood, the number of iterations to be performed, the number of runs for internal cross-validation, and the weight threshold for weight pruning (reduce to 90 to speed up the learning process). numFolds specifies the number of folds for internal cross-validation (the default of 0 means no cross-validation is performed).
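The core of additive logistic regression can be sketched for the two-class case. The sketch follows the LogitBoost algorithm of Friedman, Hastie and Tibshirani, with a weighted regression stump as the base learner. It is a simplified illustration on a single numeric attribute, not Weka's multi-class implementation. The clipping value `z_max` and the function names are assumptions.

```python
import math


def fit_stump(xs, z, w):
    """Weighted least-squares regression stump on a single feature."""
    def wmean(idx):
        total = sum(w[i] for i in idx)
        return sum(w[i] * z[i] for i in idx) / total

    best = None
    for t in sorted(set(xs))[:-1]:
        left = [i for i, x in enumerate(xs) if x <= t]
        right = [i for i, x in enumerate(xs) if x > t]
        lv, rv = wmean(left), wmean(right)
        err = (sum(w[i] * (z[i] - lv) ** 2 for i in left)
               + sum(w[i] * (z[i] - rv) ** 2 for i in right))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all feature values identical: fall back to a constant fit
        c = wmean(range(len(xs)))
        return lambda x: c
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def logitboost(xs, ys, rounds=10, z_max=4.0):
    """Two-class LogitBoost; ys are 0/1 labels. Returns the additive score F(x).

    Predict class 1 when F(x) > 0.
    """
    F = [0.0] * len(xs)
    learners = []
    for _ in range(rounds):
        p = [1.0 / (1.0 + math.exp(-2.0 * f)) for f in F]   # P(y=1 | x) = e^F / (e^F + e^-F)
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]       # working weights
        z = [max(-z_max, min(z_max, (y - pi) / wi))          # clipped working response
             for y, pi, wi in zip(ys, p, w)]
        h = fit_stump(xs, z, w)
        learners.append(h)
        F = [f + 0.5 * h(x) for f, x in zip(F, xs)]
    return lambda x: sum(0.5 * h(x) for h in learners)


if __name__ == "__main__":
    score = logitboost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
    print([round(score(x), 2) for x in [1, 3, 4, 6]])
```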
III. METHODOLOGY AND DATA DESCRIPTION
A. Description of Dataset
Data processing: the attributes are of nominal (text) or numeric type, and missing data has been filled with meaningful assumptions in the database. The database specification, with descriptions and table structure, is shown in Table 1.
TABLE 1. SPECIFICATION OF DATABASE
B. Data source
For these experiments, we chose a BIS-based dataset with a total of 620 records. The records are currently available measurements authorized by anthropometric experts. The records were first preprocessed. The cluster and classification techniques of data mining were then applied to compare prediction accuracy and find the best-performing method.
C. The Application of Data Mining
Data mining can be used to uncover patterns. The increasing power of computer technology has increased data collection and storage, and automatic data processing has been aided by computer science methods such as neural networks, clustering, genetic algorithms, decision trees and support vector machines. Data mining is the process of applying these methods to uncover hidden patterns [18]. It has been used for many years by businesses and scientists to sift through large volumes of data. Fig. 2 shows the application of data mining in fashion product development, for production detection and forecasting analysis using classification and clustering methods.
Field No. | Field Name | Description | Data Type
1 | Age | To refer the age | Numeric
2 | Back length | To refer the back length | Numeric
3 | Front length | To refer the front length | Numeric
4 | Shoulder length | To refer the shoulder length | Numeric
5 | Chest girth | To refer the chest girth | Numeric
6 | Waist length | To refer the waist length | Numeric
7 | Hip | To refer the hip | Numeric
8 | Sleeve length | To refer the sleeve length | Numeric
9 | Arm depth | To refer the arm depth | Numeric
10 | Cuff length | To refer the cuff length | Numeric
11 | Cuff width | To refer the cuff width | Numeric
12 | Label | To refer the size labels | Nominal (Text)
Fig. 2. Application of data mining in Fashion Industry
IV. EXPERIMENTAL RESULTS
A. Distribution of Classes
This dataset has several characteristics, such as the number of attributes, the number of classes, the number of records and the percentage of class occurrences. As in the test dataset, the 620 records are broadly categorized into six groups of shirt sizes: XS, S, M, L, XL and XXL. The distribution of classes in the actual training data used for classifier evaluation, with their occurrences, is given in Table II. The percentages of the size categories are shown as a pie chart in Fig. 4, and the size categories after clustering are shown in Fig. 5.
TABLE II
DISTRIBUTION OF CLASSES IN THE ACTUAL TRAINING SET
Class Category | No. of Records | Percentage of Class Occurrences (%)
XS | 86 | 14
S | 83 | 13
M | 115 | 18
L | 121 | 20
XL | 93 | 15
XXL | 122 | 20
Total | 620 | 100
Fig.4. Percentage of size Categories
B. Experimental Outcomes
To estimate the performance of the cluster method, we compared the results generated by the clusters with the results generated by the original sets of attributes for the chosen dataset. In the experiments, the data mining software Weka 3.6.4, which is implemented in Java, was run under the Windows 7 operating system on an Intel Core2Quad @ 2.83 GHz processor with 2 GB of memory. The dataset was evaluated for accuracy using a 10-fold cross-validation strategy [19]. The predicted result values of the various classifiers, with prediction accuracy, are given in Table III. The dataset with the original features and the clustered form of the dataset were classified with the algorithms K-nn [17] with 5 neighbours, Random tree, and Logitboost without pruning, and the two sets of classification results were compared. In each phase of the cross-validation, one of the yet unprocessed sets was tested, while the union of all remaining sets was used as the training set for classification by the above algorithms. The classifiers' prediction accuracies and their differences are given in Table III. The performance of the classifiers is shown in Fig. 6, and the difference between the original and clustered classification accuracy rates is shown in Fig. 7.
TABLE III . CLASSIFIERS WITH PREDICTION ACCURACY

Fig.5. Percentage of size Categories after clustered
y^
D
y>
yy>

W^

Algorithms Original Clustered


Jrip 98.871 98.871
DecisionTable 99.5161 99.1935
RandomTree 99.3548 99.5161
IBK 3 97.7419 96.9355
IBK 4 96.2903 97.0968
Bagging 99.3548 99.0323
logitboost 99.6774 99.8387
REPtree 99.3548 98.871
MultilayerPerceptron 99.5161 98.0645
To estimate the performance of the cluster
method, we compared the results generated by cluster
with the results generated by original sets of attributes
chosen dataset. In the experiments, the data
mining software called weka 3.6.4 which has been
implemented in Java with latest windows 7 operating
Intel Core2Quad@2.83 GHz processor and
These dataset has been applied and
d for accuracy by using 10-fold Cross
The predicted result values
of various classifiers with prediction accuracy as
The dataset with original features and
dataset are classified with the
neighbours, Random tree,
Both of the obtained
In each phase of a
cross validation, one of the yet unprocessed sets was
tested, while the union of all remaining sets was used
classification by the above
Classifiers with prediction accuracy and
Performance of the
Difference between
original and clustered classification accuracy rate has
CLASSIFIERS WITH PREDICTION ACCURACY
after clustered
W^
y^
^
D
>
y>
yy>
Clustered Difference
98.871

99.1935

99.5161

96.9355

97.0968

99.0323

99.8387

98.871

98.0645



Fig. 6. Clustered classification results
Fig. 7. Comparison between clustered and original accuracy
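The 10-fold cross-validation used in Section B can be sketched in plain Python. Each fold is tested once, while the union of the remaining folds trains the classifier. This is an illustrative, unstratified version and not Weka's own implementation. The functions `k_fold_indices` and `cross_validate` and the `train_and_predict` callback are hypothetical names.

```python
import random


def k_fold_indices(n_records, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every record lands in exactly one test fold; the other k-1 folds
    together form the training set for that round.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k nearly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(records, labels, train_and_predict, k=10):
    """Return overall accuracy (%) pooled over all k test folds."""
    correct = 0
    for train, test in k_fold_indices(len(records), k):
        predictions = train_and_predict(
            [records[i] for i in train],
            [labels[i] for i in train],
            [records[i] for i in test],
        )
        correct += sum(p == labels[i] for p, i in zip(predictions, test))
    return 100.0 * correct / len(records)
```

Any classifier can be plugged in as `train_and_predict`. It receives the training records and labels plus the test records, and returns the predicted labels for the test records.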
V. CONCLUSION
In this research work, a cluster classification method is used to improve shirt size grouping in terms of classification accuracy. In the first phase, the dataset was clustered with X-means clustering to obtain system-defined size groups. In the second phase, we tested the performance of this approach with popular algorithms such as K-nn, JRip, Random Tree, Decision Table and Multilayer Perceptron. When looking for higher accuracy, IBK 4 (K-nn with k = 4) gives the largest difference between the original and clustered classification accuracy, in favour of the clustered data.
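The conclusion's comparison can be recomputed from Table III. The minimal Python sketch below assumes the difference is taken as clustered minus original accuracy; the text does not state this sign convention.

```python
# (original, clustered) accuracy in percent, copied from Table III.
accuracy = {
    "JRip": (98.871, 98.871),
    "DecisionTable": (99.5161, 99.1935),
    "RandomTree": (99.3548, 99.5161),
    "IBK 3": (97.7419, 96.9355),
    "IBK 4": (96.2903, 97.0968),
    "Bagging": (99.3548, 99.0323),
    "LogitBoost": (99.6774, 99.8387),
    "REPTree": (99.3548, 98.871),
    "MultilayerPerceptron": (99.5161, 98.0645),
}

# Assumed convention: a positive difference means clustering improved accuracy.
difference = {
    name: round(clustered - original, 4)
    for name, (original, clustered) in accuracy.items()
}

for name, diff in sorted(difference.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22}{diff:+.4f}")

best = max(difference, key=difference.get)
print("Largest gain from clustering:", best)
```

Under this assumed convention, IBK 4 shows the largest gain from clustering, about +0.81 percentage points. MultilayerPerceptron shows the largest change in absolute terms, a drop of about 1.45 points.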
VI. FUTURE WORK
The experiments on the effect of clustering and classification techniques on a clothing size database have shown the accuracies of the classifiers in this context. These results can be used not only in the clothing industry, for example in sizing systems, but also in other fields such as physiology, medicine, human ecology and sports. Clothing manufacturers and fashion forecasters can use them in practice to run a profitable business and to satisfy the anticipated needs of consumers.
REFERENCES
[1] Chang, C.F., 1999. The model analysis of female body size measurement from 18 to 22, J. Hwa Gang Textile, 6: 86-94.
[2] Fan, J., W. Yu and H. Lawrance, 2004. Clothing appearance and fit: Science and technology, Woodhead Publishing Limited, Cambridge, England.
[3] Tung, Y.M. and S.S. Soong, 1994. The demand side analysis for Taiwan domestic apparel market, J. the China Textile Institute, 4: 375-380.
[4] Hsu, K.M. and S.H. Jing, 1999. The chances of Taiwan apparel industry, J. the China Textile Institute, 9: 1-6.
[5] Workman, J.E., 1991. Body measurement specification for fit models as a factor in apparel size variation, Cloth. Text. Res. J., 10(1): 31-36.
[6] LaBat, K.L. and M.R. Delong, 1990. Body cathexis and satisfaction with fit of apparel, Cloth. Text. Res. J., 8(2): 42-48.
[7] ISO 8559, 1989. Garment Construction and Anthropometric Surveys - Body Dimensions, International Organization for Standardization.
[8] Bagherzadeh, R., M. Latifi and A.R. Faramarzi, 2010. Employing a Three-Stage Data Mining Procedure to Develop Sizing System, World Applied Sciences Journal, 8(8): 923-929.
[9] Shakhnarovich, G., T. Darrell and P. Indyk, 2005. Nearest Neighbor Methods in Learning and Vision, MIT Press, Informatics, vol. 37, no. 6, December, 2004, pp. 461-470.
[10] Tryfos, P., 1986. An integer programming approach to the apparel sizing problem, J. the Operational Research Society, 37(10): 1001-1006.
[11] McCulloch, C.E., B. Paal and S.A. Ashdown, 1998. An optimal approach to apparel sizing, J. the Operational Research Society, 49: 492-499.
[12] Gupta, D., N. Garg, K. Arora and N. Priyadarshini, 2006. Developing body measurement charts for garments manufacture based on a linear programming approach, J. Textile and Apparel Technology and Management, 5(1): 1-13.

The Robot Tree: An Astonishing Solution for Global Warming
Deepa. M
deepafriendly2u@yahoo.com
9600833776
Aruna. V
8870780198
INTRODUCTION:
We all know that forests are the treasures of our earth. But now, mankind itself has started to destroy them. Cutting trees not only reduces rainfall but also raises the temperature enormously, which results in global warming. This harms the whole of mankind. Scientists are therefore calling for forests to be protected to save mankind, and research on the issue is ongoing. In our paper we propose an astonishing solution to save our earth from global warming.
WHAT IS GLOBAL WARMING???
Global warming is defined as the increase of the average temperature on Earth. As the Earth gets hotter, disasters like hurricanes, droughts and floods are becoming more frequent. Over the last 100 years, the average temperature of the air near the Earth's surface has risen a little less than 1 °C (0.74 ± 0.18 °C, or 1.3 ± 0.32 °F). Does that not seem like much? Scientists say it is nonetheless responsible for the conspicuous increase in storms and raging forest fires seen in the last ten years.
Their data show that an increase of one degree Celsius makes the Earth warmer now than it has been for at least a thousand years. Of the 20 warmest years on record, 19 have occurred since 1980, and the three hottest years ever observed have all occurred within the last eight years.
MAIN CAUSES FOR GLOBAL WARMING:
Carbon dioxide, water vapour, nitrous oxide, methane and ozone are some of the natural gases causing global warming.
HEALTH AND ENVIRONMENTAL EFFECTS:
Greenhouse gas emissions could cause a 1.8 to 6.3 °F rise in temperature during the next century if atmospheric levels are not reduced.
They could produce extreme weather events, such as droughts and floods.
They could threaten coastal resources and wetlands by raising the sea level.
They could increase the risk of certain diseases by producing new breeding sites for pests and pathogens.
Agricultural regions and woodlands are also susceptible to changes in climate that could result in increased insect populations and plant disease.
The degradation of natural ecosystems could lead to reduced biological diversity.
WHAT GLOBAL WARMING EFFECTS ARE EXPECTED FOR THE FUTURE?
To predict future global warming effects, several greenhouse gas emission scenarios were developed and fed into computer models. For the next century, without specific policy changes, they project that:
Global mean temperature should increase by between 1.4 and 5.8 °C (2.5 to 10 °F).
The Northern Hemisphere cover should decrease further, but the Antarctic ice sheet should increase.
The sea level should rise by between 9 and 88 cm (3.5" to 35").
Other changes should occur, including an increase in some extreme weather events.
After 2100, human-induced global warming effects are projected to persist for many centuries. The sea level should continue rising for thousands of years after the climate has been stabilized.
CARBON DIOXIDE: Ninety-three percent of all emissions. Sources: generating power by burning carbon-based fossil fuels like natural gas, oil and coal; decomposition, accounting for about one quarter of all global emissions.
METHANE: Twenty times more effective in trapping heat in our atmosphere; 25 times as potent as carbon dioxide. Sources: agricultural activities, landfills.
NITROUS OXIDE: Sources: agricultural soil management, animal manure management, sewage treatment, mobile and stationary combustion of fossil fuel, adipic acid production and nitric acid production.
OZONE: Sources: automobile exhaust and industrial processes.
HYDROFLUORO COMPOUNDS (HFCs): Sources: industrial processes such as foam production, refrigeration, dry cleaning, chemical manufacturing and semiconductor manufacturing.
PERFLUORINATED COMPOUNDS (PFCs): Sources: smelting of aluminium.
IMPACTS OF RISE IN THE MAJOR GREENHOUSE GAS CO2:
The carbon dioxide concentration in air should be approximately 330 ppm (parts per million). But according to environmental researchers, the carbon dioxide content will increase as follows:
2025: 405 to 469 ppm
2050: 445 to 640 ppm
2100: 540 to 970 ppm
Temperatures already reach up to 40 degrees Celsius. In Tamil Nadu, the temperature is expected to increase as follows:
In 2025: 0.4 to 1.1 degrees Celsius
In 2050: 0.8 to 2.6 degrees Celsius
In 2100: 1.4 to 5.8 degrees Celsius
SOLUTION WE PROPOSE:
We all know that forests are the treasures of our earth. But man has started to destroy forests, and scientists are calling for them to be saved. We all know that forests help to protect the earth from global warming. Cutting trees not only reduces rainfall but also raises the temperature enormously, which harms the whole of mankind. Research is going on all the time to save mankind from global warming. Now, it has been found that robot trees will help to tackle the problem of global warming. The carbon dioxide content of the air should be 330 ppm (parts per million). It is increasing day by day, which results in global warming.
WHAT IS A ROBOT TREE???
Scientists are trying to make robots perform various activities to reduce the physical and mental work of human beings. The combination of nature and robots is called Robotany. The scientists Jill Coffin, John Taylor and Daniel Bauen are researching robot trees. A robot tree does not look like an ordinary tree, but it has structures corresponding to the stem, roots and leaves. Does a robot tree help to solve the problem of global warming?
I recently read in a magazine that the experiment on robot trees done by researchers at Madurai Kamaraj University was successful. Hats off to them. It is really happy news. We have studied in history that the kings of olden days planted trees on both sides of the road. In the same way, we hope that in future all roads will have robot trees on both sides to prevent global warming and save the earth. It is said that one robot tree is equal to 1000 natural trees. Each robot tree looks more like a giant fly swatter, standing guard over mankind.
Klaus Lackner, a professor of
Geophysics at Columbia University, is working
on an interesting concept: "synthetic trees".
The idea is to reproduce the process of
photosynthesis to capture and store massive
amounts of CO2 gas. Nearly 90,000 tons of
carbon dioxide a year, roughly the amount emitted
annually by 15,000 cars, could be captured by the
structure. Paired with a windmill, the carbon-
capture tree would generate about 3 megawatts of
power, Lackner calculates, making the operation
self-sufficient in energy.

HOW DOES A ROBOT TREE FUNCTION???
Just imagine a normal tree. A normal tree has roots, a stem and leaves. In the same way, the robot tree also has a root, stem, branches and leaves like a normal tree. Plastic poles are fixed in the stem part, and solar plates fixed between them act as leaves. Small holes are made in the big poles and small poles are fixed in them; these absorb carbon dioxide from the air. Inside the big poles there is calcium hydroxide liquid, in which the absorbed carbon dioxide dissolves.
The solar plates produce current and pass it inside the stem, which separates carbon and oxygen. Oxygen, hydrogen and vapour come out. The carbon reacts with water and becomes carbonic acid.




A computer-generated image of Lackner's synthetic trees.
Synthetic trees don't exactly look like your average tree with green leaves and roots. Although the design is not finalized, Lackner predicts that the device would look more like a post with venetian blinds strung across it. It would be a box-shaped extractor raised about 1,000 feet tall, adorned with scaffolding lined with liquid sodium hydroxide (commonly known as lye). When exposed, sodium hydroxide (lye) absorbs CO2. So, as air flows through the venetian-blind leaves of the tree, the sodium hydroxide binds the CO2, and cleaner air with about 70-90% less CO2 comes out on the other side. Lackner estimates that an area of sodium hydroxide about the size of a large TV screen (a 20 inch diagonal) and a meter in depth could absorb 20 tons of CO2 a year. Paired with a windmill, a carbon-capture tree could generate about 3 megawatts of power.
been known for years, but the question of whether it can be done in an affordable, energy-efficient manner has not yet been fully answered. Constructing and erecting the collector device is only 20% of the cost. The remainder involves prying the CO2 loose from the absorbent and storing it, which is an energy-intensive process. A back-of-the-envelope calculation puts the total cost at 3000 to 5000 rupees per ton captured. That is large compared with the 1000-2000 rupees per ton that proponents of a carbon tax or cap-and-trade scheme believe will stabilize atmospheric emissions of CO2. It may seem too steep a cost to consider closely, but Lackner believes it's worth looking at things that start out even five times too expensive.
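The cost comparison above is simple arithmetic. The sketch below only divides the quoted capture-cost range by the quoted target-cost range, using the rupee figures as given in the text. The variable names are illustrative.

```python
# Cost figures (rupees per ton of CO2) as quoted in the text.
capture_cost = (3000, 5000)  # back-of-the-envelope capture cost
target_cost = (1000, 2000)   # cost believed to stabilize emissions

# Best case: cheapest capture vs. most generous target; worst case: the reverse.
best_ratio = capture_cost[0] / target_cost[1]
worst_ratio = capture_cost[1] / target_cost[0]

print(f"Capture costs {best_ratio:.1f}x to {worst_ratio:.1f}x the target cost")
```

The upper end of this range, five times, matches the "five times too expensive" margin mentioned in the text.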
BASIC CHEMICAL REACTIONS WITH ROBOT TREE:
The reaction of sodium hydroxide with carbon dioxide (as carbonic acid) occurs essentially in two steps. The first step converts carbonic acid to bicarbonate, and the second converts bicarbonate to carbonate. This is a simple acid-base reaction: CO2 is an acid anhydride and NaOH is a base. With a 1:1 ratio of CO2 to NaOH, the reaction gives the salt NaHCO3 (sodium bicarbonate). The other possible product, the salt Na2CO3 (sodium carbonate) together with water, forms when a 1:2 ratio of CO2 to NaOH is used. A small amount of moisture in the absorbent material (about 3%) is important. CO2 reacts with this moisture to form carbonic acid,
$$\mathrm{CO_2 + H_2O \rightarrow H_2CO_3}$$
which in turn reacts with the hydroxide to form the salt of carbonic acid, sodium carbonate. The absorption of carbon dioxide is expressed as follows:
$$\mathrm{H_2CO_3 + NaOH \rightarrow NaHCO_3 + H_2O}$$
$$\mathrm{NaHCO_3 + NaOH \rightarrow Na_2CO_3 + H_2O}$$
The products of the reaction are sodium carbonate and water.
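Summing the three steps (carbonic acid formation and the two neutralization steps) cancels H2CO3, NaHCO3 and one H2O, which gives the net absorption reaction. The molar masses then give the mass of sodium hydroxide consumed per unit mass of CO2. The molar masses are rounded textbook values and are not taken from the paper, so this is a stoichiometric estimate only.

```latex
\begin{aligned}
\mathrm{CO_2 + H_2O} &\rightarrow \mathrm{H_2CO_3}\\
\mathrm{H_2CO_3 + NaOH} &\rightarrow \mathrm{NaHCO_3 + H_2O}\\
\mathrm{NaHCO_3 + NaOH} &\rightarrow \mathrm{Na_2CO_3 + H_2O}\\
\text{net:}\quad \mathrm{CO_2 + 2\,NaOH} &\rightarrow \mathrm{Na_2CO_3 + H_2O}\\[4pt]
\frac{m_{\mathrm{NaOH}}}{m_{\mathrm{CO_2}}} &= \frac{2 \times 40.00\ \mathrm{g\,mol^{-1}}}{44.01\ \mathrm{g\,mol^{-1}}} \approx 1.82
\end{aligned}
```

In other words, roughly 1.82 tons of NaOH are used up for every ton of CO2 bound as sodium carbonate.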
DECARBITE:
DECARBITE is an absorbing product consisting of sodium hydroxide carried on a silica base. The natural affinity of sodium hydroxide for acid gases makes it a desirable material for absorbing the acid gas carbon dioxide. The sodium hydroxide content in DECARBITE is high, approximately 90%, which accounts for the aggressive product performance and exceptional capacity for absorption of carbon dioxide.
DECARBITE, the universally accepted carbon dioxide absorbent, is a consumable chemical absorbent. It is a specially formulated mixture of sodium hydroxide on an inert silica carrier. The carrier provides a surface area especially suited to rapid, high-performance and total absorption of CO2 on contact.
EFFICIENCY:
DECARBITE is intended for rapid, high-performance quantitative absorption of CO2 in the ppm range. It is colour indicating, changing from greenish brown to white upon carbon dioxide saturation. Removing carbon dioxide or any other acid gas with DECARBITE is a chemical reaction, not a physical one. Carbon dioxide reacts with the sodium hydroxide based absorbent and undergoes a complete chemical change. This change is irreversible; therefore the absorbent cannot be regenerated for reuse. The change is clearly visible and indicates when spent material is to be discarded.
Occasionally, a condition known as channeling can occur when the gas flow finds holes or areas of least resistance and a channel is formed. The gas flow follows these channels through the absorbent, which defeats the purpose of scrubbing out the carbon dioxide. DECARBITE eliminates this problem in several ways. First, the silica binding to the sodium hydroxide keeps the particles from bonding in the presence of moisture, which is formed as a byproduct of the absorption reaction. It also helps prevent the absorbent from coalescing into a solid mass that blocks gas flow and causes back pressure across the absorption bed.
The association reaction of NaOH with CO2 is at least 40 times faster than NaOH + HCl at all altitudes below the Na layer. Na species will not affect stratospheric ClOx and O3 chemistry. The conversion of carbon dioxide to bicarbonate is complete at pH 8.3. Phenolphthalein can be used as a colour indicator for the titration.
CONCLUSION:
Energy is really an area where more technology is absolutely necessary. Nearly one and a half lakh (150,000) robot trees would be enough to purify the carbon dioxide in the air for approximately one year. One robot tree is said to have the capacity to absorb 90,000 tons of carbon dioxide every year, which is the amount released by 15,000 cars in one year. Robot trees will not help to bring rain, but they will protect the earth from global warming.
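The capacity figures in this conclusion can be cross-checked with simple arithmetic. The sketch below only multiplies and divides the numbers given in the text: 90,000 tons per tree per year, 15,000 cars, and one and a half lakh (150,000) trees. It does not validate the claims themselves, and the variable names are illustrative.

```python
# Figures as quoted in the text.
co2_per_tree_tons = 90_000  # tons of CO2 absorbed per robot tree per year
cars_equivalent = 15_000    # cars emitting the same amount per year
trees_needed = 150_000      # "one and a half lakh" robot trees

# Implied yearly emissions of one car.
tons_per_car = co2_per_tree_tons / cars_equivalent

# Implied yearly absorption by all the trees together.
total_tons = co2_per_tree_tons * trees_needed

print(f"Implied emissions per car: {tons_per_car:.1f} t/year")
print(f"Total absorption: {total_tons:,} t/year ({total_tons / 1e9:.1f} billion t)")
```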


Cost (approx.):
10-foot robot tree: 50,00
200-foot robot tree: 5 lakhs
The cost of the robot tree may be high, but not having a prominent technology against global warming may cost even more. There aren't many large-scale sources of energy that could be tapped at the scale the world needs them. Hydro will never be enough, and neither will wind. Solar, nuclear and fossil could be enough, but they all have flaws. If we don't place big bets on all three, we could find ourselves with none of them working, and we'll have an energy crisis of unprecedented proportions. We have studied in history that the kings of olden days planted trees on both sides of the road. In the same way, we hope that all roads will have robot trees on both sides to prevent global warming and save the earth. It is said that one robot tree is equal to 1000 natural trees.
Implement robot trees
Prevent global warming,
And thus,
Save the Earth!!!

BRAIN COMPUTER INTERFACE
Vijayaganth.R,AP
Department of Computer
Science & Engineering
Gojan School of Business &
Technology
Vijayaganth.r@gmail.com
Ramya.C
Gojan School of Business &
Technology
chandram91@gmail.com
Sahana.S
Gojan School of Business &
Technology
Sahana1413@gmail.com
Abstract:
A brain-computer interface (BCI) provides a direct communication channel from the brain. The BCI processes brain activity and translates it into system commands using feature extraction and classification algorithms. The overarching goal of this paper is to make a start in the field of BCI. A BCI represents a direct interface between the brain and a computer or any other system. BCI is a broad concept that covers any communication between the brain and a machine in both directions. It effectively opens a completely new communication channel that uses neither peripheral nerves nor muscles. In principle this communication is thought to be two-way, but present-day BCI mainly focuses on communication from the brain to the computer. In the general picture of a BCI, the user actively performs a task, and the task's effect on the brain waves can be measured and recognized. The task may involve evoked attention, spontaneous mental performance or mental imagination. The BCI records the resulting activity and converts it into control input for a device, for example to move a cursor or to select letters or icons. BCI operation depends on effective interaction between two adaptive controllers: the user, who encodes his or her commands in the electrophysiological input provided to the BCI, and the device.
INTRODUCTION
A brain-computer interface (BCI) is a communication channel connecting the brain to a computer or another electronic device. A BCI represents a direct interface between the brain and a computer or any other system. BCI is a broad concept that covers any communication between the brain and a machine in both directions. It effectively opens a completely new communication channel that uses neither peripheral nerves nor muscles. In principle this communication is thought to be two-way, but present-day BCI mainly focuses on communication from the brain to the computer. Communicating in the other direction, that is, inputting information into the brain, requires more thorough knowledge of how the brain functions. From here on, the focus is on communication directly from the brain to the computer. A communication channel between the brain and a computer must meet two basic requirements:
1) features that are useful to distinguish several kinds of brain state;
2) methods for the detection and classification of such features, implemented in real time.
Most commonly, the electrical activity (fields) generated by the neurons is measured. Various techniques are now available to monitor brain function, e.g., electroencephalography (EEG), magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI) and positron emission tomography (PET).
Applications range from simple decision programs to manipulation of the environment, and from spelling programs to system control. These examples can be grouped into communication, environment control and movement restoration.
BCI AND BRAIN WAVES
The brain-computer interface is a man-made device that creates a communication port to the brain. A brain-computer interface takes information from the brain and transforms it into a usable form. It then analyzes the information to see what the brain wants to do and sends the data to an external device. At the external device, the data are processed and the brain's command is executed.
Typically, the cerebral cortex is the area of interest in a brain-computer interface. The cerebral cortex plays a key role in memory, attention, perceptual awareness, thought, language, consciousness and motor function. The way these BCIs work is technical, but in general it is as follows. Electrodes are placed over the area of the brain responsible for the desired motor function. The electrodes pick up brain waves by measuring the minute differences in voltage across active neurons, and these are interpreted as a signal. In this concept, however, we use brain waves recorded through wireless EEG. These signals are stored, processed using various complex transforms and run through a program, typically written in MATLAB or C++.
Figure: Basic Layout
The information that the program produces comes in packets of waves, typically spikes. The number and size of these spikes correspond to a desired action. The external device contains a computer that has been programmed to recognize which waveforms correspond to which action, and the device then performs the desired action.
It is well known that the brain is an electrochemical organ. Researchers have speculated that a fully functioning brain can generate as much as 10 watts of electrical power. Other, more conservative investigators have made a different estimate. They calculate that if all 10 billion interconnected nerve cells discharged at once, a single electrode on the human scalp would record something like five millionths to 50 millionths of a volt.
Electrical activity emanating from the brain is displayed in the form of brainwaves. There are three categories of these brainwaves, ranging from the most activity to the least activity:
1. Beta
2. Alpha
3. Delta
The frequency of beta waves ranges from 15 to 40 cycles a second. Beta waves are characteristic of a strongly engaged mind.
The frequency of alpha waves ranges from 9 to 14 cycles per second. A person who has completed a task and sits down to rest is often in an alpha state.
The final brainwave state is delta, whose frequency is normally below about 4 cycles a second. It never goes down to zero, because that would mean that you were brain dead. Deep dreamless sleep, however, takes you down to the lowest frequency.
Methods:
EEG (electroencephalography) involves recording the (very weak) electrical field generated by action potentials of neurons in the brain using small metal electrodes.
ECoG (electrocorticography) records the extracellular currents of the brain and has both high temporal and good spatial resolution. It is a form of invasive EEG in which the electrodes are placed directly on the surface of the brain, so the technique is invasive.
MEG (magnetoencephalography) directly measures the cortical magnetic fields produced by electrical currents.
fMRI (functional magnetic resonance imaging) provides information on brain metabolism using the BOLD (blood-oxygen-level-dependent) signal.
EEG was selected as the measurement method of brain activity because of its ease of application, portability, excellent time resolution and low cost.
Signal pre-processing
The signals coming from electrodes connected to the brain range from 0 Hz upwards and can contain every possible variation and distortion. Therefore the quality of the signal must be improved.
Figure: Arrangement of electrodes
This quality is measured by the signal-to-noise ratio (SNR): a higher-quality signal has a higher SNR. The SNR can be improved by applying filters that reduce the noise and amplify the desired signal, and by removing undesirable artifacts or data.
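A common way to express SNR is in decibels: ten times the base-10 logarithm of the ratio of signal power to noise power. The minimal NumPy sketch below assumes the clean signal and the noise are available separately, as in a simulation; this is rarely the case for real EEG. The 250 Hz sampling rate, the 10 Hz test rhythm and the noise level are illustrative assumptions.

```python
import numpy as np


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels from separate signal and noise arrays."""
    signal_power = np.mean(np.square(signal))
    noise_power = np.mean(np.square(noise))
    return 10.0 * np.log10(signal_power / noise_power)


fs = 250                                   # sampling rate in Hz (assumed)
t = np.arange(2 * fs) / fs                 # two seconds of sample times
signal = np.sin(2 * np.pi * 10 * t)        # 10 Hz rhythm, mean power 0.5
rng = np.random.default_rng(0)
noise = 0.5 * rng.standard_normal(t.size)  # white noise, expected power 0.25

print(f"SNR: {snr_db(signal, noise):.1f} dB")
```

Because the expected power ratio here is 2, the printed value should come out close to 3 dB.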
Amplification & A/D converter:
Brain signals are very weak, so they must be amplified before any processing can be done. Amplification should not add further distortion to the signal. After amplification, an A/D converter converts the analogue signals from the brain to digital form.
Filters:
Reference filters:
A reference filter improves the quality of the input by referencing the value of interest to its neighbouring values. Different methods exist to perform this operation. The most commonly used filters in BCI are described here; however, the best method depends on the circumstances:
Laplacian filter (small/large): the Laplacian spatial filter enhances local activity and reduces diffuse activity. The small Laplacian filter subtracts the average of the four surrounding channels from the channel of interest. The large Laplacian filter subtracts the average of the next-nearest neighbours from the channel of interest.
Common average reference (CAR) filter: works as a high-pass spatial filter. It removes activity that is common to a large part of the total population of channels by subtracting the average value of all electrodes from the electrode of interest.

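$$v_i^{CAR} = v_i^{ER} - \frac{1}{n}\sum_{j=1}^{n} v_j^{ER}$$
where n is the number of electrodes and v^{ER} is the potential recorded between an electrode and the reference.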
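Both spatial filters described above take only a few lines of NumPy. The sketch assumes `eeg` is an array of shape (channels, samples) recorded against a common reference. The caller must supply the four neighbours used by the small Laplacian because they depend on the electrode montage. The 5-channel montage in the example is hypothetical.

```python
import numpy as np


def common_average_reference(eeg):
    """CAR: subtract the mean over all n electrodes from every electrode.

    eeg has shape (n_channels, n_samples); row i holds v_i^ER.
    """
    return eeg - eeg.mean(axis=0, keepdims=True)


def small_laplacian(eeg, channel, neighbours):
    """Small Laplacian: channel minus the average of its four nearest neighbours."""
    return eeg[channel] - eeg[list(neighbours)].mean(axis=0)


# Hypothetical montage: channel 0 is surrounded by channels 1-4.
rng = np.random.default_rng(0)
common = rng.standard_normal(1000)                # diffuse activity on all channels
local = np.sin(np.linspace(0, 20 * np.pi, 1000))  # activity only under channel 0
eeg = np.tile(common, (5, 1))
eeg[0] += local

car = common_average_reference(eeg)
lap = small_laplacian(eeg, channel=0, neighbours=[1, 2, 3, 4])
```

In this toy montage, both filters remove the diffuse `common` component. The small Laplacian recovers `local` exactly. CAR keeps 80% of it, because channel 0 itself contributes to the average.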

Bandpass filter:
After amplification, the signals are passed through a bandpass filter. The bandpass filter effectively lays a band over the incoming signal, and every frequency outside this band is removed from the signal. With this filter, for instance, the mu rhythm can be extracted from the input by discarding every frequency below 10 Hz and above 12 Hz, leaving the desired band. The bandpass filter can be implemented with an FIR (finite impulse response) filter algorithm. A linear-phase FIR filter does not distort the phase of the signal; it only delays it.
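A linear-phase FIR band-pass filter for the 10-12 Hz mu band could be designed with SciPy as below. The sampling rate and the number of taps are assumptions for illustration, as is the use of zero-phase `filtfilt`. A real-time BCI would apply the filter causally, for example with `scipy.signal.lfilter`, and accept the resulting constant delay.

```python
import numpy as np
from scipy.signal import filtfilt, firwin

fs = 250.0     # sampling rate in Hz (assumed)
numtaps = 251  # odd, symmetric tap count gives a type I linear-phase FIR

# Band-pass FIR keeping roughly 10-12 Hz; firwin scales it to unit gain at 11 Hz.
taps = firwin(numtaps, [10.0, 12.0], pass_zero=False, fs=fs)

# Test signal: 11 Hz mu-like rhythm plus 3 Hz drift and 50 Hz line noise.
t = np.arange(int(10 * fs)) / fs
mu = np.sin(2 * np.pi * 11 * t)
raw = mu + np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)

# filtfilt runs the filter forward and backward, cancelling the FIR delay.
filtered = filtfilt(taps, [1.0], raw)
```

The symmetric coefficients are what make the filter linear-phase. Away from the edges of the record, `filtered` closely follows the 11 Hz component alone.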
Artifacts:
EEG signals are always imperfect and always contaminated with artifacts, which are undesirable disturbances in the signal. Some artifacts are bioelectrical potentials produced by movement of body parts such as the eyes, tongue or arms, by the heart, or by fluctuation in skin resistance (sweating). Others come from outside the body, such as interference from nearby electrical equipment or varying impedance of the electrodes.
Figure: Example of eye-blink artifacts
All these issues influence the EEG data, and ideally they should be completely removed. One possible way to remove them is detection and recognition, which is not a trivial task. Recognition of limb movement, for instance, can be facilitated by using EMG: whenever EMG activity is recorded, the EEG readings should be discarded (artefact rejection). Other artefact sources, such as eye movement (measured by electrooculography (EOG)) and heart activity (measured by electrocardiography (ECG)), can also be removed. However, most of the artifacts are continuously present and typically larger than the EEG. If it is known when and where they occur, they can be compensated for.
Artifact removal :
To increase the effectiveness of BCI systems
it is necessary to find methods of increasing
the signal-to-noise ratio (SNR) of the
observed EEG signals.
The noise, or artefact, sources
include: line noise from the power grid, eye
blinks, eye movements, heartbeat, breathing,
and other muscle activity.
Improving technology can decrease
externally generated artifacts, such as line
noise, but biological artefact signals must be
removed after the recoding process.
The maximum signal fraction (MSF)
transformation is an alternative to the two
most common techniques, principal
component analysis (PCA) and independent
component analysis (ICA). A signal
separation method based on canonical
correlation analysis (CCA) is also used. The
simplest approach is to discard a fixed-length
segment, perhaps one second, from
the time an artifact is detected. Discarding
segments of EEG data with artifacts can
greatly decrease the amount of data
available for analysis. No quantitative
evaluation of the removal was performed, but it
was visually observed that the artifacts were
extracted into a small number of
components, which would allow their removal.
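The simplest strategy above, discarding a fixed-length segment from each detected artifact onset, can be sketched as follows. The amplitude-threshold detector, the 1-second window, the 100 Hz sampling rate and the function names are assumptions for illustration; the text does not specify a detector. The returned keep-mask shows directly how much data rejection costs.

```python
def detect_onsets(x, threshold):
    """Indices where |x| first rises above threshold (a crude artifact detector)."""
    onsets = []
    above = False
    for i, v in enumerate(x):
        if abs(v) > threshold and not above:
            onsets.append(i)
        above = abs(v) > threshold
    return onsets


def rejection_mask(n_samples, onsets, fs, window_s=1.0):
    """Keep-mask that discards window_s seconds of data from every artifact onset."""
    keep = [True] * n_samples
    width = int(round(window_s * fs))
    for onset in onsets:
        for i in range(onset, min(onset + width, n_samples)):
            keep[i] = False
    return keep


fs = 100  # sampling rate in Hz (assumed)
keep = rejection_mask(1000, [100, 150, 800], fs)
print(sum(keep) / len(keep))  # 0.75 of the 10 s recording survives
```

Three detected events, two of them overlapping, already cost a quarter of this recording, which illustrates why rejection reduces the data available for analysis.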
In online filtering systems, artifact
recognition is important for achieving
automatic removal. One approach to
recognizing noise components is based on
measuring structure in the signal.
Whenever artifacts are detected, the
affected portion of the signal can be
rejected. This can be a valid pre-processing
step and does not have to be a problem.
However, deleting a specific piece of data
can result in strange anomalies where the two
remaining pieces are joined. Secondly, EEG data
in general are relatively scarce. For that reason,
a better approach is to remove the artifact from
the EEG data, which goes one step further than
artifact rejection. For practical purposes in
an online system, it is undesirable to throw
away every signal that is affected by an
artifact. Retrieving the signal with 100%
correctness is impossible; it is simply
unknown what the data would have looked
like without, for instance, the eye blink. For
offline systems this is less critical, since it
does not matter if some commands are lost.
In the online case, however, the user
demands that every command issued
to the system is recognized and executed;
the user does not want to keep trying
endlessly for a good trial.
Rejection therefore is not desirable.
The goal is to keep a continuous signal.
Ideally, the artifact is removed and the
desired signal preserved. This can be done
using filters or higher-order statistical
elimination and separation, for instance
independent component analysis.
Independent component analysis:
ICA was originally developed for blind
source separation, whose goal is to recover
mutually independent but unknown source
signals from their linear mixtures without
knowing the mixing coefficients. ICA can
be seen as an extension of Principal
Component Analysis and Factor Analysis.
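Algorithm (one-unit FastICA):
1. Choose an initial (e.g., random) weight vector $w$.
2. Let $w^{+} = E\{x\, g(w^{T}x)\} - E\{g'(w^{T}x)\}\, w$.
3. Let $w = w^{+} / \lVert w^{+} \rVert$.
4. If not converged, go back to step 2.
Here $g(u) = u \exp(-u^{2}/2)$, $x$ is the observed
(centred and whitened) data and $w$ is the weight
vector that performs ICA.
Note that convergence means that the old
and new values of $w$ point in the same
direction, i.e., their dot product is (almost)
equal to 1.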
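The nonlinearity $g(u) = u \exp(-u^2/2)$ and the dot-product convergence test belong to the standard one-unit FastICA fixed-point iteration. The sketch below implements that iteration in plain Python; it is not necessarily the exact variant used in this paper. It assumes the observed data are already centred and whitened (identity covariance), and the two-channel toy mixture is invented for illustration.

```python
import math
import random


def g(u):
    """Nonlinearity from the text: g(u) = u * exp(-u^2 / 2)."""
    return u * math.exp(-u * u / 2.0)


def g_prime(u):
    """Derivative of g: (1 - u^2) * exp(-u^2 / 2)."""
    return (1.0 - u * u) * math.exp(-u * u / 2.0)


def fastica_one_unit(x, w0, tol=1e-6, max_iter=200):
    """One-unit FastICA on centred, whitened samples x (a list of tuples).

    Returns a unit-norm weight vector w; w . x estimates one independent source.
    """
    n, dim = len(x), len(w0)
    norm = math.sqrt(sum(c * c for c in w0))
    w = [c / norm for c in w0]
    for _ in range(max_iter):
        ys = [sum(wk * xk for wk, xk in zip(w, sample)) for sample in x]
        # w+ = E{x g(w^T x)} - E{g'(w^T x)} w
        e_xg = [sum(smp[k] * g(y) for smp, y in zip(x, ys)) / n for k in range(dim)]
        e_gp = sum(g_prime(y) for y in ys) / n
        w_new = [e_xg[k] - e_gp * w[k] for k in range(dim)]
        norm = math.sqrt(sum(c * c for c in w_new))
        w_new = [c / norm for c in w_new]
        # Converged when old and new w point in the same direction (|dot| ~ 1);
        # a sign flip of w between iterations is harmless.
        done = abs(abs(sum(a * b for a, b in zip(w, w_new))) - 1.0) < tol
        w = w_new
        if done:
            break
    return w


# Toy data: two unit-variance uniform sources mixed by a 30-degree rotation,
# which keeps the mixture (approximately) white.
rng = random.Random(0)
h = math.sqrt(3.0)
sources = [(rng.uniform(-h, h), rng.uniform(-h, h)) for _ in range(2000)]
cs, sn = math.cos(math.radians(30)), math.sin(math.radians(30))
x = [(cs * s1 - sn * s2, sn * s1 + cs * s2) for s1, s2 in sources]
w = fastica_one_unit(x, [1.0, 0.3])
print(w)  # should align with one mixing column, (cos 30, sin 30) or (-sin 30, cos 30)
```

For several components, the same update is usually run for each unit with a decorrelation step between units; that part is omitted here.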
Artifact removal using ICA and GA:
The steps of the proposed method are as follows.
First, the ICA algorithm extracts the independent
components (ICs) of each trial; then a GA
selects the best and most relevant ICs
among all the ICs. The proposed
approach to the use of GAs for artifact
removal encodes a set of d ICs as
a binary string of d elements, in which a 0
in the string indicates that the corresponding
IC is to be omitted, and a 1 that it is to be
included. This coding scheme represents the
presence or absence of a particular IC in
the IC space, so the length of the chromosome
equals the dimension of the IC space. The
selected ICs are then used as input data for
the classifiers. This paper uses the fitness
function shown below to combine the two
terms:
$$\text{Fitness} = \text{classification error} + \lambda \cdot (\text{number of active genes})$$
where the error is the classification error obtained
with the selected ICs and the number of active genes
is the number of ICs selected. The weight $\lambda$ is
taken in the interval (0, 1); a higher $\lambda$
results in fewer selected features. In this paper,
$\lambda = 0.01$ is chosen.
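The chromosome encoding and the fitness function above can be sketched as a small genetic algorithm. The selection, crossover and mutation operators, the population size and the toy error function are all assumptions, since the text does not specify them; in practice `error_fn` would train and test a classifier on the selected ICs. Lower fitness is better, because the fitness is an error plus a penalty.

```python
import random


def fitness(chromosome, error_fn, lam=0.01):
    """Fitness = classification error + lam * (number of active genes); lower is better."""
    return error_fn(chromosome) + lam * sum(chromosome)


def ga_select(n_ics, error_fn, lam=0.01, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Search for a binary IC-selection mask (1 = keep the IC, 0 = omit it)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_ics)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(c, error_fn, lam))
        parents = ranked[: pop_size // 2]  # truncation selection: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_ics)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [1 - gene if rng.random() < p_mut else gene for gene in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda c: fitness(c, error_fn, lam))


def toy_error(mask):
    """Hypothetical stand-in for a classifier: ICs 1 and 3 help, IC 0 (an artifact) hurts."""
    return 0.5 - 0.2 * mask[1] - 0.2 * mask[3] + 0.1 * mask[0]


best = ga_select(6, toy_error)
print(best, fitness(best, toy_error))
```

With $\lambda = 0.01$, every extra IC must reduce the error by more than 0.01 to be worth keeping, which is how the penalty term pushes the GA towards small IC subsets.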
Translation Algorithm:
The translation algorithm is the main
element of any BCI system. The function of
the translation algorithm is to convert the
EEG input to device control output. It can be
split into feature extraction and feature
classification.
Feature Extraction
Feature extraction is the process of selecting
appropriate features from the input data.
These features should characterize the data
as well as possible. The goal of feature
extraction is to select those features which
have similar values for objects of one
category and differing values for other
categories. Reducing overlapping features
helps to avoid poor generalization and
reduces computational complexity. The goal
is to select a few features from the total
feature vector that still describe the data.
Feature extraction can be performed using:
Time-frequency analysis, using Fast
Fourier Transform (FFT)
Autoregressive modelling (AR)
Common spatial patterns (CSP)
Linear Discrimination (LD)
Genetic algorithm (GA)
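As an illustration of the first item, time-frequency features are often band powers, for example in the mu (8-12 Hz) and beta (13-30 Hz) rhythms. The band limits, the 128 Hz sampling rate and the helper names are assumptions. For clarity the sketch evaluates only the DFT bins inside each band directly; an FFT computes the same coefficients for all bins at once, just faster.

```python
import cmath
import math


def band_power(signal, fs, f_lo, f_hi):
    """Mean spectral power of `signal` in the band [f_lo, f_hi] Hz.

    Evaluates the DFT bins in the band directly (O(N) per bin); an FFT
    computes the same coefficients for all bins at once.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]  # remove the DC offset
    powers = []
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            powers.append(abs(coeff) ** 2 / n)
    return sum(powers) / len(powers) if powers else 0.0


def features(signal, fs):
    """Two-element feature vector: mu-band power and beta-band power."""
    return [band_power(signal, fs, 8, 12), band_power(signal, fs, 13, 30)]


fs = 128  # Hz, assumed sampling rate
one_second = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]  # 10 Hz test rhythm
mu, beta = features(one_second, fs)
print(mu > beta)  # True: the 10 Hz rhythm falls in the mu band
```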
Feature classification:
The next step in the translation algorithm is
the classification of the acquired features.
When presented with new data, the BCI
should be able to tell which brain state it
represents. Therefore it must recognize
patterns in the data provided by the features.
Classification will result in device control:
acquiring the commands from the user. It
can be achieved in various ways:
Learning Vector Quantization (LVQ)
Neural network (NN)
Support Vector Machines (SVM)
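Of the listed classifiers, LVQ is compact enough to sketch. The version below is the basic LVQ1 rule: the prototype nearest to a training sample is pulled towards it if their class labels agree and pushed away otherwise. The prototype initialisation, the learning-rate schedule and the two-class feature data are assumptions for illustration.

```python
def nearest(protos, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(protos)),
               key=lambda i: sum((p - v) ** 2 for p, v in zip(protos[i], x)))


def lvq1_train(samples, labels, protos, proto_labels, lr=0.1, epochs=20):
    """Basic LVQ1 training; returns the updated prototype vectors."""
    protos = [list(p) for p in protos]
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for x, y in zip(samples, labels):
            j = nearest(protos, x)
            sign = 1.0 if proto_labels[j] == y else -1.0  # attract same class, repel other
            protos[j] = [p + sign * rate * (v - p) for p, v in zip(protos[j], x)]
    return protos


def lvq1_classify(x, protos, proto_labels):
    """Label of the nearest prototype."""
    return proto_labels[nearest(protos, x)]


# Invented two-class feature vectors, e.g. (mu power, beta power) for two imagined movements
samples = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (2.9, 3.1)]
labels = ["left", "left", "right", "right"]
protos = lvq1_train(samples, labels, [(1.0, 1.0), (2.0, 2.0)], ["left", "right"])
print(lvq1_classify((0.1, 0.0), protos, ["left", "right"]))  # left
```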
Application
In the application domain, the issue
of self-paced versus evoked control is important.
Careful thought must be given to how this
control is realized. Special care should be
given to real-time control of devices that
interact with the physical environment.
Synchronous control of a wheelchair, for example,
does not seem very practical, although the
intermediate approach using an on/off
switch offers possibilities. The current
maximum information transfer rate of about
25 bits per minute strongly limits the
range of applications and their applicability
for mass society. Moreover, the commands
are today mostly one-dimensional; the
dimensionality depends on the number of
different brain states that can be recognized
per time unit. For good mouse control, for
instance, two-dimensional control is a
necessity.
Figure: Example of a BCI spelling program.
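Information transfer rates such as the 25 bits per minute quoted above are commonly computed with the definition of Wolpaw et al., which turns the number of possible selections N and the accuracy P into bits per selection, $B = \log_2 N + P \log_2 P + (1-P)\log_2\frac{1-P}{N-1}$, and multiplies by the number of selections per minute. The text does not state which definition its figure is based on, so this is only an illustration; the function names are hypothetical.

```python
import math


def bits_per_selection(n_classes, accuracy):
    """Wolpaw bits per selection for N equiprobable classes and accuracy P."""
    n, p = n_classes, accuracy
    if p <= 1.0 / n:
        return 0.0  # at or below chance level: treated as no information (common convention)
    bits = math.log2(n) + p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n - 1))
    return bits


def bits_per_minute(n_classes, accuracy, selections_per_minute):
    return bits_per_selection(n_classes, accuracy) * selections_per_minute


# A two-class BCI at 90% accuracy making 12 selections per minute
print(round(bits_per_selection(2, 0.9), 3))   # 0.531
print(round(bits_per_minute(2, 0.9, 12), 2))  # 6.37
```

Even a perfect two-class system would need 25 selections per minute to reach 25 bits per minute, which illustrates why recognizing more brain states per time unit matters.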
CONCLUSION
Like other communication and control
systems, BCIs have inputs, outputs, and
translation algorithms that convert the
former to the latter. BCI operation depends
on the interaction of two adaptive
controllers, the user's brain, which produces
the input (i.e., the electrophysiological
activity measured by the BCI system) and
the system itself, which translates that
activity into output (i.e., specific commands
that act on the external world). Successful
BCI operation requires that the user acquire
and maintain a new skill, a skill that consists
not of muscle control but rather of control of
EEG or single-unit activity.
REFERENCES
1. T Hong LIU, Lin-pei ZHAIV, Ying GAO,
Wen-ming LI, Jiu-fei ZHOU, "Image
Compression Based on Biorthogonal Wavelet
Transform," IEEE Proceedings of ISCIT 2005.
2. De Vore et al., "Image Compression through
Wavelet Transform Coding," IEEE Transactions
on Information Theory.
"A Comparative Study of Image Compression
Techniques Based on SVD, DWT-SVT, DWT-DCT,"
ICSCI 2008 proceedings, pp. 494-496.
3. http://www.electrodesales.com
4. http://www.analog.com/en/processors-dsp/content/reference_design
5. http://www.analog.com/en/processors-dsp/content/reference_designs
6. http://www.analog.com/library/analogdialogue/archives/39-06/mixed_signal.zip:MixedSignal
7. http://www.analog.com/en/digital-to-analog-converters/da-converters/ist/233/pst.html
Information Sharing Across Databases Using Anonymous Connection
B.Janani, Lecturer
Department of Computer
Science & Engineering
Gojan School of Business &
Technology
jananib@gojaneducation.com
R.Jeyavani
IV CSE
Gojan School of Business &
Technology
jeyavani.r@gmail.com
S.T.Ramya Raja Rajeshwari
IV CSE
Gojan School of Business &
Technology
ramyst.rajeshwari@gmail.com
Abstract-Suppose Alice owns a k-anonymous
database and needs to determine whether her
database, when inserted with a tuple owned by
Bob, is still k-anonymous. Also, suppose that
access to the database is strictly controlled
because, for example, the data are used for
certain experiments that need to be kept
confidential. Clearly, allowing Alice to
directly read the contents of the tuple breaks
the privacy of Bob (e.g., a patient's medical
record); on the other hand, the confidentiality
of the database managed by Alice is violated
once Bob has access to the contents of the
database. Thus, the problem is to check
whether the database inserted with the tuple is
still k-anonymous, without letting Alice and
Bob know the contents of the tuple and the
database, respectively. In this paper, we
propose two protocols solving this problem on
suppression-based and generalization-based
k-anonymous and confidential databases.
The protocols rely on well-known
cryptographic assumptions, and we provide
theoretical analyses to prove their soundness
and experimental results to illustrate their
efficiency.
Index Terms-Privacy, anonymity, data
management, secure computation.
Introduction
It is today well understood that databases
represent an important asset for many
applications and thus their security is
crucial. Data confidentiality is particularly
relevant because of the value, often not only
monetary, that data have. For example,
medical data collected by following the
history of patients over several years may
represent an invaluable asset that needs to be
adequately protected. Such a requirement
has motivated a large variety of approaches
aiming at better protecting data
confidentiality and data ownership. Relevant
approaches include query processing
techniques for encrypted data and data
watermarking techniques. Data
confidentiality is not, however, the only
requirement that needs to be addressed.
Today there is an increased concern for
privacy. The availability of huge numbers of
databases recording a large variety of
information about individuals makes it
possible to discover information about
specific individuals by simply correlating all
the available databases. Although
confidentiality and privacy are often used as
synonyms, they are different concepts: data
confidentiality is about the difficulty (or
impossibility) for an unauthorized user to
learn anything about data stored in the
database. Usually, confidentiality is
achieved by enforcing an access policy, or
possibly by using some cryptographic tools.
Privacy relates to what data can be safely
disclosed without leaking sensitive
information regarding the legitimate owner
[5]. Thus, if one asks whether confidentiality
is still required once data have been
anonymized, the reply is yes if the
anonymous data have a business value for
the party owning them or the unauthorized
disclosure of such anonymous data may
damage the party owning the data or other
parties. (Note that in the context of this
paper, the terms "anonymized" and
"anonymization" mean that identifying
information is removed from the original
data to protect personal or private
information. There are many ways to
perform data anonymization. We only focus
on the k-anonymization approach [28],
[32].) To better understand the difference
between confidentiality and anonymity,
consider the case of a medical facility
connected with a research institution.
Suppose that all patients treated at the
facility are asked before leaving the facility
to donate their personal health care records
and medical histories (under the condition
that each patient's privacy is protected) to
the research institution, which collects the
records in a research database. To guarantee
the maximum privacy to each patient, the
medical facility only sends to the research
database an anonymized version of the
patient record. Once this anonymized record
is stored in the research database, the
nonanonymized version of the record is
removed from the system of the medical
facility. Thus, the research database used by
the researchers is anonymous. Suppose that
certain data concerning patients are related
to the use of a drug over a period of four
years and certain side effects have been
observed and recorded by the researchers
in the research database. It is clear that these
data (even if anonymized) need to be kept
confidential and accessible only to the few
researchers of the institution working on this
project, until further evidence is found about
the drug. If these anonymous data were to be
disclosed, privacy of the patients would not
be at risk; however the company
manufacturing the drug may be adversely
affected. Recently, techniques addressing
the problem of privacy via data
anonymization have been developed, thus
making it more difficult to link sensitive
information to specific individuals. One
well-known technique is k-anonymization
[28], [32]. Such technique protects privacy
by modifying the data so that the probability
of linking a given data value, for example a
given disease, to a specific individual is very
small. So far, the problems of data
confidentiality and anonymization have been
considered separately. However, a relevant
problem arises when data stored in a
confidential, anonymity-preserving database
need to be updated. The operation of
updating such a database, e.g., by inserting
a tuple containing information about a given
individual, introduces two problems
concerning both the anonymity and
confidentiality of the data stored in the
database and the privacy of the individual to
whom the data to be inserted are related: 1)
Is the updated database still
privacy-preserving? and 2) Does the database
owner need to know the data to be inserted?
Clearly, the two problems are related in the
sense that they can be combined into the
following problem: can the database owner
decide if the updated database still preserves
privacy of individuals without directly
knowing the new data to be inserted? The
answer we give in this work is affirmative.
It is important to note that assuring that a
database maintains the privacy of
individuals to whom data are referred is
often of interest not only to these
individuals, but also to the organization
owning the database. Because of current
regulations, like HIPAA [19], organizations
collecting data about individuals are under the
obligation of assuring individual privacy. It
is thus in their interest to check that the data
entered in their databases do not violate
privacy, and to perform such a verification
without seeing any sensitive data of an
individual.
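To make the property concrete: a table is k-anonymous when every combination of quasi-identifier (QI) values occurring in it is shared by at least k tuples. The sketch below checks this and performs the naive insertion check, which is exactly what this paper wants to avoid, because whoever runs it sees the new tuple in the clear. Column names and values are invented, and tuples are assumed to be stored already generalized or suppressed.

```python
from collections import Counter


def is_k_anonymous(table, qi, k):
    """True if every quasi-identifier value combination occurs in at least k rows."""
    groups = Counter(tuple(row[a] for a in qi) for row in table)
    return all(count >= k for count in groups.values())


def naive_insert_check(table, new_row, qi, k):
    """Non-private check: is table + [new_row] still k-anonymous? (The checker sees new_row.)"""
    return is_k_anonymous(table + [new_row], qi, k)


QI = ("zip", "age")
table = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cold"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
]
print(is_k_anonymous(table, QI, 2))  # True
print(naive_insert_check(table, {"zip": "130**", "age": "<30", "disease": "asthma"}, QI, 2))  # True
print(naive_insert_check(table, {"zip": "150**", "age": "<30", "disease": "flu"}, QI, 2))  # False
```

The second call succeeds because the new tuple falls into an existing QI group; the protocols in this paper build on the same observation, but test the match without revealing the tuple or the database.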
Problem Statement
Fig. 1 captures the main participating parties
in our application domain. We assume that
the information concerning a single patient
(or data provider) is stored in a single tuple,
and DB is kept confidential at the server.
The users in Fig. 1 can be treated as medical
researchers who have access to DB.
Since DB is anonymous, the data providers'
privacy is protected from these researchers.
(Note that to follow the traditional
convention, in Section 4 and later sections,
we use Bob and Alice to represent the data
provider and the server, respectively.) As
mentioned before, since DB contains
privacy-sensitive data, one main concern is
to protect the privacy of patients. This task
is accomplished through the use of
anonymization. Intuitively, if the database
DB is anonymous, it is not possible to infer
the patients' identities from the information
contained in DB. This is achieved by
blending information about patients. See
Section 3 for a precise definition. Suppose
now that a new patient has to be treated.
Obviously, this means that the database has
to be updated in order to store the tuple t
containing the medical data of this patient.
The modification of the anonymous
database DB can be naively performed as
follows: the party who is managing the
database or the server simply checks
whether the updated database $DB \cup \{t\}$ is
still anonymous. Under this approach, the
entire tuple t has to be revealed to the party
managing the database server, thus violating
the privacy of the patient. Another
possibility would be to make available the
entire database to the patient so that the
patient can verify by himself/herself if the
insertion of his/her data violates his/her own
privacy. This approach however, requires
making available the entire database to the
patient thus violating data confidentiality. In
order to devise a suitable solution, several
problems need to be addressed:
Problem 1: without revealing the contents of t and DB,
how can data integrity be preserved by
establishing the anonymity of $DB \cup \{t\}$?
Problem 2: once such anonymity is
established, how can this update be performed?
Problem 3: what can be done if database
anonymity is not preserved?
Problem 4: what is the initial content of the
database, when no data about users have been
inserted yet? In this paper, we propose two
protocols solving Problem 1, which is the
central problem addressed by our paper.
However, because the other problems are
crucial from a more practical point of view,
we discuss them as well in Section 7. Note
that to assure a higher level of anonymity to
the party inserting the data, we require that
the communication between this party and
the database occurs through an anonymous
connection, as provided by protocols like
Crowds [27] or Onion routing [26]. This is
necessary since traffic analysis can
potentially reveal sensitive information
based on users' IP addresses. In addition,
sensitive information about the party
inserting the data may be leaked from the
access control policies adopted by the
anonymous database system, in that an
important requirement is that only
authorized parties, for example patients,
should be able to enter data in the database.
Therefore, the question is how to enforce
authorization without requiring the parties
inserting the data to disclose their identities.
An approach that can be used is based on
techniques for user anonymous
authentication and credential verification
[20]. The above discussion illustrates that
the problem of anonymous updates to
confidential databases is complex and
requires the combination of several
techniques, some of which are proposed for
the first time in this paper. Fig. 1
summarizes the various phases of a
comprehensive approach to the problem of
anonymous updates to confidential
databases, while Table 1 summarizes the
required techniques and identifies the role of
our techniques in such approach.
Proposed Solutions
All protocols we propose to solve Problem 1
rely on the fact that the anonymity of DB is
not affected by inserting t if the information
contained in t, properly anonymized, is
already contained in DB. Then, Problem 1 is
equivalent to privately checking whether
there is a match between (a properly
anonymized version of) t and (at least) one
tuple contained in DB. The first protocol is
aimed at suppression-based anonymous
databases, and it allows the owner of DB to
properly anonymize the tuple t, without
gaining any useful knowledge on its
contents and without having to send to its
owner newly generated data. To achieve
such goal, the parties secure their messages
by encrypting them. In order to perform the
privacy-preserving verification of the
database anonymity upon the insertion, the
parties use a commutative and homomorphic
encryption scheme. The second protocol
(see Section 5) is aimed at generalization-
based anonymous databases, and it relies on
a secure set intersection protocol, such as the
one found in [3], to support privacy-
preserving updates on a generalization-based
k-anonymous DB. The paper is organized as
follows: Section 2 reviews related work on
anonymity and privacy in data management.
Section 3 introduces notions about
anonymity and privacy that we need in order
to define our protocols and prove their
correctness and security. The protocols are
defined, respectively, in Section 4 and
Section 5 with proofs of correctness and
security. Section 6 analyzes the complexity
of the proposed protocols and presents
experimental complexity results obtained
by running these protocols on real-life
data. Section 7 concludes the paper and
outlines future work.
Private Update For Suppression Based
Anonymous Connection
In this section, we assume that the database
is anonymized using a suppression-based
method. Note that our protocols are not
required to further improve the privacy of
users other than that provided by the fact
that the updated database is still
k-anonymous. Suppose that Alice owns a
k-anonymous table T over the quasi-identifier
(QI) attributes. Alice has to decide whether
$T \cup \{t\}$, where t is a tuple owned by Bob, is
still k-anonymous, without directly knowing the
values in t (assuming t and T have the same
schema). This problem amounts to deciding
whether t matches any tuple in T on the
nonsuppressed QI attributes. If this is the
case, then t, properly anonymized, can be
inserted into T. Otherwise, the insertion of t
into T is rejected. A trivial solution requires,
as a first step, that Alice send Bob the names
of the suppressed attributes for every tuple
in the witness set $\{\alpha_1, \ldots, \alpha_w\}$ of T. In this
way, Bob knows which values are to be
suppressed from his tuple. After Bob
computes the anonymized or suppressed
versions $\bar{t}_i$ of tuple t, $1 \le i \le w$, he and Alice
can start a protocol (e.g., the Intersection
Size Protocol in [3]) for privately testing the
equality of $\bar{t}_i$ and $\alpha_i$. As a drawback, Bob
gains knowledge about the suppressed
attributes of Alice. A solution that addresses
such drawback is based on the following
protocol. Assume that Alice and Bob agree on a
commutative and product-homomorphic
encryption scheme E and on the QI attributes
$\{A_1, \ldots, A_u\}$. Further, they agree on a coding
$c(\cdot)$, given in (4). Since the other, non-QI attributes do
not play any role in our computation,
without loss of generality let
$\alpha_i = \langle v'_1, \ldots, v'_s \rangle$ be the tuple containing only the s
nonsuppressed QI attributes of witness $w_i$,
and let $t = \langle v_1, \ldots, v_u \rangle$. Protocol 4.1 allows
Alice to compute an anonymized version of
t without letting her know the contents of t
and, at the same time, without letting Bob
know which attributes of the tuples in T are
suppressed. The protocol works as
follows. At Step 1, Alice sends Bob an
encrypted version of $\alpha_i$, containing only the
s nonsuppressed QI attributes. At Step 2,
Bob encrypts the information received from
Alice and sends it to her, along with an
encrypted version of each value in his tuple
t. At Steps 3-4, Alice checks whether the
nonsuppressed QI attributes of $\alpha_i$ are equal to
those of t.
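The property the protocol relies on can be illustrated with exponentiation modulo a prime (the Pohlig-Hellman exponentiation cipher), which is both commutative, $E_a(E_b(m)) = E_b(E_a(m))$, and product-homomorphic, $E(m_1 m_2) = E(m_1) E(m_2)$. The sketch below is a simplified equality test built on that scheme, not the paper's Protocol 4.1: the prime, the SHA-256-based stand-in for the coding $c(\cdot)$, the key generation and the function names are assumptions, and it omits the product-homomorphic step and any security analysis. Bob encrypts all of his values, so he does not learn which positions Alice compares.

```python
import hashlib
import math
import random

P = 2**127 - 1  # a Mersenne prime; toy modulus chosen for illustration


def encode(value):
    """Hypothetical coding c(.): hash an attribute value into {1, ..., P-1}."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def keygen(rng):
    """Random exponent coprime with P-1, so that m -> m^k mod P is a bijection."""
    while True:
        k = rng.randrange(3, P - 1)
        if math.gcd(k, P - 1) == 1:
            return k


def enc(key, m):
    """Exponentiation cipher: E_k(m) = m^k mod P."""
    return pow(m, key, P)


def private_match(alpha_values, nonsuppressed_idx, t_values, rng):
    """Toy two-party run: does t agree with witness alpha on the nonsuppressed QI positions?"""
    ka, kb = keygen(rng), keygen(rng)                   # Alice's and Bob's secret keys
    step1 = [enc(ka, encode(v)) for v in alpha_values]  # Alice -> Bob
    step2_alpha = [enc(kb, c) for c in step1]           # Bob re-encrypts Alice's values
    step2_t = [enc(kb, encode(v)) for v in t_values]    # Bob encrypts every value of t
    # Alice adds her layer only at positions she knows are not suppressed, then compares.
    double_t = [enc(ka, step2_t[i]) for i in nonsuppressed_idx]
    return double_t == step2_alpha


rng = random.Random(42)
witness = ("13053", "F")  # nonsuppressed QI values (zip, sex); age is suppressed
print(private_match(witness, [0, 2], ("13053", "28", "F"), rng))  # True
print(private_match(witness, [0, 2], ("14850", "28", "F"), rng))  # False
```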

Conclusion/Future work
In this paper, we have presented two secure
protocols for privately checking whether a
k-anonymous database retains its anonymity
once a new tuple is inserted into it.
Since the proposed protocols ensure that the
updated database remains k-anonymous, the
results returned from a user's (or a medical
researcher's) query are also k-anonymous.
Thus, the privacy of the patient or data provider
cannot be violated by any query.
As long as the database is updated properly
using the proposed protocols, the user
queries under our application domain are
always privacy-preserving.
References
[1] G. Aggarwal, T. Feder, K. Kenthapadi,
R. Motwani, R. Panigrahy, D. Thomas, and
A. Zhu, "Anonymizing Tables," Proc. Int'l
Conf. Database Theory (ICDT), 2005.
[2] E. Bertino and R. Sandhu, "Database
Security: Concepts, Approaches and
Challenges," IEEE Trans. Dependable and
Secure Computing, vol. 2, no. 1, pp. 2-19,
Jan.-Mar. 2005.
[3] J.W. Byun, T. Li, E. Bertino, N. Li, and
Y. Sohn, "Privacy-Preserving Incremental
Data Dissemination," J. Computer Security,
vol. 17, no. 1, pp. 43-68, 2009.
[4] S. Chawla, C. Dwork, F. McSherry, A.
Smith, and H. Wee, "Towards Privacy in
Public Databases," Proc. Theory of
Cryptography Conf. (TCC), 2005.
[5] B.C.M. Fung, K. Wang, A.W.C. Fu, and
J. Pei, "Anonymity for Continuous Data
Publishing," Proc. Extending Database
Technology Conf. (EDBT), 2008.
[6] Y. Han, J. Pei, B. Jiang, Y. Tao, and Y.
Jia, "Continuous Privacy Preserving
Publishing of Data Streams," Proc.
Extending Database Technology Conf.
(EDBT), 2008.
[7] J. Li, B.C. Ooi, and W. Wang,
"Anonymizing Streaming Data for
Privacy Protection," Proc. IEEE Int'l Conf.
Data Eng. (ICDE), 2008.
[8] A. Trombetta and E. Bertino, "Private
Updates to Anonymous Databases," Proc.
Int'l Conf. Data Eng. (ICDE), 2006.
[9] K. Wang and B. Fung, "Anonymizing
Sequential Releases," Proc. ACM
Knowledge Discovery and Data Mining
Conf. (KDD), 2006.
Preventing Node and Link Failure in
IP Network Recovery Using MRC
(1) N. Gowrishankar, (2) S. Jai Krishnan, (3) S. Aravindh
Senior Lecturer, Co-Guide; UG-Student
Gojan School of Business and Technology, Chennai-52
gowthamshankar26@gmail.com, aravindhgojan@gmail.com
Abstract
As the Internet takes an increasingly
central role in our communications
infrastructure, the slow convergence of routing
protocols after a network failure becomes a
growing problem. To assure fast recovery from
link and node failures in IP networks, we
present a new recovery scheme called Multiple
Routing Configurations (MRC). Our proposed
scheme guarantees recovery in all single
failure scenarios, using a single mechanism to
handle both link and node failures, and
without knowing the root cause of the failure.
MRC is strictly connectionless, and assumes
only destination-based hop-by-hop forwarding.
MRC is based on keeping additional routing
information in the routers, and allows packet
forwarding to continue on an alternative
output link immediately after the detection of a
failure. It can be implemented with only minor
changes to existing solutions. In this paper we
present MRC, and analyze its performance
with respect to scalability, backup path lengths,
and load distribution after a failure. We also
show how an estimate of the traffic demands in
the network can be used to improve the
distribution of the recovered traffic, and thus
reduce the chances of congestion when MRC is
used.
Index Terms-Availability, computer network
reliability, communication system fault
tolerance, communication system routing, and
protection.
I. INTRODUCTION
Network-wide IP re-convergence is a
time-consuming process, and a link or node failure is
typically followed by a period of routing
instability. Much effort has been devoted to
optimizing the different steps of the convergence
of IP routing, i.e., detection, dissemination of
information and shortest path calculation. The
convergence process of the interior gateway
protocol (IGP) is slow because it is
reactive and global. It reacts to a failure after it
has happened, and it involves all the routers in
the domain. In this paper we present a new
scheme for handling link and node failures in IP
networks. Multiple Routing Configurations
(MRC) is a proactive and local protection
mechanism that allows recovery in the range of
milliseconds. MRC allows packet forwarding to
continue over preconfigured alternative next-
hops immediately after the detection of the
failure.
Using MRC as a first line of defense
against network failures, the normal IP
convergence process can be put on hold. MRC
guarantees recovery from any single link or
node failure, which constitutes a large majority
of the failures experienced in a network. MRC
makes no assumptions with respect to the root
cause of failure, e.g., whether the packet
forwarding is disrupted due to a failed link or a
failed router.
The main idea of MRC is to use the
network graph and the associated link weights to
produce a small set of backup network
configurations. The link weights in these backup
configurations are manipulated so that for each
link and node failure, and regardless of whether
it is a link or node failure, the node that detects
the failure can safely forward the incoming
packets towards the destination on an alternate
link. MRC assumes that the network uses
shortest path routing and destination-based
hop-by-hop forwarding.
MRC gives great flexibility with respect
to how the recovered traffic is routed. The
backup configuration used after a failure is
selected based on the failure instance, and thus
we can choose link weights in the backup
configurations that are well suited for only a
subset of failure instances.
II. MRC OVERVIEW
MRC is based on building a small set of
backup routing configurations that are used to
route recovered traffic on alternate paths after a
failure. The backup configurations differ from
the normal routing configuration in that link
weights are set so as to avoid routing traffic in
certain parts of the network. We observe that if
all links attached to a node are given sufficiently
high link weights, traffic will never be routed
through that node. The failure of that node will
then only affect traffic that is sourced at or
destined for the node itself. Similarly, to exclude
a link (or a group of links) from taking part in
the routing, we give it infinite weight. The link
can then fail without any consequences for the
traffic. Our MRC approach is threefold. First,
we create a set of backup configurations, so that
every network component is excluded from
packet forwarding in one configuration. Second,
for each configuration, a standard routing
algorithm like OSPF is used to calculate
configuration specific shortest paths and create
forwarding tables in each router, based on the
configurations.
The use of a standard routing algorithm
guarantees loop-free forwarding within one
configuration. Finally, we design a forwarding
process that takes advantage of the backup
configurations to provide fast recovery from a
component failure. In our approach, we
construct the backup configurations so that for
all links and nodes in the network, there is a
configuration where that link or node is not used
to forward traffic. Thus, for any single link or
node failure, there will exist a configuration that
will route the traffic to its destination on a path
that avoids the failed element. Also, the backup
configurations must be constructed so that all
nodes are reachable in all configurations. Using a standard shortest
path calculation, each router creates a set of
configuration-specific forwarding tables.
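The effect of restricted link weights on routing can be checked on a small example. The sketch runs a plain Dijkstra shortest-path computation per configuration on an invented four-node square topology: in the backup configuration every link attached to node 2 gets a high restricted weight `W_R` (value assumed), so node 2 carries no transit traffic but can still be reached as a destination. Links with infinite weight are skipped entirely, as for isolated links.

```python
import heapq
import math

W_R = 1000  # assumed "restricted" weight, larger than any normal path cost


def dijkstra(weights, source):
    """Shortest-path predecessors from source; links with infinite weight are unusable."""
    adj = {}
    for (u, v), w in weights.items():
        if w != math.inf:
            adj.setdefault(u, []).append((v, w))
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return prev


def route(prev, source, dest):
    """Rebuild the hop sequence from the predecessor map."""
    hops = [dest]
    while hops[-1] != source:
        hops.append(prev[hops[-1]])
    return hops[::-1]


def isolate_node(weights, node, w_r=W_R):
    """Backup configuration: every link attached to `node` gets the restricted weight."""
    return {(u, v): (w_r if node in (u, v) else w) for (u, v), w in weights.items()}


# Square topology 1-2-3-4-1 with symmetric weights
edges = {(1, 2): 1, (2, 3): 1, (3, 4): 2, (4, 1): 1}
normal = {**edges, **{(v, u): w for (u, v), w in edges.items()}}
backup = isolate_node(normal, 2)

print(route(dijkstra(normal, 1), 1, 3))  # [1, 2, 3]  normal path transits node 2
print(route(dijkstra(backup, 1), 1, 3))  # [1, 4, 3]  backup path avoids node 2
print(route(dijkstra(backup, 1), 1, 2))  # [1, 2]     node 2 is still reachable
```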
When a router detects that a neighbor
can no longer be reached through one of its
interfaces, it does not immediately inform the
rest of the network about the connectivity
failure. Instead, packets that would normally be
forwarded over the failed interface are marked
as belonging to a backup configuration, and
forwarded on an alternative interface towards their
destination. The packets must be marked with a
configuration identifier, so the routers along the
path know which configuration to use.
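The forwarding step described above can be sketched as a lookup in configuration-specific next-hop tables. The table layout, the configuration numbering (0 = normal), the helper name and the hand-written tables for a square topology 1-2-3-4-1, in which configuration 1 isolates node 2, are assumptions. The sketch covers only the basic rule; the case where the unreachable neighbor is itself the destination needs separate handling that is omitted here.

```python
NORMAL = 0  # identifier of the normal (failure-free) configuration


def forward(node, dest, packet_conf, tables, isolated_in, unreachable):
    """Return (next_hop, configuration) for a packet, or (None, conf) to drop it.

    tables[conf][node][dest] -> next hop in configuration conf
    isolated_in[n]           -> backup configuration in which node n is isolated
    unreachable              -> neighbors this node has detected as unreachable
    """
    hop = tables[packet_conf][node][dest]
    if hop not in unreachable:
        return hop, packet_conf   # no failure on this hop: keep the current configuration
    if packet_conf != NORMAL:
        return None, packet_conf  # already rerouted once: only single failures are covered
    backup = isolated_in[hop]     # the packet is marked with this configuration id
    new_hop = tables[backup][node][dest]
    if new_hop in unreachable:
        return None, backup
    return new_hop, backup


# Hand-written next-hop tables for a square 1-2-3-4-1; configuration 1 isolates node 2
tables = {
    0: {1: {3: 2}, 2: {3: 3}, 4: {3: 3}},
    1: {1: {3: 4}, 4: {3: 3}},
}
isolated_in = {2: 1}

print(forward(1, 3, NORMAL, tables, isolated_in, unreachable=set()))  # (2, 0)
print(forward(1, 3, NORMAL, tables, isolated_in, unreachable={2}))    # (4, 1)
print(forward(4, 3, 1, tables, isolated_in, unreachable=set()))       # (3, 1)
```

Only the router adjacent to the failure changes anything; downstream routers simply read the configuration identifier carried by the packet.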
It is important to stress that MRC does
not affect the failure-free original routing, i.e.,
when there is no failure, all packets are
forwarded according to the original
configuration, where all link weights are normal.
Upon detection of a failure, only traffic reaching
the failure will switch configuration. All other
traffic is forwarded according to the original
configuration as normal. If a failure lasts for
more than a specified time interval, a normal re-
convergence will be triggered. MRC does not
interfere with this convergence process, or make
it longer than normal. However, MRC gives
continuous packet forwarding during the
convergence, and hence makes it easier to use
mechanisms that prevent micro-loops during
convergence, at the cost of longer convergence
times.
III. GENERATING BACKUP
CONFIGURATIONS
A. Configurations Structure
Definition: A configuration $C_i$ is an ordered pair $(G, w_i)$ of
the graph $G = (N, A)$ and a function
$w_i : A \to \{1, \ldots, w_{\max}, w_r, \infty\}$ that assigns an integer
weight $w_i(a)$ to each link $a \in A$. We distinguish between the
normal configuration $C_0$ and the backup
configurations $C_i$, $i > 0$. In the normal configuration $C_0$, all
links have normal weights $w_0(a) \in \{1, \ldots, w_{\max}\}$.
We assume that $C_0$ is given with finite integer
weights. MRC is agnostic to the setting of these
weights. In the backup configurations,
selected links and nodes must not carry any
transit traffic. Still, traffic must be able to depart
from and reach all operative nodes. These traffic
regulations are imposed by assigning high
weights to some links in the backup
configurations:
The purpose of the restricted links is to isolate a
node from routing in a specific backup
configuration, such as node 5 in the left-hand
example. In many topologies, more than a single
node can be isolated simultaneously, as in the
right-hand example. Restricted and isolated links are always
given the same weight in both directions.
However, MRC treats links as unidirectional,
and makes no assumptions with respect to
symmetric link weights for the links that are not
restricted or isolated.
Definition: A configuration backbone , consists
of all non-isolated nodes in and all links that are
neither isolated nor restricted:

Definition: A backbone is connected if and only
if
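The definitions above can be illustrated with a small Python sketch. For illustration only, it assumes that a configuration is a dict mapping directed links (u, v) to weights, that the restricted weight is a hypothetical constant W_R = 1000 larger than every normal weight, and that isolated links use math.inf. The helper names are hypothetical and not part of MRC itself.

```python
import math
from collections import deque

W_R = 1000      # hypothetical restricted weight w_r, larger than every normal weight
INF = math.inf  # weight of an isolated link

def is_isolated(u, weights):
    """Node u is isolated: all its links weigh >= W_R and at least one weighs exactly W_R."""
    out = [w for (a, _), w in weights.items() if a == u]
    return bool(out) and all(w >= W_R for w in out) and any(w == W_R for w in out)

def backbone(nodes, weights):
    """Return (N_i, A_i): non-isolated nodes and links that are neither restricted nor isolated."""
    n_i = {u for u in nodes if not is_isolated(u, weights)}
    a_i = {link for link, w in weights.items() if w < W_R}
    return n_i, a_i

def reachable(src, links):
    """Nodes reachable from src over the given directed links (breadth-first search)."""
    seen, queue = {src}, deque([src])
    while queue:
        x = queue.popleft()
        for (a, b) in links:
            if a == x and b not in seen:
                seen.add(b)
                queue.append(b)
    return seen

def backbone_connected(nodes, weights):
    """Every backbone node can reach every other backbone node using backbone links only."""
    n_i, a_i = backbone(nodes, weights)
    return all(n_i <= reachable(u, a_i) for u in n_i)

# Example: a 4-node ring with unit weights in both directions.
nodes = {1, 2, 3, 4}
normal = {(1, 2): 1, (2, 1): 1, (2, 3): 1, (3, 2): 1,
          (3, 4): 1, (4, 3): 1, (4, 1): 1, (1, 4): 1}
backup = dict(normal)
backup[(1, 2)] = backup[(2, 1)] = W_R  # restricted: node 1 stays reachable
backup[(1, 4)] = backup[(4, 1)] = INF  # isolated: carries no traffic
print(is_isolated(1, backup), backbone_connected(nodes, backup))  # True True
```

If node 1 were given only infinite-weight links instead, it would not count as isolated. It would also be unreachable, so the backbone check fails. This is why at least one restricted link is required.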
B. Algorithm
The number and internal structure of backup
configurations in a complete set for a given
topology may vary depending on the
construction model. If more configurations are
created, fewer links and nodes need to be isolated per
configuration, giving a richer (more connected)
backbone in each configuration. On the other
hand, if fewer configurations are constructed,
the state requirement for the backup routing
information storage is reduced. However,
calculating the minimum number of
configurations for a given topology graph is
computationally demanding.
1) Description: Algorithm 1 loops through all
nodes in the topology, and tries to isolate them
one at a time. A link is isolated in the same
iteration as one of its attached nodes. The
algorithm terminates when either all nodes and
links in the network are isolated in exactly one
configuration, or a node that cannot be isolated
is encountered.
a) Main loop: Initially, the backup configurations are created as copies of the normal configuration. A queue of nodes and a queue of links are initialized. The node queue contains all nodes in an arbitrary sequence. The link queue is initially empty, but all links in the network will have to pass through it. The dequeue operation returns the first item in the queue, removing it from the queue.
b) Isolating links: Along with a node $u$, as many as possible of its attached links are isolated. The algorithm runs through the links attached to $u$. It can be shown that it is an invariant of the algorithm that, in line 1, all links in the link queue are attached to node $u$. If the neighbor node $v$ was not isolated in any configuration, we isolate the link $(u, v)$ along with $u$, provided that another link attached to $u$ remains that is not isolated with $u$. If the link cannot be isolated together with node $u$, we leave it for node $v$ to isolate later.
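The following Python sketch illustrates the spirit of the isolation procedure. It is a deliberately simplified greedy variant, not the exact Algorithm 1:

- Each node is tried in the configurations in order.
- All of the node's links are first restricted.
- Every link not yet isolated anywhere is then isolated along with the node, as long as one restricted link remains.
- The attempt is kept only if the backbone stays connected.

The weight values, the extra rule that two neighbors are never isolated in the same configuration, and all function names are assumptions made for this illustration.

```python
import math
from collections import deque

W_R, INF = 1000, math.inf  # hypothetical restricted weight (> any normal weight); isolated weight

def neighbors(u, weights):
    """Sorted neighbors of u, read from the directed link keys (u, v)."""
    return sorted(v for (a, v) in weights if a == u)

def backbone_ok(nodes, w, isolated):
    """True if all non-isolated nodes reach each other over links lighter than W_R."""
    live = {u for u in nodes if u not in isolated}
    usable = [link for link, x in w.items() if x < W_R]
    for s in live:
        seen, queue = {s}, deque([s])
        while queue:
            x = queue.popleft()
            for (a, b) in usable:
                if a == x and b not in seen:
                    seen.add(b)
                    queue.append(b)
        if not live <= seen:
            return False
    return True

def build_configs(nodes, weights, k):
    """Greedy sketch: isolate every node and link in one of k backup configurations, else None."""
    configs = [dict(weights) for _ in range(k)]
    iso_nodes = [set() for _ in range(k)]
    iso_links = set()  # undirected links already isolated in some configuration
    for u in sorted(nodes):
        nbrs = neighbors(u, weights)
        for i in range(k):
            if any(v in iso_nodes[i] for v in nbrs):
                continue  # simplification: no two neighbors isolated in one configuration
            trial, newly = dict(configs[i]), []
            for v in nbrs:
                trial[(u, v)] = trial[(v, u)] = W_R  # restrict every attached link
            for v in nbrs:
                link = frozenset((u, v))
                restricted = sum(trial[(u, x)] == W_R for x in nbrs)
                if link not in iso_links and restricted > 1:
                    trial[(u, v)] = trial[(v, u)] = INF  # isolate the link together with u
                    newly.append(link)
            if backbone_ok(nodes, trial, iso_nodes[i] | {u}):
                configs[i] = trial
                iso_nodes[i].add(u)
                iso_links.update(newly)
                break
        else:
            return None  # u cannot be isolated in any configuration
    if {frozenset(link) for link in weights} <= iso_links:
        return configs
    return None  # some link was never isolated

ring = {(1, 2): 1, (2, 1): 1, (2, 3): 1, (3, 2): 1,
        (3, 4): 1, (4, 3): 1, (4, 1): 1, (1, 4): 1}
print(build_configs({1, 2, 3, 4}, ring, 2))       # None
print(len(build_configs({1, 2, 3, 4}, ring, 4)))  # 4
```

On this 4-node ring the sketch fails with two configurations and succeeds with four, one per node. This mirrors the trade-off described above between the number of configurations and how much must be isolated in each.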
2) Output: We show that successful execution
of Algorithm 1 results in a complete set of valid
backup configurations.
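Proposition 3.3: If Algorithm 1 terminates successfully, the produced backup configurations give restricted or isolated weights only to links attached to a node that is isolated in that configuration.
Proof: Links are only given restricted or isolated (infinite) weights in the process of isolating one of their attached nodes.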
Proposition 3.4: If Algorithm 1 terminates successfully, the set of backup configurations is complete, and all configurations are valid.
Proof: Initially, all links in all configurations have their original link weights. Each time a new node and its connected links are isolated in a configuration, we verify that the backbone in that configuration remains connected. When the links are isolated, it is checked that the node has at least one neighbor that is not isolated in that configuration. When isolating a node, we also isolate as many as possible of the connected links. Hence every configuration remains valid. If the algorithm terminates with success, all nodes and links are isolated in one configuration; thus the configuration set is complete.
3) Termination: The algorithm runs through all
nodes trying to make them isolated in one of the
backup configurations and will always terminate
with or without success. If a node cannot be
isolated in any of the configurations, the
algorithm terminates without success. However,
the algorithm is designed so that any bi-connected
topology will result in a successful
termination, if the number of configurations
allowed is sufficiently high.
Proposition 3.5: Given a bi-connected graph $G = (N, A)$, there will exist a number of backup configurations $n \le |N|$ such that Algorithm 1 will terminate successfully.
Proof: Assume $n = |N|$. The algorithm will create $n$ backup configurations, isolating one node in each backup configuration. In bi-connected topologies this can always be done. Along with a node $u$, all attached links except one, say $(u, v)$, can be isolated. By forcing node $v$ to be the next node processed, and the link $(u, v)$ to be first among the links attached to $v$, node $v$ and link $(u, v)$ will be isolated in the next configuration. This can be repeated until we have $n$ configurations such that every node and link is isolated. This holds also for the last node processed, since its last link will always lead to a node that is already isolated in another configuration. Since all links and nodes can be isolated, the algorithm will terminate successfully.
4) Complexity: The complexity of the proposed algorithm is determined by the loops and by the complexity of the method that checks whether a configuration's backbone remains connected. This method performs a procedure similar to determining whether a node is an articulation point in a graph, which is bounded in the worst case by $O(|N| + |A|)$. Additionally, for each node, we run through all adjacent links, whose number is bounded above by the maximum node degree.
IV. LOCAL FORWARDING PROCESS
When a packet reaches a point of failure, the
node adjacent to the failure, called the detecting
node, is responsible for finding a backup
configuration where the failed component is
isolated. The detecting node marks the packet as
belonging to this configuration, and forwards the
packet. From the packet marking, all transit
routers identify the packet with the selected
backup configuration, and forward it to the
egress node avoiding the failed component.

Fig. 1. Packet forwarding state diagram
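When the detecting node $u$ finds that the next hop $v$ on a packet's normal path has failed, we can distinguish between two cases. If $v$ is not the egress node $d$ of the packet, forwarding can be done in configuration $C(v)$, the backup configuration in which $v$ is isolated, where both the link $(u, v)$ and node $v$ will be avoided. In the other case, $v = d$, the challenge is to provide recovery for the failure of link $(u, v)$ when node $v$ is operative. Our strategy is to forward the packet using a path to $v$ that does not contain $(u, v)$. Furthermore, packets that have changed configuration before (their configuration ID is different from the one used in the normal configuration $C_0$), and still meet a failed component on their forwarding path, must be discarded.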
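The two cases can be sketched in Python as follows. Several details are assumptions for illustration, not details given in the text:

- the lookup table node_conf, which maps a node to the ID of the configuration isolating it;
- the lookup table link_conf, which maps a link to the ID of the configuration isolating it;
- the use of ID 0 for the normal configuration;
- the choice of the link-isolating configuration when v is the egress node.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

NORMAL = 0  # hypothetical configuration ID of the normal configuration C_0

@dataclass
class Packet:
    dest: str               # egress node d
    config_id: int = NORMAL

def select_backup(pkt: Packet, u: str, v: str,
                  node_conf: Dict[str, int],
                  link_conf: Dict[Tuple[str, str], int]) -> Optional[int]:
    """Decision at detecting node u when its next hop v (or the link (u, v)) has failed.

    Returns the backup configuration ID the packet is marked with, or None to discard it.
    """
    if pkt.config_id != NORMAL:
        return None                 # already switched once and met another failure: discard
    if v != pkt.dest:
        chosen = node_conf[v]       # C(v): avoids both node v and link (u, v)
    else:
        chosen = link_conf[(u, v)]  # v is the egress: avoid only the link (u, v)
    pkt.config_id = chosen          # transit routers read this mark to pick the configuration
    return chosen
```

Marking the packet inside the function means that a second failure on the same path is detected automatically by the first check.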
A. Implementation Issues
The forwarding process can be implemented in
the routing equipment as presented above,
requiring the detecting node to know the backup
configuration for each of its neighbors. The detecting node would then perform at most two additional next-hop look-ups in the case of a failure. However, all nodes in the network have full knowledge of the structure of all backup configurations. Hence, the detecting node can determine in advance the correct backup configuration to use if the normal
next hop for a destination has failed. This way
the forwarding decision at the point of failure
can be simplified at the cost of storing the
identifier of the correct backup configuration to
use for each destination and failing neighbor.
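The precomputation can be sketched as a table keyed by destination and failing neighbor. The inputs are hypothetical structures assumed to come from the backup configurations:

- next_hop: node u's normal next hop per destination;
- node_conf: the ID of the configuration isolating each node;
- link_conf: the ID of the configuration isolating each link.

The function name is illustrative only.

```python
from typing import Dict, Tuple

def precompute_backup_ids(u: str,
                          next_hop: Dict[str, str],
                          node_conf: Dict[str, int],
                          link_conf: Dict[Tuple[str, str], int]) -> Dict[Tuple[str, str], int]:
    """Build node u's table: (destination, failed neighbor) -> backup configuration ID."""
    table = {}
    for dest, v in next_hop.items():
        if v != dest:
            table[(dest, v)] = node_conf[v]       # avoid the failed neighbor entirely
        else:
            table[(dest, v)] = link_conf[(u, v)]  # neighbor is the destination: avoid the link only
    return table

table = precompute_backup_ids("u", {"d": "v", "v": "v"}, {"v": 2}, {("u", "v"): 3})
# At failure time a single lookup, e.g. table[("d", "v")], replaces the extra next-hop look-ups.
```

The table trades memory (one entry per destination) for a constant-time decision at the moment of failure.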
V. PERFORMANCE EVALUATION
MRC requires the routers to store additional
routing configurations. The amount of state
required in the routers is related to the number
of such backup configurations.
A. Evaluation Setup
For each topology, we measure the
minimum number of backup configurations
needed by our algorithm to isolate every node
and link in the network. Recall from Section III-
B that our algorithm for creating backup
configurations only takes the network topology
as input, and is not influenced by the link
weights.
Loop-Free Alternates (LFA): LFA is a cheaper
fast reroute technique that exploits the fact that
for many destinations, there exists an alternate
next-hop that will not lead to a forwarding loop.
If such alternate paths exist for all traffic that is
routed through a node, we can rely on LFA
instead of protecting the node using MRC.
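The loop-free condition behind LFA, as specified in RFC 5286, states the following. A neighbor N of source S is a loop-free alternate for destination D if Dist(N, D) < Dist(N, S) + Dist(S, D). The Python sketch below checks this basic condition on a small weighted graph. The graph representation and function names are assumptions for illustration.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, int]]  # node -> {neighbor: link weight}

def shortest_distances(graph: Graph, src: str) -> Dict[str, float]:
    """Dijkstra's algorithm from src."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in graph[x].items():
            nd = d + w
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return dist

def loop_free_alternates(graph: Graph, s: str, d: str) -> List[str]:
    """Neighbors n of s with Dist(n, d) < Dist(n, s) + Dist(s, d), excluding primary next hops."""
    dist = {n: shortest_distances(graph, n) for n in graph}
    best = dist[s][d]
    alternates = []
    for n, w in graph[s].items():
        if w + dist[n][d] == best:
            continue  # n is a primary next hop, not an alternate
        if dist[n][d] < dist[n][s] + dist[s][d]:
            alternates.append(n)
    return sorted(alternates)

square = {"S": {"A": 1, "B": 1}, "A": {"S": 1, "D": 1},
          "B": {"S": 1, "D": 2}, "D": {"A": 1, "B": 2}}
print(loop_free_alternates(square, "S", "D"))  # ['B']
```

When no alternate satisfies the inequality, the neighbor's shortest path back to D would run through S itself. Traffic sent there could loop, so in that case the node still needs MRC protection.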
They define a piecewise linear cost function that
is dependent on the load on each of the links in
the network. This function is convex and resembles an
exponentially growing function.
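The text does not spell the function out. Assume it is the widely used Fortz-Thorup link cost. That function has slopes 1, 3, 10, 70, 500 and 5000, with breakpoints at 1/3, 2/3, 9/10, 1 and 11/10 of link capacity. A minimal Python sketch under that assumption is:

```python
import math

# Assumed Fortz-Thorup segments: (utilization where the segment starts, slope in that segment)
SEGMENTS = [(0.0, 1), (1 / 3, 3), (2 / 3, 10), (9 / 10, 70), (1.0, 500), (11 / 10, 5000)]

def link_cost(load: float, capacity: float) -> float:
    """Piecewise linear, convex cost of carrying `load` on a link of the given capacity."""
    cost = 0.0
    for i, (start, slope) in enumerate(SEGMENTS):
        end = SEGMENTS[i + 1][0] if i + 1 < len(SEGMENTS) else math.inf
        lo, hi = start * capacity, end * capacity
        if load > lo:
            cost += slope * (min(load, hi) - lo)  # portion of the load inside this segment
    return cost

def network_cost(loads_and_capacities) -> float:
    """Total network cost: the sum of the per-link costs."""
    return sum(link_cost(load, cap) for load, cap in loads_and_capacities)
```

Because the slopes only increase, each extra unit of load costs more than the previous one. This is what makes the function convex and steep near and above full utilization.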
Fig. 2. The Cost239 network
B. Number of Backup Configurations
The table also shows how many nodes are covered by LFAs, and the number of configurations needed when MRC is used in combination with LFAs. Since some nodes and links are completely covered by LFAs, MRC needs to isolate fewer components, and hence the number of configurations decreases for some topologies. This modest number of backup configurations shows that our method is implementable without requiring a prohibitively high amount of state information.
Fig. 3. The number of backup configurations required for a wide range of BRITE-generated topologies. As an example, the bar name wax-2-16 denotes that the Waxman model is used with a links-to-node ratio of 2, and with 16 nodes.
C. Backup Path Lengths
The numbers are based on 100 different synthetic Waxman topologies with 32 nodes and 64 links. All the topologies have unit-weight links, in order to focus more on the topological characteristics than on a specific link weight configuration. Algorithm 1 yields richer backup configurations as their number increases.
Fig. 4. Backup path lengths in the case of a node failure.
Fig. 5. Average backup path lengths in the case of a node failure as a function of the number of backup configurations.
D. Load on Individual Links
Fig. 6 shows the maximum load on all links, which are indexed from the least loaded to the most loaded in the failure-free case.
Fig. 6. Load on all unidirectional links in the failure-free case, after IGP re-convergence, and when MRC is used to recover traffic. Shows each individual link's worst-case scenario.
The results indicate that the restricted routing in the backup topologies results in a worst-case load distribution comparable to that achieved after a complete IGP rerouting process. However, for some link failures, MRC gives a somewhat higher maximum link utilization in this network.

VI. RECOVERY LOAD DISTRIBUTION
MRC recovery is local, and the
recovered traffic is routed in a backup
configuration from the point of failure to the
egress node. If MRC is used for fast recovery,
the load distribution in the network during the
failure depends on three factors:
(a) The link weight assignment used in the
normal configuration,
(b) The structure of the backup configurations,
i.e., which links and nodes are isolated in each backup configuration,
(c) The link weight assignments used in the
backbones of the backup configurations.
The link weights in the normal configuration (a) are important since MRC uses backup configurations only for the traffic affected by the failure, and all non-affected traffic is distributed according to them. The backup configuration structure (b) dictates which links can be used in the recovery paths for each failure. The backup configuration link weight assignments (c) determine which among the available backup paths are actually used.
REFERENCES
[1] D. D. Clark, "The design philosophy of the DARPA internet protocols," ACM SIGCOMM Comput. Commun. Rev., vol. 18, no. 4.
[2] A. Basu and J. G. Riecke, "Stability issues in OSPF routing," in Proc. ACM SIGCOMM, San Diego, CA, Aug. 2001, pp. 225-236.
[3] C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian, "Delayed internet routing convergence," IEEE/ACM Trans. Networking, vol. 9, no. 3, pp. 293-306, Jun. 2001.
[4] C. Boutremans, G. Iannaccone, and C. Diot, "Impact of link failures on VoIP performance," in Proc. Int. Workshop on Network and Operating System Support for Digital Audio and Video, 2002, pp. 63-71.

A Cloud Computing Solution for Product Authentication using QR Code
G.Bhagath, B.Harishkumar,
Department of Computer Science and Engineering,
Gojan School of Business and Technology,
bhagath.gopinath@gmail.com, harish_storm26@yahoo.com

Abstract
Product authentication is one of the fundamental procedures for ensuring the standard and quality of any product in the market. Counterfeit products are often offered to consumers as authentic. Counterfeit consumer goods such as electronics, music, and apparel, as well as counterfeit medications, have been sold as legitimate. Efforts to control the supply chain and to educate consumers to evaluate packaging and labeling help ensure that authentic products are sold and used. However, educating consumers to evaluate products is a challenging task. Our work makes this task simple with the help of a camera-enabled mobile phone running a QR (Quick Response) code reader. We propose a model in which the mobile phone application decodes the captured code image and sends the data through the Cloud Data Management Interface for authentication. The system then forwards the message to the product manufacturer's data center or to a central database, and the response received from the cloud enables the consumer to decide on the product's authenticity. The authentication system follows a pay-per-use model, making it a Security as a Service (SaaS) architecture. The system is being implemented with a mobile toolkit in the cloud environment provided by the simulator CloudSim 1.0.
Index Terms: Cloud Computing, QR Codes, Authentication, 2D Codes, Security as a Service.
I. INTRODUCTION
Authentication is one of the most important processes by which consumers can identify whether the product they buy comes from an authentic manufacturer or from a fictitious company. It also lets them ensure that the product is within its expiry date. In recent times there have been many duplicate and expired products in the market. Duplication has penetrated many kinds of products, from basic provisions to more important pharmaceuticals. Consumers cannot judge on their own whether a product is original or duplicate merely by checking its manufacturing and expiry dates. The lack of awareness about product authenticity was exposed in a recent case in which consumers faced the duplication of medicines: many expired medicines were found to have been recycled and sold in the market as new ones [1]. This problem occurred mainly because there was no proper authentication system for verifying whether a product is original.
To prevent this from happening again, in the pharmaceutical field or with any other consumer product, an effective authentication system must be implemented. It must prevent shopkeepers or stockists from modifying any of the records regarding the originality of the product. The authentication systems currently used for product identification are the barcode and the hologram.
Barcodes are the most common identity-establishment technique: a series of black vertical lines of various widths, associated with numbers, is printed on every product. Being an age-old technique, the barcode is quite easily duplicated.
The second, and more effective, technique is the hologram. Holograms are three-dimensional photographic images that appear to have depth. Holograms work by creating an image
composed of two superimposed two-dimensional pictures of the same object seen from different reference points. The hologram is printed onto a set of ultra-thin curved silver plates, which are made to diffract light, and these thin silver plates are pasted onto the product to establish its authenticity. However, hologram stickers are somewhat expensive to manufacture, so they are not a feasible solution for authenticating low-priced consumer goods. The drawbacks of the above techniques lie at two extremes. At one end, a barcode can easily be duplicated. At the other, hologram stickers are too sophisticated for ordinary consumers to inspect the intricate details and reach a conclusion about the originality of the product.

The drawbacks of the barcode and 3-D hologram techniques have led to the evolution of a new technique called the QR (Quick Response) code. It is a matrix code designed to be decoded at very high speed. QR Code was created as a step up from the barcode. QR Code contains data in both the vertical and horizontal directions, whereas a barcode holds data in one direction only. Correspondingly, QR Code can hold far more information and is easily read by scanning equipment, and its larger capacity increases the effectiveness of such scanning. Further, QR Code can handle alphanumeric characters, symbols, binary data, and other kinds of code. QR Code also has an error-correction capability, whereby the data can be restored even if part of the symbol has been damaged. All of these features make QR Code far superior to the barcode.
In our authentication system, every product carries a QR code printed on its cover that is unique to that product. The mobile application reads the code printed on the external cover of the product and decodes it to obtain the stored data. The data is then encrypted for added security and sent to the central web server in the cloud through SMS (Short Message Service). The data could also be sent through WAP (Wireless Application Protocol) or MMS (Multimedia Messaging Service), but SMS costs less than WAP and MMS.
The central server collects the data and checks the manufacturer's server for the product's code. The code is searched with a searching algorithm. If it is found, the record in the manufacturer's database is marked as bought and a reply is sent to the central web server stating that the product is original. If no match is found, the manufacturer's server returns a message stating that the product is a duplicate. The web server then conveys the message to the user.
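The server-side check just described can be sketched in Python. The following are hypothetical:

- the in-memory dictionary;
- the "<manufacturer>-<serial>" code format used to route a query to the right manufacturer;
- all names.

A real deployment would query each manufacturer's own database.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class ManufacturerDB:
    # Hypothetical store: product code -> True once the product has been marked as bought
    records: Dict[str, bool] = field(default_factory=dict)

    def register(self, code: str) -> None:
        self.records[code] = False

    def verify(self, code: str) -> str:
        if code in self.records:       # dictionary lookup stands in for the searching algorithm
            self.records[code] = True  # mark the record as bought
            return "Original"
        return "Duplicate"

def central_server(code: str, manufacturers: Dict[str, ManufacturerDB]) -> str:
    """Route the code to its manufacturer's database and return the verdict for the user."""
    prefix = code.split("-", 1)[0]     # hypothetical code format "<manufacturer>-<serial>"
    db = manufacturers.get(prefix)
    return db.verify(code) if db else "Duplicate"

acme = ManufacturerDB()
acme.register("ACME-0001")
print(central_server("ACME-0001", {"ACME": acme}))  # Original
```

An unknown manufacturer prefix is treated the same way as an unknown serial number: no matching record is found, so the product is reported as a duplicate.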
II. RELATED WORK
QR codes are currently used on web pages so that a page can be opened directly on a mobile phone without typing its URL: the user simply captures the QR code with the phone's camera. QR codes can also be used on business cards: a QR code encoding data about the person is created and printed on the card. A friend who wants to add these details to a phone's contact list just captures the QR code with the phone's camera. The reader software on the phone then decodes the data in the image and stores the person's details in the phone book.
At present, two authentication systems are mainly used, and both the barcode and the hologram have disadvantages. The problem with the barcode is that anyone can read its numbers and reproduce them on a duplicate product to make it look original. Holograms do not contain any hidden data; a hologram is just an image with a three-dimensional effect. A counterfeiter can create a new hologram that looks similar to the original, making the duplicate product appear original. Moreover, holograms cannot be used for large quantities because they are costly to print.
Consequently, consumers buying a product have only two options for deciding its originality. Either they trust the present authentication system and believe the product is original, or they rely on the shopkeeper's assurance. Neither option works all the time. The present authentication systems (barcodes and holograms) can be duplicated to look original, and the shopkeeper may claim that the product is original simply because he has to sell it. So there is no proper way to verify the originality of a product.
III. PRODUCT AUTHENTICATION USING QR CODE
Consumer products can thus be authenticated with QR codes. The QR code printed on the cover of the product is captured as an image by the camera attached to the mobile phone. The image is then opened with the QR code reading application, which extracts the data from the code and sends it to the central web server as an SMS. The web server is connected to the cloud through the Internet; on receiving the SMS, it sends the data to the corresponding manufacturer's server in the cloud. The manufacturer's server uses a searching algorithm to look for the data in its database. If the data is found, a reply is sent to the central server stating that the product is original. If the corresponding record is not found, the manufacturer's server sends a message to the central server stating that the product is a duplicate. On receiving the message from the manufacturer's server, the web server sends the user a message stating the status of the product, and the user can then decide whether to buy it.
The QR code used in our model is better than the present barcodes and holograms because QR codes are not in human-readable form. They can only be read by QR code readers, so no one can alter them by hand to make a duplicate look original. In this model the verification is done by the user, and the shopkeeper has no hand in the authentication process. The user sends the data from the captured image and also receives the final result, with the help of the QR code reader that reads the data printed in the form of a QR code. For our model we use a QR code reader application written in J2ME (Java 2 Micro Edition). J2ME allows the QR code reader to work on all Java-enabled mobile phones irrespective of screen size. Our model therefore works with all such phones, with the small constraint that the phone must have a camera.
In our model, the computing technology used to connect the mobile devices with the central web server is cloud computing, which allows users in various locations to access the web server to check a product's originality. Cloud computing provides easy access to all the remote sites connected to the Internet. The central server sends the reply from the manufacturer's server to the user who submitted a QR code. The central server can send the result to the user in two ways. It can either send an SMS with details about the originality of the product, or send a voice message about its originality. The reply option is chosen by the user when registering with the web server.

IV. MOBILE DEPLOYMENT MODEL
The mobile phone is the key device in our proposal, as the user needs a device to send the data to the web server and receive a reply from it. By the end of 2009, 4 billion people were using mobile phones, and by 2013 that number was projected to grow to 6 billion. This is far more than the number of personal computer users, which shows that nearly everyone has a mobile phone. The same phone can therefore be used for authentication rather than buying a new device. Mobile service providers have also reduced the charges for SMS, which lowers the cost of data transfer from a mobile phone. The most important advantage of using the mobile phone in this model is that the user can send the data and get the reply without anybody's help or intervention. Privacy is thus maintained, and the speed of transfer is high compared with MMS. Data is sent from the mobile device to the central web server by SMS, as it is more economical than other data transfer modes such as MMS and WAP.
V. CLOUD COMPUTING FOR AUTHENTICATION
Cloud computing has now entered the mobile world as Mobile Cloud Computing. Cloud computing provides general applications online that can be accessed through a web browser, while all the software and data reside on the server. The client can access those applications and data without complete knowledge of the infrastructure. Cloud computing has five essential characteristics:
On-demand self-service,
Broad network access,
Resource pooling,
Rapid elasticity, and
Measured service.
Fig. 1.2 shows the structure of cloud computing.
Fig. 1.2
Put simply, cloud computing is a client-server architecture in which clients request a service rather than a particular server. In general, cloud computing users do not keep their data locally: all the data is placed in the cloud, and the user accesses it through a computer or a mobile device. Cloud computing is chosen for our model because the manufacturers' servers are located in various places and hold a huge amount of product data. In conventional computing, we would need to load the data from the server and search for the required record on the client machine. With cloud computing, we can directly access the data on the manufacturer's server; this reduces data access time and speeds up the process. In our model, the manufacturers' servers and our central server are located in the cloud. The user can access the central server from any location in the country to get the authentication information. The central web server in the cloud looks up the corresponding manufacturer's server and sends the data to it. Because all the servers are in the cloud, this lookup is simple.
VI. IMPLEMENTATION AND RESULTS
The steps involved in authenticating a product are as follows. First, the QR code is captured with the camera attached to the mobile device, and the captured image is decoded by the encode() function. The decoded data is then sent to the central server in the cloud through SMS by the sendEncoded() function. On receiving the data from the mobile, the central server searches the respective manufacturer's server and checks for the record. The reply is sent to the central server, which then forwards it to the mobile device with the sendReply() function. Fig. 1.3 shows the process as a sequence.

The pseudocode for the proposed model is shown below.
// Module for decoding the captured QR code image
encode()
{
    if (image == QRcode)
    {
        decode the QR code and get the data;
        return data;
    }
    else
        return "Encode Failed";
}

// Module to send the decoded data
sendEncoded()
{
    store the decoded data in a buffer;
    send the buffer through SMS;
    if (sending == success)
        return "Sending Success";
    else
        return "Sending Failed";
}

// Module to send the reply from the server to the mobile
sendReply()
{
    if (record == found)
        return "Original";
    else
        return "Duplicate";
}
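A runnable Python counterpart of the three modules is sketched below. It assumes the following:

- The QR reader is modelled by a hypothetical convention in which a recognised QR image is the string "QR:<payload>".
- The SMS transport is a caller-supplied function.
- The payload must fit in a single 160-character SMS (the GSM 7-bit limit).

The function names mirror encode(), sendEncoded() and sendReply() but are illustrative only.

```python
from typing import Callable, Optional

SMS_MAX_CHARS = 160  # capacity of a single SMS using the GSM 7-bit alphabet

def encode(image: str) -> Optional[str]:
    """Stand-in for the QR reader: return the payload, or None if the image is not a QR code."""
    if not image.startswith("QR:"):  # hypothetical marker for a recognised QR image
        return None                  # corresponds to "Encode Failed"
    return image[len("QR:"):]

def send_encoded(data: Optional[str], transport: Callable[[str], bool]) -> str:
    """Send the decoded data as one SMS through the supplied transport function."""
    if data is None or len(data) > SMS_MAX_CHARS:
        return "Sending Failed"
    return "Sending Success" if transport(data) else "Sending Failed"

def send_reply(record_found: bool) -> str:
    """Server-side verdict returned to the phone."""
    return "Original" if record_found else "Duplicate"

outbox = []

def fake_sms(msg: str) -> bool:
    """Hypothetical transport that just records the outgoing message."""
    outbox.append(msg)
    return True

print(send_encoded(encode("QR:ACME-0001"), fake_sms))  # Sending Success
```

Passing the transport in as a function keeps the sketch independent of any particular SMS gateway.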
REFERENCES
[1] www.thehindu.com/2010/04/02/stories/2010040252160400.htm
[2] The Green Grid Consortium, http://www.gridbus.org/cloudsim/
[3] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, "Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility," Future Generation Computer Systems, 25(6):599-616, Elsevier, June 2009.
HIGHWAY TRANSPORTATION USING ROBOTICS
N.Banupriya
III ECE
banubhuvi@gmail.com
C.S.Revathi
III ECE
revathi.aru66@gmail.com

ABSTRACT:
Highway travel is the lifeblood of modern industrial nations. The larger roads are sorely overburdened: around the major cities, heavy usage slows most peak-hour travel on freeways to less than 60 kilometers per hour. In all, excessive traffic causes more than five billion hours of delay every year; it wastes countless gallons of fuel and needlessly multiplies exhaust emissions. The main goal of this project is to make driving less burdensome and accident-free, especially on long trips. This can be achieved by making the highway itself part of the driving experience. Roadside technologies would then be integrated so that the overburdened highway system can be used more efficiently. The automobiles will have automatic throttle, braking, and steering control. The system that hosts these cars consists of roadside sensors that obtain information about current traffic conditions and relay it to receivers in the automobiles on the road. The automobiles can be grouped together at highway speeds, 65-70 MPH, no more than a few feet apart, which makes better use of the available roadways. In this manner, the traffic systems and the automobiles work together to bring passengers safely and quickly to their destinations.
INTRODUCTION:
People now take for granted
automotive systems like emission
control and fuel injection. In fact, many
people do not realize how many systems
inside their automobiles are already
monitored and controlled by computers.
Fuel delivery, ignition, emission, air-conditioning, and automatic transmission systems are examples of the systems used daily by a car that are computer controlled or assisted.
Now in the information age, people have come to rely on other driver assistance technologies, such as mobile phones and in-vehicle navigation systems. The goal of these technologies is to make the experience of driving less burdensome, especially on long trips.
Even when cars were still young, futurists began thinking about vehicles that could drive themselves, without human help. During the
last six decades interest in automated
vehicles rose and fell several times. Now
at the start of the new century, it's worth
taking a fresh look at this concept and
asking how automation might change
transportation and the quality of our
lives.
Consider some of the implications
of cars that could drive themselves.
We might eliminate the more than
ninety percent of traffic crashes that are
caused by human errors such as
misjudgments and inattention.
We might reduce antisocial driving
behavior such as road rage,
rubbernecking delays, and unsafe
speeds, thereby significantly reducing
the stress of driving.
The entire population, including the
young, the old, and the infirm, might
enjoy a higher level of mobility without
requiring advanced driving skills.
The luxury of being chauffeured to
your destination might be enjoyed by the
general populace, not just the wealthiest
individuals, so we might all do whatever
we like, at work or leisure, while
traveling in safety.
Fuel consumption and polluting
emissions might be reduced by
smoothing traffic flow and running
vehicles close enough to each other to
benefit from aerodynamic drafting.
Traffic-management decisions might
be based on firm knowledge of vehicle
responses to instructions, rather than on
guesses about the choices that drivers
might make.
The capacity of a freeway lane might
be doubled or tripled, making it possible
to accommodate growing demands for
travel without major new construction,
or, equivalently, today's level of
congestion might be reduced, enabling
travelers to save a lot of time.
IS IT FEASIBLE?
Automating the process of driving
is a complex endeavor. Advancements in
information technology of the past
decade have contributed greatly, and
research specifically devoted to the
design of automated highway systems
has made many specific contributions.
This progress makes it possible for us to
formulate operational concepts and
prove out the technologies that can
implement them.
AN AUTOMATED DRIVE:
We can now readily visualize your
trip on an automated highway system:
Imagine leaving work at the end of the
day and needing to drive only as far as
the nearest on-ramp to the local
automated highway. At the on-ramp, you
press a button on your dashboard to
select the off-ramp closest to your home
and then relax as your car's electronic
systems, in cooperation with roadside
electronics and similar systems on other
cars, guide your car smoothly, safely,
and effortlessly toward your destination.
En route you save time by maintaining
full speed even at rush-hour traffic
volumes. At the end of the off-ramp you
resume normal control and drive the
remaining distance to your home, better
rested and less stressed than if you had
driven the entire way. The same
capability can also be used over longer
distances, e.g. for family vacations that
leave everybody, including the "driver,"
relaxed and well-rested even after a
lengthy trip in adverse weather.
Although many different technical
developments are necessary to turn this
image into reality, none requires exotic
technologies, and all can be based on
systems and components that are already
being actively developed in the
international motor vehicle industry.
These could be viewed as replacements
for the diverse functions that drivers
perform every day: observing the road,
observing the preceding vehicles,
steering, accelerating, braking, and
deciding when and where to change
course.
OBSERVING THE ROAD:
Cheap permanent magnets are
buried at four-foot intervals along the
lane centerline and detected by
magnetometers mounted under the
vehicle's bumpers. The magnetic-field
measurements are decoded to determine
the lateral position and height of each
bumper to an accuracy of better than one
centimeter. In addition, the magnets'
orientations (either North Pole or South
Pole up) represent a binary code (either
0 or 1), and indicate precise milepost
locations along the road, as well as road
geometry features such as curvature and
grade. The software in the vehicle's
control computer uses this information
to determine the absolute position of the
vehicle, as well as to anticipate
upcoming changes in the roadway.
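A minimal sketch of how a run of magnet polarities could be decoded into a milepost number follows. The encoding details are assumptions made only for illustration (North Pole up read as 1, South Pole up as 0, most significant bit first, fixed 8-magnet code words); the text does not specify the word length or bit order actually used, and `decode_milepost` is a hypothetical name.

```python
from typing import List


def decode_milepost(polarities: List[str], word_length: int = 8) -> int:
    """Decode a sequence of magnet polarities ('N' or 'S') into an integer.

    Assumptions (hypothetical, for illustration only):
      - 'N' (North Pole up) encodes bit 1, 'S' (South Pole up) encodes bit 0
      - bits are read most-significant first
      - one code word is exactly `word_length` magnets long
    """
    if len(polarities) != word_length:
        raise ValueError(f"expected {word_length} magnets, got {len(polarities)}")
    value = 0
    for p in polarities:
        if p == "N":
            bit = 1
        elif p == "S":
            bit = 0
        else:
            raise ValueError(f"unknown polarity {p!r}")
        value = (value << 1) | bit  # shift in the next bit
    return value


print(decode_milepost(list("SSSNSNSN")))  # 21 (binary 00010101)
```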
Other researchers have used
computer vision systems to observe the
road. These are vulnerable to weather
problems and provide less accurate
measurements, but they do not require
special roadway installations, other than
well-maintained lane markings.
Both automated highway lanes
and intelligent vehicles will require
special sensors, controllers, and
communications devices to coordinate
traffic flow.
OBSERVING PRECEDING VEHICLES:
The distances and closing rates to
preceding vehicles can be measured by
millimeter-wave radar or a laser
rangefinder. Both technologies have
already been implemented in
commercially available adaptive cruise
control systems in Japan and Europe.
The laser systems are currently less
expensive, but the radar systems are
more effective at detecting dirty vehicles
and operating in adverse weather
conditions. As production volumes
increase and unit costs decrease, the
radars are likely to find increasing
favour.
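From the measured range and closing rate, a follower can derive simple safety quantities such as time-to-collision and time gap. The sketch below is a generic illustration under stated assumptions, not the logic of any commercial adaptive cruise control system; the function names and the finite-difference estimate of closing rate from two range samples are hypothetical.

```python
import math


def closing_rate(prev_range_m: float, curr_range_m: float, dt_s: float) -> float:
    """Estimate closing rate (m/s, positive when the gap shrinks) from two range samples."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    return (prev_range_m - curr_range_m) / dt_s


def time_to_collision(range_m: float, closing_mps: float) -> float:
    """Seconds until contact at the current closing rate; infinite if not closing."""
    if closing_mps <= 0:
        return math.inf
    return range_m / closing_mps


def time_gap(range_m: float, own_speed_mps: float) -> float:
    """Seconds the follower needs to cover the current gap at its own speed."""
    if own_speed_mps <= 0:
        return math.inf
    return range_m / own_speed_mps


rate = closing_rate(30.0, 29.0, 0.5)   # gap shrank by 1 m in 0.5 s -> 2.0 m/s
print(time_to_collision(29.0, rate))   # 14.5
print(time_gap(29.0, 25.0))            # 1.16
```

In practice a raw finite difference is noisy, which is one reason radar units that measure closing rate directly (via Doppler) are attractive.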
STEERING, ACCELERATING AND BRAKING:
The equivalents of these driver
muscle functions are electromechanical
actuators installed in the automated
vehicle. They receive electronic
commands from the onboard control
computer and then apply the appropriate
steering angle, throttle angle, and brake
pressure by means of small electric
motors. Early versions of these actuators
are already being introduced into
production vehicles, where they receive
their commands directly from the
driver's inputs to the steering wheel and
pedals. These decisions are being made
for reasons largely unrelated to
automation. Rather, they are associated
with reduced energy consumption,
simplification of vehicle design,
enhanced ease of vehicle assembly,
improved ability to adjust performance
to match driver preferences, and cost
savings compared to traditional direct
mechanical control devices.
DECIDING WHEN AND WHERE TO CHANGE COURSE:
Computers in the vehicles and
those at the roadside have different
functions. Roadside computers are better
suited for traffic management, setting the
target speed for each segment and lane
of roadway, and allocating vehicles to
different lanes of a multilane automated
facility. The aim is to maintain balanced
flow among the lanes and to avoid
obstacles or incidents that might block a
lane. The vehicle's onboard computers
are better suited to handling decisions
about exactly when and where to change
lanes to avoid interference with other
vehicles.
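The roadside lane-allocation task can be illustrated with a simple greedy rule: send each entering vehicle to the least-loaded lane that is not blocked by an obstacle or incident. This is a hypothetical sketch of the balancing idea only (`allocate_vehicles` and its tie-breaking rule are assumptions); a real traffic-management computer would also weigh target speeds, merge space, and each vehicle's exit.

```python
from typing import Dict, List, Set


def allocate_vehicles(vehicle_ids: List[str], lane_load: Dict[int, int],
                      blocked: Set[int]) -> Dict[str, int]:
    """Greedily assign each vehicle to the open lane with the fewest vehicles.

    lane_load maps lane number -> current vehicle count and is updated in place.
    Ties are broken by the lower lane number.
    """
    open_lanes = [lane for lane in lane_load if lane not in blocked]
    if not open_lanes:
        raise RuntimeError("all lanes are blocked")
    assignment: Dict[str, int] = {}
    for vid in vehicle_ids:
        lane = min(open_lanes, key=lambda ln: (lane_load[ln], ln))
        assignment[vid] = lane
        lane_load[lane] += 1  # the chosen lane now carries one more vehicle
    return assignment


load = {1: 5, 2: 3, 3: 3}
print(allocate_vehicles(["a", "b", "c"], load, blocked={3}))
# {'a': 2, 'b': 2, 'c': 1}
```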
NEW FUNCTIONS:
Some additional functions have no
direct counterpart in today's driving.
Most important, wireless communication
technology makes it possible for each
automated vehicle's computer to talk
continuously to its counterparts in
adjoining vehicles. This capability
enables vehicles to follow each other
with high accuracy and safety, even at
very close spacing, and to negotiate
cooperative maneuvers such as lane
changes to increase system efficiency
and safety. Any failure on a vehicle can
be instantly known to its neighbors, so
that they can respond appropriately to
avoid possible collisions.
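One way to see why continuous vehicle-to-vehicle communication permits very close spacing is a constant-spacing control law in which each follower adds the acceleration broadcast by the vehicle ahead to corrections on its gap and speed errors. The simulation below is an illustrative sketch, not the Demo '97 controller: the gains `kp` and `kv`, the 6 m target gap, and the explicit Euler integration are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    position: float      # m, front bumper along the lane
    speed: float         # m/s
    accel: float = 0.0   # m/s^2


def follower_accel(lead: VehicleState, own: VehicleState, desired_gap: float,
                   kp: float = 0.5, kv: float = 1.0) -> float:
    """Constant-spacing law: feed-forward of the leader's broadcast acceleration
    plus proportional corrections on gap error and relative speed."""
    gap_error = (lead.position - own.position) - desired_gap
    speed_error = lead.speed - own.speed
    return lead.accel + kp * gap_error + kv * speed_error


def simulate(steps: int = 2000, dt: float = 0.01, desired_gap: float = 6.0) -> tuple:
    lead = VehicleState(position=12.0, speed=25.0)
    own = VehicleState(position=0.0, speed=25.0)  # starts 6 m farther back than desired
    for k in range(steps):
        # Leader brakes gently for the first 2 s, then holds speed.
        lead.accel = -1.0 if k * dt < 2.0 else 0.0
        own.accel = follower_accel(lead, own, desired_gap)
        for v in (lead, own):  # Euler integration step for both vehicles
            v.speed += v.accel * dt
            v.position += v.speed * dt
    return lead.position - own.position, lead.speed - own.speed


gap, rel_speed = simulate()
print(f"gap={gap:.2f} m, relative speed={rel_speed:.3f} m/s")
```

Because the leader's acceleration is fed forward, the gap error e obeys e'' + kv e' + kp e = 0 and is unaffected by the leader's braking; without the broadcast term, the follower could react only after the gap had already changed.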
In addition, there should be
electronic "check-in" and "check-out"
stations at the entry and exit points of the
automated lane, somewhat analogous to
the toll booths on closed toll roads,
where you get a ticket at the entrance
and then pay a toll at the exit, based on
how far you traveled on the road. At
check-in stations, wireless
communication between vehicles and
roadside would verify that the vehicle is
in proper operating condition prior to its
entry to the automated lane. Similarly,
the check-out system would seek
assurance of the driver's readiness to
resume control at the exit. The traffic
management system for an automated
highway would also have broader scope
than today's traffic management systems,
because it would select an optimal route
for every vehicle in the system,
continuously balancing travel demand
with system capacity, and directing
vehicles to follow those routes precisely.
Most of these functions have
already been implemented and tested in
experimental vehicles. All except for
check-in, check-out, and traffic
management were implemented in the
platoon-scenario demonstration vehicles
of Demo '97. A single 166 MHz Pentium
computer (obsolete by the standards of
today's normal desktop PCs) handled all
the necessary in-vehicle computations
for vehicle sensing, control, and
communications.
REMAINING TECHNICAL CHALLENGES:
The key technical challenges that
remain to be mastered involve software
safety, fault detection, and malfunction
management. The state of the art of
software design is not yet sufficiently
advanced to support the development of
software that can be guaranteed to
perform correctly in safety-critical
applications as complex as road-vehicle
automation. Excellent performance of
automated vehicle control systems (high
accuracy with superb ride comfort) has
been demonstrated under normal operating
conditions, in the absence of failures.
Elementary fault detection and
malfunction management systems have
already been implemented to address the
most frequently encountered fault
conditions, for use by well-trained test
drivers. However, commercially viable
implementations will need to address all
realistic failure scenarios and provide
safe responses even when the driver is a
completely untrained member of the
general public. Significant efforts are
still needed to develop system hardware
and software designs that can satisfy
these requirements.
[Figure: general block diagram of the working model, showing the line detector (outputs L, C, R) and obstacle detector connected through ADC and port inputs, the steering servo (left/right, PWM), and the DC motor (PWM, enable, direction).]
NONTECHNICAL CHALLENGES:
The nontechnical challenges
involve issues of liability, costs, and
perceptions. Automated control of
vehicles shifts liability for most crashes
from the individual driver (and his or her
insurance company) to the designer,
developer and vendor of the vehicle and
roadway control systems. Provided the
system is indeed safer than today's
driver-vehicle-highway system, overall
liability exposure should be reduced. But
the cost of that liability will be shifted
from automobile insurance premiums to the
purchase or lease price of the automated
vehicle and to the toll for use of the
automated highway facility.
All new technologies tend to be
costly when they first become available
in small quantities, then their costs
decline as production volumes increase
and the technologies mature. We should
expect vehicle automation technologies
to follow the same pattern. They may
initially be economically viable only for
heavy vehicles (transit buses,
commercial trucks) and high-end
passenger cars. However, it should not
take long for the costs to become
affordable to a wide range of vehicle
owners and operators, especially with
many of the enabling technologies
already being commercialized for
volume production today.
The largest impediment to
introduction of electronic chauffeuring
may turn out to be the general perception
that it's more difficult and expensive to
implement than it really is. If political
and industrial decision makers perceive
automated driving to be too futuristic,
they will not pay it the attention it
deserves and will not invest their
resources toward accelerating its
deployment. The perception could thus
become a self-fulfilling prophecy.
It is important to recognize that
automated vehicles are already carrying
millions of passengers every day. Most
major airports have automated people
movers that transfer passengers among
terminal buildings. Urban transit lines in
Paris, London, Vancouver, Lyon, and
Lille, among others, are operating with
completely automated, driverless
vehicles; some have been doing so for
more than a decade. Modern commercial
aircraft operate on autopilot for much of
the time, and they also land under
automatic control at suitably equipped
airports on a regular basis.
Given all of this experience in
implementing safety-critical automated
transportation systems, it is not such a
large leap to develop road vehicles that
can operate under automatic control on
their own segregated and protected
lanes. That should be a realistic goal for
the next decade. The transportation
system will thus gain substantial benefits
from the revolution in information
technology.
HARDWARE PLATFORM FOR WORKING MODEL:
GENERAL BLOCK DIAGRAM
INFRARED PROXIMITY DETECTOR:
The IR proximity detector uses the
same technology found in a TV remote
control device. The detector sends out
modulated infra-red light, and looks for
reflected light coming back. When
enough light is received back to trigger
the detector circuit, the circuit produces
a high on the output line. Light is in the
form of a continuous string of bursts of
modulated square waves. Bursts
alternate between left and right LEDs. A
microprocessor generates the bursts, and
correlates the receiver output with each burst.
The IRPD we have used makes use of a
Panasonic PNA4602M IR sensor
coupled with two IR LEDs to detect
obstacles. The Panasonic module
contains integrated amplifiers, filters,
and a limiter. The detector responds to a
modulated carrier to help eliminate
background noise associated with
sunlight and certain lighting fixtures.
The LEDs are modulated by an
adjustable free running oscillator. The
sensitivity of the sensor is controlled by
altering the drive current to the LEDs. The
microcontroller alternately enables the
LEDs and checks for a reflection. Two lines
are provided from the host microcontroller,
one for enabling the left IR LED and the
second for enabling the right IR LED. A
third line, an analog output from the IRPD
kit, is connected to an analog-to-digital
converter.
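The alternating left/right scan can be sketched as follows. The hardware access functions are hypothetical placeholders passed in as callables (`enable_led`, `read_detector`), so the classification logic can run without the actual kit; on a real microcontroller they would drive the two enable lines and read the PNA4602M output.

```python
from typing import Callable, Dict


def scan_obstacles(enable_led: Callable[[str, bool], None],
                   read_detector: Callable[[], bool]) -> str:
    """Enable each IR LED in turn and report where a reflection was seen.

    Returns 'none', 'left', 'right', or 'center' (reflection from both LEDs).
    """
    seen: Dict[str, bool] = {}
    for side in ("left", "right"):
        enable_led(side, True)          # drive only this side's LED
        seen[side] = read_detector()    # detector output is high on reflection
        enable_led(side, False)
    if seen["left"] and seen["right"]:
        return "center"
    if seen["left"]:
        return "left"
    if seen["right"]:
        return "right"
    return "none"


# Simulated hardware: an obstacle that reflects only the left LED's light.
active = {"left": False, "right": False}


def fake_enable(side: str, on: bool) -> None:
    active[side] = on


def fake_read() -> bool:
    return active["left"]


print(scan_obstacles(fake_enable, fake_read))  # left
```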
LINE DETECTOR:
The line detector is an infrared
reflective sensor that can be attached to
the front of the car to follow a white line
on a black background, or vice versa.
There are three reflective
sensors, each made from one infrared
LED and one photodetector directed at
the surface below the vehicle. Each of
the sensors looks for reflected IR
light. When one of the
sensors is positioned over dark or black
surface its output is low. When it is
moved to light or white surface its
output will be high. The microcontroller
accepts these signals and moves the
robot according to the diagram below.
The line detector works effectively when
the line thickness ranges between to. The
track can be white tape on a black
background or black tape on a white
background. The sensors can be at a
maximum height of 0.5 inches above the
ground.
The three IR-Detector pairs are
depicted on the right of the circuit
diagram. The signal at the base of each
transistor is passed through an inverter.
The inverter output lines are passed to
the microcontroller and to the LEDs
indicating the position of the line
detector on the road.
As the emitted light from the IR
LED is reflected from the road back to
the transistor, current starts flowing
through the emitter, making the base low.
The base is connected to the inverter,
which drives the line high at its output.
Since the output lines are also connected
to the LEDs, the corresponding LED
glows when that output line is
high.
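The state-to-motion mapping itself appears only in the original diagram, so the function below uses an assumed, typical mapping for a three-sensor line follower tracking a white line over a black background (an output is high over white, as stated above). `steer_command` and its return labels are hypothetical; this is a sketch of the decision logic, not the authors' firmware.

```python
def steer_command(left: bool, center: bool, right: bool) -> str:
    """Map three line-sensor outputs (True = high = over the white line)
    to a steering command. The mapping is an assumed, typical one."""
    if center and not left and not right:
        return "straight"
    if left and not right:
        return "turn_left"    # line has drifted toward the left sensor
    if right and not left:
        return "turn_right"   # line has drifted toward the right sensor
    if left and center and right:
        return "straight"     # wide line or crossing: keep heading
    return "stop"             # line lost (all low) or ambiguous (left + right only)


for bits in [(False, True, False), (True, True, False), (False, False, False)]:
    print(bits, steer_command(*bits))
```

For the black-tape-on-white variant, the same function applies after inverting each sensor reading.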
STEERING SERVO:
A servo comprises a control circuit, a
set of gears, a potentiometer, and a
motor. The potentiometer is connected to
the motor via the gear set. A control
signal gives the motor a position to
rotate to, and the motor starts to turn.
The potentiometer rotates with the motor,
and as it does so its resistance changes.
The control circuit monitors this
resistance; as soon as it reaches the
appropriate value, the motor stops and
the servo is in the correct position. A
servo is a classic example of a closed-loop
feedback system. The potentiometer is
coupled to the output gear, and its
resistance is proportional to the
position of the servo's output shaft
(0 to 180 degrees).
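The closed-loop operation described above can be simulated in a few lines. The sketch assumes, hypothetically, a 10 kΩ potentiometer whose resistance is proportional to shaft angle over 0 to 180 degrees, a motor that moves a fixed 1.5 degrees per control step, and a 50 Ω deadband; it illustrates the feedback principle rather than any specific servo's control circuit.

```python
POT_MAX_OHMS = 10_000.0   # assumed full-scale potentiometer resistance
MAX_ANGLE_DEG = 180.0


def angle_to_resistance(angle_deg: float) -> float:
    """Potentiometer resistance is proportional to output-shaft angle."""
    return POT_MAX_OHMS * angle_deg / MAX_ANGLE_DEG


def run_servo(start_deg: float, target_deg: float, step_deg: float = 1.5,
              deadband_ohms: float = 50.0, max_steps: int = 1000) -> float:
    """Turn the motor until the measured resistance matches the target value."""
    if not 0.0 <= target_deg <= MAX_ANGLE_DEG:
        raise ValueError("target outside 0..180 degrees")
    angle = start_deg
    target_ohms = angle_to_resistance(target_deg)
    for _ in range(max_steps):
        error = target_ohms - angle_to_resistance(angle)  # control circuit compares
        if abs(error) <= deadband_ohms:
            break                                         # correct position: motor stops
        angle += step_deg if error > 0 else -step_deg     # motor turns toward target
        angle = min(max(angle, 0.0), MAX_ANGLE_DEG)       # mechanical end stops
    return angle


print(run_servo(0.0, 90.0))  # 90.0
```

The deadband (about 0.9 degrees here) is deliberately wider than half a motor step, so the loop settles instead of hunting back and forth around the target.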
CONCLUSION:
The National Highway Traffic Safety
Administration is conducting ongoing
research on collision avoidance and
driver/vehicle interfaces. AHS was a
strong public/private partnership with
the goal of building a prototype system.
There are many things that can be done
in the vehicle, but if we do some of them
on the roadway it will be more efficient
and possibly cheaper. Preliminary
estimates show that rear-end, lane-change,
and roadway-departure crash-avoidance
systems have the potential to reduce
motor-vehicle crashes by one-sixth, or
about 1.2 million crashes a year. Such
systems may take the form of warning
drivers, recommending control actions,
and introducing temporary or partial
automated control in hazardous
situations. Although the AHS described
in this paper is functional, there is
much room for improvement. More research
is needed to determine whether any
dependencies exist that influence the
velocity of a vehicle maintaining proper
following distance while following a
path. Assuming such a system is ever
perfected, one would imagine it would
tend to turn the great tradition of the
free-ranging car into something
approaching mass transit. After all,
when your individual car becomes part of
a massive circulatory system, with the
destination preselected and all
spontaneity removed, what makes your
travel any different from a transit
trip? Only that you choose the starting
time yourself.
A Comprehensive Stream Based Web Services Security Processing System
R.Chitra, Lecturer
Department of Computer Science & Engineering
Gojan School of Business & Technology
chitrar05@gmail.com
B.Abinaya
IV CSE
Gojan School of Business & Technology
abinsb01@gmail.com
M.Geethanjali
IV CSE
Gojan School of Business & Technology
anjalimthkmr@gmail.com
Abstract-With SOAP-based web services
leaving the stage of being an explorative
set of new technologies and entering that
of mature and fundamental building blocks
for service-driven business processes (and
in some cases even for mission-critical
systems), the demand for nonfunctional
requirements, including efficiency as well
as security and dependability, commonly
increases rapidly. Although web services
are capable of coupling heterogeneous
information systems in a flexible and
cost-efficient way, their processing
efficiency and robustness against certain
attacks do not fulfill industry-strength
requirements. In this paper, a
comprehensive stream-based WS-Security
processing system is introduced, which
enables more efficient processing in
service computing and increases the
robustness against different types of
Denial-of-Service (DoS) attacks. The
introduced engine is capable of processing
all standard-conforming applications of
WS-Security in a streaming manner. It can
handle, e.g., any order, number, and
nesting degree of signature and encryption
operations, closing the gap toward more
efficient and dependable web services.
Index Terms-Web services, SOAP, WS-Security,
streaming processing, DoS robustness,
efficient processing.
INTRODUCTION
ENTERPRISES are faced with greatly
changing requirements, influencing the way
businesses are created and operated. They
have become more pervasive with a mobile
workforce, outsourced data centers, different
engagements with customers, and
distributed sites. Information and
communication technology (ICT) is
therefore becoming an increasingly critical
factor for business. ICT moves from a
business supporter to a business enabler and
has to be partly considered as a business
process on its own. In order to achieve the
required agility of the enterprise and its ICT,
the concept of Service-Oriented
Architectures (SOA) [1] is increasingly used.
The most common technology for
implementing SOA-based systems is
SOAP-based web services [2]. Some
applications like Software-as-a-Service
(SaaS) [3], [4] or Cloud Computing [5] are
inconceivable without web services. There
are a number of reasons for their high
popularity. SOAP-based web services enable
flexible software system integration,
especially in heterogeneous environments,
and are a driving technology for
interorganization business processes.
Additionally, the large number of
increasingly mature specifications, the
strong industry support, and the large
number of web service frameworks for
nearly all programming languages have
boosted their acceptance and usage.
Since SOAP is an XML-based protocol, it
inherits many of the advantages of the
text-based XML, such as message
extensibility, human readability, and
utilization of standard XML processing
components. On the other hand, of course,
SOAP also inherits all of XML's issues.
The main problems raised by critics since
the start of web services are the verbosity
of transmitted messages and the high
resource requirements for processing [6].
These issues are further aggravated when
using SOAP security [7] through the need
to handle larger messages and perform
cryptographic operations. These issues
pose performance challenges which need
to be addressed and solved to obtain the
efficiency and scalability required by large
(cross-domain) information systems. These
problems are especially severe, e.g., in
mobile environments with limited
computing resources and low data rate
network connections [8], or for high-volume
web service transactions comprising a large
number of service invocations per second.
Further, high resource consumption is not
only an economic or convenience factor; it
also increases the vulnerability to resource
exhaustion Denial-of-Service (DoS) attacks.
To overcome the performance issues,
streaming XML processing provides
promising benefits in terms of memory
consumption and processing time. The
streaming approach is not new, but has not
yet found widespread adoption. The
reasons are manifold. The main issue is
the lack of random access to elements
inside the XML document, which makes
programming difficult. Therefore, a
current trend is using stream-based methods
for simple message preprocessing steps
(e.g., schema validation) and tree-based
processing inside the application.
WS-Security processing is double-edged in
this sense. On one hand, its high resource
consumption and the ability to detect
malicious messages make security
processing an ideal candidate for streaming
methods. On the other hand, it requires
rather complex operations on the SOAP
message.
Thus, to date, there exists no comprehensive
stream-based WS-Security engine. This
paper presents how a secured SOAP
message as defined in WS-Security can be
completely processed in a streaming manner.
It can handle, e.g., any order, number, and
nesting degree of signature and encryption
operations. Thus, the system presented
provides the missing link to fully streamed
SOAP processing, which makes it possible to
leverage the performance gains of streaming
processing as well as to implement services
with an increased robustness against
Denial-of-Service attacks.
STREAMING WS-SECURITY PROCESSING
In this section, the algorithms for processing
WS-Security enriched SOAP messages in a
streaming manner are presented and
discussed. To understand the algorithms and
the problems solved by them, first of all, an
introduction to the WS-Security elements is
given.
WS-Security
In contrast to most classic communication
protocols, web services do not rely on
transport-oriented security means (like
TLS/SSL [25]) but on message-oriented
security. The most important specification
addressing this topic is WS-Security [26],
defining how to provide integrity,
confidentiality, and authentication for SOAP
messages. Basically, WS-Security defines a
SOAP header (wsse:Security) that carries
the WS-Security extensions. Additionally, it
defines how existing XML security
standards like XML Signature [27] and
XML Encryption [28] are applied to SOAP
messages. For processing a WS-Security
enriched SOAP message at the server side,
the following steps must be performed (not
necessarily in this order):
• processing the WS-Security header,
• verifying signed blocks, and
• decrypting encrypted blocks.
This implies that not only processing the
individual parts must be considered but also
the references between the WS-Security
components. This is especially important in
the context of stream-based processing,
since arbitrary navigation between message
parts is not possible in this processing model.
Fig. 2 shows an example of a WS-Security
secured SOAP message containing these
references. Security tokens contain identity
information and cryptographic material
(typically an X.509 certificate) and are used
inside signatures and encrypted keys and are
backward referenced from those. Encrypted
key elements contain a (symmetric)
cryptographic key, which is asymmetrically
encrypted using the public key of the
recipient. This symmetric key is used for
encrypting message parts (at the client side)
and also for decrypting the encrypted blocks
(at the server side).
Encrypted keys must occur inside the
message before the corresponding encrypted
blocks. Finally, XML signatures have the
following structure:
<ds:Signature>
  <ds:SignedInfo>
    <ds:CanonicalizationMethod/>
    <ds:SignatureMethod/>
    <ds:Reference URI="...">
      <ds:Transforms>...</ds:Transforms>
      <ds:DigestMethod/>
      <ds:DigestValue>...</ds:DigestValue>
    </ds:Reference>
  </ds:SignedInfo>
  <ds:SignatureValue>...</ds:SignatureValue>
  <ds:KeyInfo>...</ds:KeyInfo>?
</ds:Signature>
The signature holds (in addition to
specifying the cryptographic algorithms) a
ds:Reference element for every signed
block, the cryptographic signature value of
the ds:SignedInfo element, and a reference
to the key necessary for validating the
signature. A ds:Reference element itself
contains a reference to the signed block,
optionally some transformations and the
cryptographic hash value of the signed
block. References to signed blocks can be
either backward or forward references. This
has to be taken into account for the
processing algorithm.
There are several possibilities for realizing
the reference. However, only references
according to the XPointer specification [29]
are recommended (see WS-I Basic Security
Profile [23]). Thus, in the following, we
assume that the referenced element contains
an attribute of the form
Id="myIdentification" and is referenced
using the URI "#myIdentification" inside
the ds:Reference element.
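The digest comparison inside a ds:Reference can be sketched with the Python standard library. Two substitutions should be stated plainly: `xml.etree.ElementTree.canonicalize` implements C14N 2.0, not the Exclusive C14N required by the WS-I Basic Security Profile, and SHA-1 is assumed as the DigestMethod, so the values produced here will not in general match those of a real WS-Security stack. The helper names and the example element are invented.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET


def digest_value(block_xml: str) -> str:
    """Canonicalize a signed block and return its base64-encoded SHA-1 digest.

    NOTE: ET.canonicalize implements C14N 2.0, used here as a stand-in for
    Exclusive C14N; results are illustrative, not interoperable.
    """
    canonical = ET.canonicalize(xml_data=block_xml)
    sha1 = hashlib.sha1(canonical.encode("utf-8")).digest()
    return base64.b64encode(sha1).decode("ascii")


def verify_reference(block_xml: str, expected_b64: str) -> bool:
    """Compare the recomputed digest with ds:DigestValue in constant time."""
    return hmac.compare_digest(digest_value(block_xml), expected_b64)


body = '<Body Id="myIdentification"><order qty="2">book</order></Body>'
stored = digest_value(body)                               # value the signer would store
print(verify_reference(body, stored))                     # True
print(verify_reference(body.replace("2", "9"), stored))   # False
```

Canonicalization matters because the same element can be serialized in several equivalent ways (attribute order, quote style, empty-element syntax); hashing raw bytes without it would make valid signatures fail.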
Architecture
Fig. 3 shows the architecture of the system
for stream-based processing of WS-Security
enriched SOAP messages called CheckWay
[30]. It operates on SAX events created by a
SAX parser and contains four types of Event
Handlers. Instances of these Event Handler
types are instantiated on demand
and linked together in an Event Handler
chain operating on the stream of XML
events [31]. The first handler is responsible
for processing the WS-Security header. As
the header has a fixed defined position
inside the SOAP message, the handler can
be statically inserted inside the handler
processing chain. For signed and encrypted
blocks, however, this is different. These may
occur at nearly arbitrary positions inside the
SOAP message, and can even be nested
inside each other. Thus, the Dispatcher
handler is responsible for detecting signed
and encrypted blocks and inserting a
respective handler into the processing chain
(at which position will be discussed below).
While detecting encrypted blocks is trivial
(they start with the element
xenc:EncryptedData), detecting signed
blocks is more difficult as those elements
are not explicitly marked. For forward
references, the signature elements are
(regarding the document order) before the
signed block. Therefore, forward referenced
signed blocks can be detected by comparing
the ID attribute of that element with the list
of references from the signature elements
processed before. For backward references,
there is no possibility for a definite decision
if an element is signed or not. The following
solution for this problem has been
developed. Every element before the end of
the SOAP header (only there are backward
references possible) that contains an ID
attribute is regarded as potentially signed,
and therefore the signed block processing
is started. At the end of such a block, the ID
and the result of the signed block processing
(i.e., the digest of this block) are stored.
When processing a signature, the included
references are compared to the IDs stored
from the potentially signed blocks and the
stored digest is verified by comparison to
the one inside the signature element.
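The Dispatcher's treatment of backward references can be sketched with Python's `xml.sax`. This is a simplified, hypothetical illustration, not the CheckWay implementation: the class and function names are invented, events are hashed in a naive serialization rather than a canonical form, and the final digest check runs after parsing for brevity, whereas in a streaming engine it would run when the ds:Signature element is processed.

```python
import hashlib
import xml.sax
from typing import Dict, Optional


def local(name: str) -> str:
    """Strip a namespace prefix such as 'wsu:' from a qualified name."""
    return name.split(":")[-1]


class PotentiallySignedBlocks(xml.sax.ContentHandler):
    """Stream a SOAP message and hash every header element carrying an Id."""

    def __init__(self) -> None:
        super().__init__()
        self.in_header = False
        self.depth = 0
        self.open_blocks: list = []           # [(depth, Id, running SHA-1), ...]
        self.completed: Dict[str, str] = {}   # Id -> hex digest of finished block

    def _feed(self, text: str) -> None:
        # Nested potentially signed blocks all receive the same event.
        for _, _, hasher in self.open_blocks:
            hasher.update(text.encode("utf-8"))

    def startElement(self, name, attrs):
        self.depth += 1
        if local(name) == "Header":
            self.in_header = True
        elem_id: Optional[str] = next(
            (attrs[a] for a in attrs.getNames() if local(a) == "Id"), None)
        if self.in_header and elem_id is not None:
            self.open_blocks.append((self.depth, elem_id, hashlib.sha1()))
        attr_text = "".join(f' {a}="{attrs[a]}"' for a in sorted(attrs.getNames()))
        self._feed(f"<{name}{attr_text}>")    # naive serialization, NOT canonical XML

    def characters(self, content):
        self._feed(content)

    def endElement(self, name):
        self._feed(f"</{name}>")
        if self.open_blocks and self.open_blocks[-1][0] == self.depth:
            _, elem_id, hasher = self.open_blocks.pop()
            self.completed[elem_id] = hasher.hexdigest()  # store (ID, digest)
        if local(name) == "Header":
            self.in_header = False
        self.depth -= 1


def check_digest(handler: PotentiallySignedBlocks, ref_uri: str,
                 digest_hex: str, pending: Dict[str, str]) -> Optional[bool]:
    """Backward reference: verify now. Otherwise keep it as a forward reference."""
    ref = ref_uri[1:] if ref_uri.startswith("#") else ref_uri
    if ref in handler.completed:
        return handler.completed[ref] == digest_hex
    pending[ref] = digest_hex   # to be matched when the block is streamed later
    return None


msg = """<soap:Envelope xmlns:soap="urn:s" xmlns:wsu="urn:u">
<soap:Header><Token wsu:Id="tok1">abc</Token></soap:Header>
<soap:Body wsu:Id="body1">data</soap:Body></soap:Envelope>"""
handler = PotentiallySignedBlocks()
xml.sax.parseString(msg.encode("utf-8"), handler)
print(sorted(handler.completed))  # ['tok1'] (body1 lies outside the header)
```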
Encrypted Key Processing Automaton
Fig. 5 shows the automaton for processing
an xenc:EncryptedKey element contained in
the WS-Security header.
The processing starts with reading the
encryption algorithm alg (1). Inside the
ds:KeyInfo element (2), a hint to the key
pair key_priv and key_pub is given (see above).
The key key_priv is used for initializing the
decryption algorithm inside the function
initDecryption(alg) (4). The function
decrypt(char) then decrypts the content of
the xenc:CipherData element using this
algorithm in conjunction with key_priv. The
result is the (symmetric) key key, which is
used later to decrypt encrypted content. The
references stated inside the
xenc:ReferenceList claim the usage of the
current key for those encrypted blocks.
Thus, the storeKey(. . .) function adds the
pair (ref, key) to EncKey (7) to enable the
decryption of the appropriate encrypted
block (see below). Additionally, the pair
(ref, Enc) is added to the end of the list of
security references Ref.
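The bookkeeping performed at the end of the encrypted-key automaton (filling EncKey and appending to Ref) can be sketched as plain data structures. The key-unwrapping step is passed in as a hypothetical callable, because a real implementation would perform RSA decryption with key_priv through a cryptography library; the `toy_unwrap` stand-in in the usage example provides no security at all, and all names here are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SecurityState:
    enc_key: Dict[str, bytes] = field(default_factory=dict)    # EncKey: ref -> key
    refs: List[Tuple[str, str]] = field(default_factory=list)  # Ref: ordered (ref, kind)


def process_encrypted_key(state: SecurityState, cipher_value: bytes,
                          reference_list: List[str],
                          unwrap: Callable[[bytes], bytes]) -> bytes:
    """Recover the symmetric key and register it for each referenced block."""
    key = unwrap(cipher_value)            # corresponds to decrypt(char) with key_priv
    for uri in reference_list:            # URIs from xenc:ReferenceList
        ref = uri[1:] if uri.startswith("#") else uri
        state.enc_key[ref] = key          # storeKey: add (ref, key) to EncKey
        state.refs.append((ref, "Enc"))   # append (ref, Enc) to Ref
    return key


def key_for_encrypted_block(state: SecurityState, block_id: str) -> bytes:
    """Fetch the key when the matching xenc:EncryptedData block is reached."""
    if block_id not in state.enc_key:
        # Encrypted keys must precede their encrypted blocks in the message.
        raise ValueError(f"no encrypted key precedes block {block_id!r}")
    return state.enc_key[block_id]


def toy_unwrap(cipher_value: bytes) -> bytes:
    """Toy stand-in for RSA key unwrapping; provides no security at all."""
    return cipher_value[::-1]


state = SecurityState()
process_encrypted_key(state, b"yek", ["#enc1", "#enc2"], toy_unwrap)
print(state.refs)                              # [('enc1', 'Enc'), ('enc2', 'Enc')]
print(key_for_encrypted_block(state, "enc2"))  # b'key'
```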
Signature Processing Automaton
Fig. 6 shows the automaton for processing a
ds:Signature element from the WS-Security
header. For verifying the signature value, the
ds:SignedInfo block must be canonicalized
and hashed. Thus, at the beginning of that
element, canonicalization and hashing are
started by the function startHashing (1).
The canonicalization algorithms for the
ds:SignedInfo block are read (2). The WS-I
Basic Security Profile includes only
Exclusive C14N [33] as canonicalization
algorithm. The signature algorithm (e.g.,
RSA with SHA-1) is read (3). The
reference ref is read from the URI attribute
of the element ds:Reference (4). The
transformation algorithms are read and the
set of transformations is stored into t (6, 7).
The hashing algorithm for the signed block
is read. The digest value is read (10) and the
function checkDigest(char, t, ref) is
executed. If there exists a D with
(ref, D) ∈ CompletedDigest, ref is a backward
reference and thus the referenced
CONCLUSION
The paper introduces a comprehensive
framework for event-based WS-Security
processing. Although the streaming
processing of XML is known and
understood for almost as long as the
existence of the XML standard itself, the
exploitation of this processing model in the
presence of XML security is not. Due to the
lack of algorithms for event-based processing
of WS-Security, most SOAP frameworks
include the option for streaming-based
processing only for unsecured SOAP
messages. As soon as these messages are
secured by WS-Security mechanisms, the
security processing is performed relying on
the DOM tree of the secured message, hence
losing all advantages of the event-based
processing model. With this contribution,
the implementation of SOAP message
processing can be realized in a streaming
manner including the processing of security
means, resulting in significant
improvements compared to traditional and
currently mainly deployed tree-based
approaches. The main advantages of the
streaming model include an increased
efficiency in terms of resource consumption
and an enhanced robustness against different
kinds of DoS attacks. This paper introduces
the concepts and algorithms for a
comprehensive stream-based WS-Security
component. By implementing configurable
chains of stream processing components, a
streaming WS-Security validator has been
developed, which verifies and decrypts
messages with signed and/or encrypted parts
against a security policy.
The solution can handle any order, number,
and nesting degree of signature and
encryption operations, filling the gap in the
stream-based processing chain toward more
efficient and dependable web services.
REFERENCES
[1] T. Erl, Service-Oriented Architecture:
Concepts, Technology, and Design. Prentice
Hall, 2005.
[2] G. Alonso, F. Casati, H. Kuno, and V.
Machiraju, Web Services. Springer, 2004.
[3] M.P. Papazoglou, "Service-Oriented
Computing: Concepts, Characteristics and
Directions," Proc. Int'l Conf. Web
Information Systems Eng., p. 3, 2003.
[4] M. Turner, D. Budgen, and P. Brereton,
"Turning Software into a Service,"
Computer, vol. 36, no. 10, pp. 38-44, 2003.
[5] R. Buyya, C.S. Yeo, and S. Venugopal,
"Market-Oriented Cloud Computing:
Vision, Hype, and Reality for Delivering IT
Services as Computing Utilities," Proc. 10th
IEEE Int'l Conf. High Performance
Computing and Comm., pp. 5-13, 2008.
[6] M. Govindaraju, A. Slominski, K. Chiu,
P. Liu, R. van Engelen, and M.J. Lewis,
"Toward Characterizing the Performance of
SOAP Toolkits," Proc. Fifth IEEE/ACM
Int'l Workshop Grid Computing (GRID
'04), pp. 365-372, 2004.
[7] H. Liu, S. Pallickara, and G. Fox,
"Performance of Web Services Security,"
Proc. 13th Ann. Mardi Gras Conf., Feb.
2005.
Study on Web Service Implementation in Eclipse Using Apache CXF on JBoss
Platform: Towards Service Oriented Architecture Principles
M.Sanjay
Dep. Of Computer Science Engineering,
Bharath University,
Chennai 73 , India
S.Sivasubramanian M.Tech (Ph.D)
Asst. Professor, Dept. of CSE
Bharath University
Chennai 73 , India
Sivamdu2010@gmail.com
Abstract-Web services with SOA architecture are among the most
used terms in the IT industry. SOA is a powerful distributed
computing model that divides business processes into services
with the intention of promoting reusability. It converts
business processes into loosely coupled services to integrate
enterprise applications, reduce the IT burden, and increase the
ROI. The SOA platform is popular and widely used in distributed
systems, but setting up the environment and integration remains
a challenge; this study has been carried out to set up the web
service environment and create services using the SOA
architecture. SOA principles and open software tools
are used in this study.
Keywords- Service Oriented Architecture (SOA);
Web services; JBoss; CXF
Introduction
Service Oriented Architecture (SOA) and web services are
emerging technologies used in the IT industry. This study discusses
Service Oriented Architecture, web services with SOA, and the
creation of web services using CXF. Service Oriented Architecture
is an architectural model that enhances agility and
cost-effectiveness and reduces the IT burden. SOA supports
service-oriented computing. A service is a unit of solution. A
service serves as an individual component and helps achieve the
strategic goals of an organization through reuse. In SOA, services
are created, executed, and evaluated. Services are classified into
business services, application services, and infrastructure
services. Services are aggregated to achieve a business process.
These services are exposed as web services so that they can be
accessed from anywhere. The meta information about the services is
documented as a WSDL definition, which is expressed in XML. In web
services, the service functions are referred to as service
operations. These web services can be designed using the open
software tool Apache CXF. CXF originated from the merger of the
Celtix and XFire projects. CXF offers a friendlier developer
experience and is easy to integrate with the Spring Framework.
SERVICE ENTITIES
Services can be aggregated and exposed through a single, coarser-grained interface. SOA uses late binding: the consumer does not know the location of a service until run time, when it looks up the service details in the registry.
Service consumer: an application that sends a request to a service, supplying the input specified by the web service contract.
Service provider: the service that accepts the request and executes it. The provider publishes its service contract in the registry so that consumers can find it, and after execution it sends a response to the consumer according to the contract's output definition.
Service registry: a place where all services are registered, so that consumers can check which services are available in an organization; this promotes reuse. Tools are provided so that services can be modeled, created and stored, and programmers can search for services and are alerted if anything in the registry changes.
Service contract: specifies the request and response formats together with the preconditions and postconditions. The time the service takes to execute a method is also specified, as part of its Quality of Service (QoS).
Service lease: specifies how long (for example, how many years) the consumer may use the service. Once the lease expires, the consumer must request a new lease from the service registry.
Orchestration links services together to carry out business processes. It allows a process to be declared as a business flow, makes it possible to define, model and analyze processes to understand the impact of any change, and supports monitoring and management at the process level.
The SOA life cycle consists of gathering the requirements; constructing and testing the services; integrating people, processes and information; and managing the application services. The elements of SOA are the service, the provider, the requester and the directory. In this model the service provider publishes the service, based on its WSDL contract, in the service registry. Service consumers discover the service there and invoke it by sending a SOAP request message to its endpoint URL. Messaging lets services communicate and interact across multiple platforms; it is connection-independent and supports intelligent routing.
<?xml version="1.0" encoding="UTF-8"?>
<wsdl:definitions name="CXFWSDL"
    targetNamespace="http://www.example.org/CXFWSDL/"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="http://www.example.org/CXFWSDL/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/">
  <wsdl:types>
    <xsd:schema targetNamespace="http://www.example.org/CXFWSDL/">
      <xsd:element name="NewOperation">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="in" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="NewOperationResponse">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="out" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:schema>
  </wsdl:types>
  <!-- The wsdl:message, wsdl:portType, wsdl:binding and wsdl:service
       elements are not reproduced in this listing. -->
</wsdl:definitions>
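The listing above defines only the types section. A WSDL 1.1 description from which CXF can generate a service skeleton also needs message, portType, binding and service elements, placed after </wsdl:types> inside wsdl:definitions. The fragment below is a sketch of those elements, assuming document/literal SOAP 1.1 over HTTP and names in the style of Eclipse's WSDL wizard; the portType, binding, service and port names, the soapAction and the endpoint address are all hypothetical placeholders, not taken from the authors' project.

```xml
<!-- Sketch only: hypothetical names; these elements follow </wsdl:types>. -->
<wsdl:message name="NewOperationRequest">
  <wsdl:part name="parameters" element="tns:NewOperation"/>
</wsdl:message>
<wsdl:message name="NewOperationResponse">
  <wsdl:part name="parameters" element="tns:NewOperationResponse"/>
</wsdl:message>

<!-- Abstract interface: one request/response operation -->
<wsdl:portType name="CXFWSDL">
  <wsdl:operation name="NewOperation">
    <wsdl:input message="tns:NewOperationRequest"/>
    <wsdl:output message="tns:NewOperationResponse"/>
  </wsdl:operation>
</wsdl:portType>

<!-- Concrete binding: document/literal SOAP 1.1 over HTTP -->
<wsdl:binding name="CXFWSDLSOAP" type="tns:CXFWSDL">
  <soap:binding style="document"
                transport="http://schemas.xmlsoap.org/soap/http"/>
  <wsdl:operation name="NewOperation">
    <soap:operation soapAction="http://www.example.org/CXFWSDL/NewOperation"/>
    <wsdl:input><soap:body use="literal"/></wsdl:input>
    <wsdl:output><soap:body use="literal"/></wsdl:output>
  </wsdl:operation>
</wsdl:binding>

<!-- Endpoint: host, port and path are placeholders for the deployed address -->
<wsdl:service name="CXFWSDL">
  <wsdl:port name="CXFWSDLSOAP" binding="tns:CXFWSDLSOAP">
    <soap:address location="http://localhost:8080/CXFWSDL/services/CXFWSDLSOAP"/>
  </wsdl:port>
</wsdl:service>
```

The portType is the abstract interface, while the binding and service make it concrete; in the top-down approach, CXF's code generator reads all of these to produce the service interface and implementation skeleton.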

WEB SERVICES
A web service is a software program identified by a Uniform Resource Locator (URL). Its interfaces and bindings are defined using XML, and its definition can be discovered by other systems, which then interact with the web service in the manner prescribed by that definition, using SOAP messages. Web services are characterized by platform, location, implementation and format independence. Three concerns are involved: discovery (finding where the service is), description (how the service should be used) and messaging (how to communicate with it). Web services use UDDI for discovery, WSDL for description and SOAP for messaging.
The Simple Object Access Protocol (SOAP) is a specification describing how to pass parameters to a service and how to receive the result from it; the message content is XML whose structure is described by an XML Schema. The Web Services Description Language (WSDL) is a standard way to specify a service: it lists the service's operations and the arguments each operation needs, and gives the address, including the port number, at which the service can be reached. Universal Description, Discovery and Integration (UDDI) is a platform-independent framework for describing services, discovering businesses and integrating business services. It stores information about services and is essentially a directory of web services described by WSDL.
To implement a web service, the environment must first be set up. Once the environment is ready, services can be written and published on the server. Services can be developed using a top-down approach (starting from the WSDL contract) or a bottom-up approach (starting from existing code); this study follows the top-down approach. Once a service is published, it can be accessed from different systems. Several frameworks are available for creating web services; this study uses Apache CXF.
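To make the SOAP description concrete, a request for the NewOperation operation in the sample WSDL above might look like the following. This is a sketch assuming SOAP 1.1 and the document/literal style; the cxf prefix and the input text are arbitrary choices, not values from the study.

```xml
<soapenv:Envelope
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:cxf="http://www.example.org/CXFWSDL/">
  <!-- The optional Header carries metadata such as security tokens -->
  <soapenv:Header/>
  <soapenv:Body>
    <!-- document/literal: the Body child is the NewOperation element declared
         in the WSDL schema; <in> is unqualified because the schema does not
         set elementFormDefault="qualified" -->
    <cxf:NewOperation>
      <in>Hello CXF</in>
    </cxf:NewOperation>
  </soapenv:Body>
</soapenv:Envelope>
```

The parameter travels as XML inside the Body, which is what makes the exchange independent of the platform and language on either side.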
WEB SERVICE ENVIRONMENT CONFIGURATION
To create the web service, the environment must first be set up. The following software is used:
Application server: JBoss AS 5.1.0.GA
IDE: Eclipse IDE for Java EE Developers, Helios SR2 (Win32)
Web service framework: Apache CXF 2.3.4
JBoss CXF integration: JBossWS-CXF 3.4.0.GA
Build tool: Apache Ant 1.8.2
Testing tool: soapUI 3.1.6

STEP 1:
USER ENVIRONMENT VARIABLE SETTINGS
In Windows, open System Properties -> Advanced -> Environment Variables and set the following environment variables, each pointing to the corresponding installation directory:
CXF_HOME
ANT_HOME
JAVA_HOME
STEP 2:
CONFIGURING JBOSSWS-CXF-3.4.0.GA
In the jbossws-cxf-3.4.0.GA folder, rename ant.properties.example to ant.properties and edit line 6 so that it points to the JBoss installation:
jboss510.home=<path where JBoss is installed>
Then run the following Ant command:
ant deploy-jboss510
Check that the build completes successfully (Ant reports BUILD SUCCESSFUL).
STEP 3:
CONFIGURING THE ECLIPSE IDE
Setting up the CXF path:
Go to Window -> Preferences -> Web Services -> CXF 2.x Preferences.
Add the root directory of the CXF installation as the CXF runtime.
STEP 4: CONFIGURING THE JBOSS SERVER
Go to Window -> Preferences -> Web Services -> Server and Runtime.
Select JBoss v5.0 as the server runtime.
Select Apache CXF 2.x as the web service runtime.

WEB SERVICE CREATION
Precondition: the JBoss server must be up and running.
Go to File -> New -> Dynamic Web Project.
Select Apache CXF 2.x.
Copy the WSDL file into the WebContent folder.
Right-click the WSDL file -> New -> Other -> Web Service.
Click Next -> Next -> Finish.
Rename the beans.xml file created in project -> WebContent -> WEB-INF to beans-delta.xml.
In web.xml (project -> WebContent -> WEB-INF), change the value of the contextConfigLocation parameter to WEB-INF/beans-delta.xml.
A .war file will be created and deployed to the JBoss server.
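The contextConfigLocation change corresponds to a Spring context parameter in web.xml. A minimal sketch of the relevant entries follows, assuming the standard Spring ContextLoaderListener that CXF's Spring-based configuration relies on; the enclosing web-app element and any CXF servlet mapping that Eclipse generates are omitted and may differ in the authors' project.

```xml
<!-- Inside <web-app> in WebContent/WEB-INF/web.xml -->
<context-param>
  <!-- Tells Spring which bean definition file to load at startup -->
  <param-name>contextConfigLocation</param-name>
  <param-value>WEB-INF/beans-delta.xml</param-value>
</context-param>
<listener>
  <!-- Reads contextConfigLocation and creates the root application context -->
  <listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
</listener>
```

If the file is renamed without updating this parameter, Spring will still look for the old beans.xml and the application context will fail to start.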
To check whether the .war has been deployed properly, use the following steps.
Open the generated InterfaceImpl.java class, copy the WSDL path it references, and open that address in a web browser such as Internet Explorer.
The browser should display the WSDL.
WEB SERVICE ACCESS
In soapUI, create a new project from the WSDL above.
Right-click the operation and create a new request.
Replace each ? placeholder in the generated request with a proper value.
Run the request against the service's endpoint address.
Check the server log for the status of the call.
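If the call succeeds, soapUI shows a response shaped by the NewOperationResponse element of the WSDL. The example below is hypothetical: the returned text depends entirely on the implementation class, and the ns2 prefix is simply a generated prefix (CXF often produces prefixes like this, but any prefix bound to the same namespace is equivalent).

```xml
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <!-- Body child matches the NewOperationResponse element in the WSDL schema -->
    <ns2:NewOperationResponse xmlns:ns2="http://www.example.org/CXFWSDL/">
      <!-- Illustrative value; it is whatever the service implementation returns -->
      <out>Hello CXF</out>
    </ns2:NewOperationResponse>
  </soap:Body>
</soap:Envelope>
```

A SOAP Fault element in the Body instead of this response indicates an error, whose details usually also appear in the server log.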
7. Conclusion
This study described setting up the web service environment, developing the code and deploying the .war file to the application server. Much remains to be done in web service configuration, such as ESB and JMS configuration.
Future work
Future work for this case study includes configuring the ESB environment and implementing ESB concepts in JBoss to achieve routing and transformation along with security, and applying JMS messaging.
Implementation of Cryonics in Nanotechnology
K. Nandhini
III ECE
University College of Engineering, Arni
Knandhini68@gmail.com
A. Thenmathi
III ECE
University College of Engineering, Arni
tmathi1992@gmail.com
Abstract:
Today technology plays a vital role in every aspect of life, and rising standards of technology in many fields have brought humanity a long way. However, most currently available technologies cannot interact with individual atoms, which are extremely small particles; hence nanotechnology has been developed. Nanotechnology is technology that manipulates atoms in order to create a desired product. It has wide applications in many fields, and one proposed application is cryonics. Cryonics is an attempt to revive the dead: the body is first preserved, and molecular machines based on nanotechnology might later revive the patient by repairing damaged cells.
In this technical paper we discuss cryonics, how the cryonics process works, why nanotechnology is used, and the molecular machines that could be capable of repairing damaged cells. Cryonics is an area in which most of the work remains to be done in the future.
Introduction:
Today technology plays a vital role in every aspect of life. Rising standards of technology in many fields, particularly medicine, have brought humanity a long way. Nanotechnology is a new technology that is knocking at the door. It manipulates atoms in order to create a desired product. The term nanotechnology combines two words, nano and technology. Nano is derived from the Greek word nanos, meaning dwarf, so nanotechnology is literally "dwarf technology". A nanometer is one billionth of a metre.
Our former President, A.P.J. Abdul Kalam, himself a scientist, remarked that nanotechnology would give us an opportunity, if we take appropriate and timely action, to become one of the important technological nations in the world.
One proposed application of nanotechnology is cryonics, an attempt to revive the dead. Cryonics is not a widespread medical practice, and it is viewed with skepticism by most scientists and doctors today.
History:
The first mention of nanotechnology occurred in a talk given by Richard Feynman in 1959, entitled "There's Plenty of Room at the Bottom". Cryonics began in 1962 with the publication of The Prospect of Immortality by Robert Ettinger, founder and first president of the Cryonics Institute. During the 1980s the extent of the damage caused by the freezing process became much clearer and better known, and the emphasis of the movement began to shift to the capabilities of nanotechnology. The Alcor Life Extension Foundation currently preserves about 70 human bodies and heads in Scottsdale, Arizona, and the Cryonics Institute has about the same number of cryonic patients at its facility in Clinton Township, Michigan. No cryonics service is provided outside the U.S.A., although there are support groups in Europe, Canada, Australia and the U.K.
Four Generations:
Mihail Roco of the U.S.
National Nanotechnology Initiative has
described four generations of
nanotechnology development. The
current era, as Roco depicts it, is that
of passive nanostructures, materials
designed to perform one task. The
second phase, which we are just
entering, introduces active
nanostructures for multitasking; for
example, actuators, drug delivery
devices, and sensors. The third
generation is expected to begin
emerging around 2010 and will feature
nanosystems with thousands of
interacting components. A few years
after that, the first integrated
nanosystems, functioning much like a
mammalian cell with hierarchical
systems within systems, are expected
to be developed.

Cryonics:
Cryonics is the practice of freezing a dead body in the hope of someday reviving it. More precisely, it is the practice of cooling people immediately after death to the point where molecular physical decay completely stops, in the expectation that scientific and medical procedures currently being developed will be able to revive them and restore them to good health later. A patient held in such a state is said to be in "cryonic suspension". Cryonics cryopreserves humans and pets who have recently been declared legally dead, until the cryopreservation damage can be reversed and the cause of the fatal disease can be cured (including the disease known as aging). Although cryonics is viewed with skepticism, there is a high representation of scientists among cryonicists. Support for cryonics is based on controversial projections of future technologies and of their ability to enable molecular-level repair of tissues and organs.
Cryonics patient prepares for the future:
How an Alcor patient's body is frozen and stored until medical technology can repair the body and revive the patient, or grow a new body for the patient.
Patient declared legally dead:
On the way to Alcor in Arizona, blood circulation is maintained and the patient is injected with medicine to minimize problems with frozen tissue. Cooling of the body begins. (If the body needs to be flown, the blood is replaced with organ preservatives.)
At Alcor, the body is cooled to 5 degrees.
The chest is opened and the blood is replaced with a solution (glycerol, water and other chemicals) that enters the tissues, pushing out water to reduce ice formation. In 2 to 4 hours, 60% or more of the body water is replaced by glycerol.
Freezing the body:
The patient is placed in cold silicone oil, chilling the body to -79 °C. It is then moved to an aluminum pod, slowly cooled over 5 days in liquid nitrogen to -196 °C (minus 320 °F), and stored.
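The Fahrenheit figure can be checked with the standard conversion formulas. The worked values below read the silicone-oil stage as -79 °C; they give -320.8 °F for liquid nitrogen, which the text rounds to minus 320, and 77.15 K, close to the boiling point of liquid nitrogen at atmospheric pressure.

```latex
\begin{aligned}
T_{\mathrm{F}} &= \tfrac{9}{5}\,T_{\mathrm{C}} + 32, \qquad T_{\mathrm{K}} = T_{\mathrm{C}} + 273.15 \\
T_{\mathrm{C}} = -79:\quad T_{\mathrm{F}} &= \tfrac{9}{5}(-79) + 32 = -142.2 + 32 = -110.2 \\
T_{\mathrm{C}} = -196:\quad T_{\mathrm{F}} &= \tfrac{9}{5}(-196) + 32 = -352.8 + 32 = -320.8, \qquad T_{\mathrm{K}} = 77.15
\end{aligned}
```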
Actual process starts:
After the body has been preserved for some days, surgery begins. As part of it, chemicals such as glycerol and other advanced chemicals are applied to activate the cells of the body; by doing so, 0.2% of the cells in the body are said to be activated. The body is then preserved for future use. Cryonicists strongly believe that 21st-century medicine will be able to rapidly multiply those cells and so help bring the dead person back.
Storage vessel:
Stainless-steel vats are formed into a large thermos-bottle-like container. A vat for up to four bodies weighs about a ton and stands 9 feet tall.
Transtime "recommends" that people provide a minimum of $150,000 for whole-body suspension. Part of this sum pays for the initial costs of the suspension; the balance is placed in a trust fund, and the income is used to pay the continuing cost of maintaining the patient in suspension. Transtime can perform neurosuspensions but does not promote the option. Transtime also charges a yearly membership fee of $96, halved to $48 for other family members.
The Cryonics Institute in Clinton
Township, Michigan, charges $28,000
for a full-body suspension, along with
a one-time payment of $1,250. The
Cryonics Institute does not do
neurosuspension.
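Putting the quoted fees side by side gives the following simple sums. They use only the figures stated above and ignore the Transtime trust-fund income, membership fees at the Cryonics Institute, and any later price changes.

```latex
\begin{aligned}
\text{Cryonics Institute:}\quad & \$28{,}000 + \$1{,}250 = \$29{,}250 \\
\text{Transtime (recommended minimum):}\quad & \$150{,}000, \qquad \frac{150{,}000}{29{,}250} \approx 5.1 \\
\text{Transtime yearly fees (member + one family member):}\quad & \$96 + \$48 = \$144
\end{aligned}
```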
About 90 people in the United States are already in suspension, with hundreds more signed on for the service. Probably the most famous cryopreserved patient is Ted Williams. A cryopreserved person is sometimes whimsically called a "corpsicle" (a portmanteau of "corpse" and "popsicle"). This term was first used by science fiction author Larry Niven.
Obstacles to Success:
Revival process:
Critics have often quipped that it is easier to revive a corpse than a cryonically frozen body. Many cryonicists might actually agree with this, provided that the "corpse" were fresh, but they would argue that such a "corpse" may actually be biologically alive under optimal conditions. A declaration of legal death does not mean that life has suddenly ended; death is a gradual process, not a sudden event. Rather, legal death is a declaration by medical personnel that there is nothing more they can do to save the patient. But if the body is clearly biologically dead, having been sitting at room temperature for a period of time or having been traditionally embalmed, then cryonicists would hold that such a body is far less revivable than a cryonically preserved patient, because any process of resuscitation will depend on the quality of the structural and molecular preservation of the brain.
Financial Issues:
Cryopreservation arrangements can be expensive, currently ranging from $28,000 at the Cryonics Institute to $150,000 at Alcor and the American Cryonics Society. The biggest drawback of current vitrification practice is a cost issue: because the most cost-effective way to store a cryopreserved person is in liquid nitrogen, the brain fractures as a result of thermal stresses that develop when cooling from -130 °C to -196 °C (the temperature of liquid nitrogen). Cryonicists nevertheless argue that cryonics is actually quite affordable for the vast majority of those in the industrialized world who really want it, especially if they make arrangements while still young.
Court Rules against Keeping:
France's highest administrative court, the Conseil d'Etat, ruled that cryonics, stopping physical decay after death in the hope of future revival, is illegal.
The court said relatives have two choices over what to do with dead bodies: burial or cremation. It said relatives can scatter ashes after cremation, but they have to bury bodies in a cemetery, or in a tomb on private property after gaining special permission.
Why is only nanotechnology used in cryonics?
Biological molecules and systems have a number of attributes that make them highly suitable for nanotechnology applications. Remote control of DNA has shown that electronics can interact with biology, and the gap between electronics and biology is now closing.
According to most technologists interested in cryonic suspension, the key to cryonics' eventual success is nanotechnology, the manipulation of materials on an atomic or molecular scale. "Current medical science does not have the tools to fix damage that occurs at the cellular and molecular level, and damage to these systems is the cause of the vast majority of fatal illnesses." Nanotechnology is the ultimate in miniaturization. A nanometer is roughly the width of six bonded carbon atoms, and a DNA molecule is 2.5 nm wide. Cryonics deals with cells, whose molecular components are on the order of nanometers. At present no other technology works with structures this small; only nanotechnology has the ability to repair cells at this scale. Normally fatal accidents could even be walked away from, thanks to a range of safety devices possible only with nanotechnology.
Viruses, prions, parasites and bacteria continue to mutate and produce new diseases, which our natural immune system may or may not be able to handle. In theory, a nano cell sentinel could make our body immune to any present or future infectious disease.
Fracturing is a special concern for the new vitrification protocol brought online by Alcor for neuro patients. If advanced nanotechnology is available for patient recovery, then fracturing probably causes little information loss. However, fracturing commits the cryopatient to the need for molecular repair at cryogenic temperature, a highly specialized and advanced form of nanotechnology, whereas unfractured patients may be able to benefit sooner from simpler forms of nanotechnology developed for more mainstream medical applications. Damage caused by freezing and fracturing is thought to be potentially repairable in the future using nanotechnology, which will enable manipulation of matter at the molecular level.
How is nanotechnology used in cryonics?
MOLECULAR MACHINES:
Molecular machines could revive patients by repairing damaged cells, but to make those cell repair machines we first need to build a molecular assembler.
It is quite possible to adequately model the behaviour of molecular machines that satisfy two constraints:
1. They are built from parts so stable that small errors in the empirical force fields do not affect the shape or stability of the parts.
2. The parts are synthesized using positionally controlled reactions, in which the actual chemical reactions involve a relatively small number of atoms.
Drexler's assembler can be built within these constraints.
An assembler made using current methods:
The fundamental purpose of an assembler is to position atoms. Robotic arms and other positioning devices are basically mechanical in nature and will allow us to position molecular parts during the assembly process; molecular mechanics provides an excellent tool for modeling the behaviour of such devices. The second requirement is the ability to make and break bonds at specific sites. While molecular mechanics can tell us where the tip of the assembler arm is located, current force fields are not adequate to model the specific chemical reactions that must then take place at the tip/workpiece interface when building an atomically precise part. Higher-order ab initio calculations are adequate for this purpose.
The methods of computational chemistry available today allow us to model a wide range of molecular machines with enough accuracy, in many cases, to determine how well they will work.
Computational nanotechnology includes not only the tools and techniques required to model proposed molecular machines but also the tools required to specify such machines. Molecular machine proposals that would require millions or even billions of atoms have been made, and the total atom count of an assembler might be roughly a billion atoms. While commercially available molecular modeling packages provide facilities to specify arbitrary structures, it is usually necessary to point and click for each atom involved. This is obviously unattractive for a device as complex as an assembler, with its roughly one billion atoms.
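A rough estimate shows why per-atom point-and-click specification does not scale. Assuming, purely for illustration, one click per atom at one click per second, working without pause, placing a billion atoms would take about three decades:

```latex
t \approx \frac{10^{9}\ \text{atoms} \times 1\ \text{s/atom}}{3.156 \times 10^{7}\ \text{s/year}} \approx 31.7\ \text{years}
```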
The software required to design and model complex molecular machines is either already available or can readily be developed over the next few years. The molecular compiler and other molecular CAD tools needed for this work can be implemented using generally understood techniques and methods from computer science. Using this approach, it will be possible to substantially reduce the development time for complex molecular machines, including Drexler's assemblers.
FUTURE ENHANCEMENTS:
1. With the knowledge of cryonics, cryonicists are preserving human brains. Every person alive today was once a single cell, and a complete human being can grow from that cell in the natural state. Cryonicists therefore believe that genetically programming a single cell on the surface of a preserved brain could begin a process of growth and development that might append a complete young adult body to the brain.
Conclusion:
With the implementation of cryonics, we may be able to bring life back. But cryonics is an area in which most of the work remains to be done in the future; so far, mainly the concept has been proposed. Scientists are therefore not making far-reaching promises about the future of cryonics.
References:
1. Platzer, W. "The Iceman - 'Man
from the Hauslabjoch'." Universitt
Innsbruck. 12 November 2002
http://info.uibk.ac.at/c/c5/c552/For
schung/Iceman/iceman-en.html
2. "Cryonics." Merriam-Webster's
Collegiate Dictionary. 10th ed.
2001.
3. Iserson, K.V. Death To Dust:
What Happens To Dead Bodies?
2nd ed. Tucson: Galen Press, 2001.
4. Iserson, K.V. "RE: Cryonics
article." E-mail to the author. 11
November 2002.
5. "Frequently Asked Questions."
Alcor Life Extension Foundation.
12 November 2002
http://www.alcor.org/FAQs/index.
htm
6. Olsen, C.B. "A Possible Cure for
Death." Medical Hypotheses 26
(1988): 77-88