This document summarizes a thesis on developing a distributed architecture for web browsing troubleshooting. The architecture integrates network measurement and analysis techniques through real-time data sharing across peers. It applies a dynamic distributed K-means clustering algorithm to partition browsing data from different locations into clusters representing performance issues. Experiments using the architecture identified local and non-local network problems affecting different peers. Future work could include testing with more peers, optimizing resource usage, and developing a user-friendly interface.
1. A Distributed Architecture for Web Browsing
Troubleshooting
Advisor: Prof. Antonio Pescapé
Co-advisor: Prof. Ernst Biersack
Candidate: Salvatore Balzano (Matr. M63/220)
Master's Thesis, Academic Year 2011-2012
2. Context and Contribution
Context
• Analysis of Network Problems
• Web Browsing Quality of Experience Measurement
Contribution
• Design, implementation and development of a Dynamic Distributed Architecture for Web Browsing Troubleshooting
3. QoE of Web Browsing
An objective and subjective measure of customers' experience with interactive services that are based on the HTTP protocol and are accessed via web browsers. Web browsing QoE gives a realistic picture of user-perceived web performance.

QoE is related to, but differs from, QoS:
• QoS is a set of technologies for managing network traffic in a cost-effective manner to enhance the user experience in home and enterprise environments.
• A QoE system instead tries to measure the metrics that users directly perceive as quality parameters; QoE can be described as an extension of traditional QoS.

"...the longer users have to wait for the web page to arrive (or transactions to complete), the more they tend to become dissatisfied with the service..."
4. Troubleshooting Users' Network Connections
Different tools and methodologies for troubleshooting users' network connections have been proposed in the literature. They can be classified into three categories:

Tools for web page debugging or monitoring, such as Firebug and Google Chrome Developer Tools:
1. Inspect HTML and modify style and layout
2. Accurately analyze network usage and performance
3. Visualize CSS metrics
4. Debug JavaScript

Tools that measure web browsing performance together with page properties (e.g. number of objects):
1. Can also include user participation during web page browsing
2. Allow investigation of user satisfaction

Tools for network troubleshooting based on analysis of transmitted TCP packets:
1. Root cause diagnosis of TCP throughput limitations for long connections
5. Limitations
1. Lack of systematic troubleshooting models - tools for web page debugging or monitoring lack a suitable diagnosis system for network troubleshooting, and they also introduce significant execution overhead.
2. Single measurement point - most network troubleshooting tools exploit a single measurement point.
3. TCP packet limitation - web connections are often quite short in terms of the number of packets transmitted.
4. Static environment - tools are NOT able to handle a dynamic end-user network (e.g. an end user leaving during the experiment).
5. Central database to diagnose problems.
6. Real-Time Network Diagnosis Architecture
Integration of mechanisms:
• Extensive network measurement and analysis based on quantitative metrics
• Real-time knowledge sharing in order to troubleshoot bad performance experiences
These mechanisms are complementary and are integrated into a single architecture.

Measurement:
• Intensive web page browsing from different clients in different locations
• Local storage of network parameters via a Firefox plugin
• Real-time database updating

Analysis and Troubleshooting:
• Computing quantitative metrics from passively measured data
• Dynamic distributed K-means clustering through P2P communication
• Public relay server to forward messages to and from peers behind NAT
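The "quantitative metrics from passively measured data" step amounts to simple timestamp arithmetic over captured events. A minimal sketch follows; the field names are illustrative assumptions, not the actual schema used by the thesis's Firefox plugin.

```python
def compute_delays(sample):
    """Derive per-page-load delay metrics (seconds) from captured timestamps.

    Field names are hypothetical, chosen for illustration only.
    """
    return {
        # DNS delay: time from DNS query to DNS response
        "dns_delay": sample["dns_response_ts"] - sample["dns_query_ts"],
        # TCP delay: time from SYN to SYN/ACK (handshake latency)
        "tcp_delay": sample["syn_ack_ts"] - sample["syn_ts"],
        # HTTP delay: time from the GET request to the first response byte
        "http_delay": sample["first_byte_ts"] - sample["get_ts"],
    }

# Illustrative sample: timestamps in seconds relative to the page-load start
sample = {
    "dns_query_ts": 0.000, "dns_response_ts": 0.045,
    "syn_ts": 0.050, "syn_ack_ts": 0.130,
    "get_ts": 0.135, "first_byte_ts": 0.420,
}
delays = compute_delays(sample)
```

Each peer would compute such a delay vector per page load and store it locally, feeding the clustering step described next.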
7. Dynamic Distributed K-Means Clustering Algorithm
An algorithm for K-means clustering on data distributed over a P2P network:
• Dynamic - it adapts easily to a dynamic P2P network, where existing nodes can drop out and new nodes can join during the execution of the algorithm.
• Distributed - it takes a completely decentralized approach, where peers (nodes) only synchronize with their neighbors in the underlying communication network.
• Light - compared to the centralized K-means clustering algorithm, it alleviates traffic pressure on the communication channel.
• Smart - when data sources are distributed over a large-scale P2P network, collecting the data at a central location before clustering is not an attractive or practical option.

- K-Means Clustering Overview -
K-means clustering partitions a collection of data tuples (X) into K disjoint, exhaustive groups (clusters), where K is a user-specified parameter. The goal is to find the clustering that minimizes the sum of the distances between each data tuple and the centroid of the cluster to which it is assigned. Thanks to this algorithm, we address the problem of carrying out K-means clustering when X is distributed over a large P2P network (the distributed approach).
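For reference, the centralized scheme that the distributed algorithm builds on can be sketched as below (a minimal Lloyd's-algorithm sketch with naive initialization; in the thesis's distributed variant, each peer instead runs the assignment step on its local data and exchanges only centroids and counts with its neighbors). The data values are illustrative.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(X, k, iters=50):
    """Centralized K-means: partition the tuples in X into k clusters."""
    centroids = list(X[:k])  # naive init: first k tuples (k-means++ would be better)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each tuple joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for x in X:
            nearest = min(range(k), key=lambda c: squared_distance(x, centroids[c]))
            clusters[nearest].append(x)
        # Update step: each centroid becomes the mean of its cluster
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = tuple(sum(vals) / len(members) for vals in zip(*members))
    return centroids, clusters

# Toy samples, e.g. (TCP delay, HTTP delay) pairs in seconds (illustrative)
data = [(0.1, 0.2), (5.0, 5.1), (0.2, 0.1), (5.2, 4.9)]
centroids, clusters = kmeans(data, k=2)
```

On this toy data the two well-separated groups end up in different clusters; the distributed version reaches an equivalent result without ever gathering all tuples at one node.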
8. Real World Deployments and Experimentations
Four peers in four different locations:
• EURECOM - Eurecom Institute (Sophia Antipolis, France)
• HOME 1 - Private flat (Nice, France)
• HOME 2 - Student residence (Sophia Antipolis, France)
• NAPOLI - University "Federico II" of Naples
Browsing from completely different locations allows us to compare performance when accessing the same web page, in order to identify the influence of area-specific problems.

URLs Browsing List:
• We wish to include popular web sites that most users visit.
• Web user activity also occurs on sites that are not as universally known, but rather reflect individual tastes.
• Random-word Google and Bing searches better capture real user browsing.
• Browsing web sites whose servers are far from our region is also of interest.
9. System Evaluation
1. Cosine Similarity
A measure of similarity between two N-dimensional vectors, obtained by computing the cosine of the angle between them. To evaluate the clustering results, we compute the cosine similarity between each sample in the final cluster membership and the global cluster centroids.

2. Threshold and Convergence
The threshold is a user-specified parameter and affects the accuracy of the clustering: a larger threshold yields more precise results. Varying the threshold also influences the number of iterations the algorithm needs to converge.
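The cosine-similarity evaluation can be sketched as follows; the sample and centroid vectors are illustrative values, not measured data.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two N-dimensional vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

# Illustrative: one sample compared against its global cluster centroid;
# a value close to 1 means the sample is well represented by its cluster.
sample = (0.12, 0.40, 0.90)
centroid = (0.10, 0.35, 0.95)
similarity = cosine_similarity(sample, centroid)
```

Averaging this score over all samples gives a single figure of merit for how faithfully the distributed clustering reproduces the global centroids.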
10. Experimental Results (1/3)
Local Network Problems
Google case:
• Google works very hard to keep page download times as low as possible by decreasing the number of elements on its web pages and by placing servers close to the users.
• Clustering results show that HOME2 has high TCP and HTTP delays, especially in the first time slot.
• The public shared wireless connection is frequently overloaded during the night, and this appears to be the main reason for the bad performance.
11. Experimental Results (2/3)
Non-Local Problems
'Baidu' web page case:
• All peers suffer the same problems and need at least 10 seconds to reach the load event.
• Around 50% of HTTP and TCP delays are larger than 1 second.
• 'Server-side' delay plays a critical role in the end-user performance of content delivery.
[Figures: DNS delays, HTTP delays, TCP delays]
12. Experimental Results (3/3)
DNS Resolver Problems
'Sina' web page case:
• HOME2 has much larger TCP and HTTP delays compared to the other peers; its samples are isolated, and the bad performance does NOT change during the night.
• Around 70% of samples from NAP, EUR and HOME1 are lower than 100 ms.
• This dissimilarity can be explained by the use of CDN servers.
13. Conclusions and Future Work
Conclusions
• The distributed K-means clustering architecture implemented can deal with the dynamic behavior of end-users.
• In principle, the architecture can completely get rid of the central machine, since a P2P system does not need any central point.
• A manual inspection of the results by an experienced user is still needed.
Future Work
• Because of limitations of the testing environment, tests with more than four nodes have not been done; tests in larger networks should be done in order to strengthen confidence in the robustness of the system.
• At present, execution of the Python scripts makes CPU usage high; effort is required to optimize the whole system.
• A user-friendly web interface for analyzing the results is necessary.