A Distributed Architecture for Web Browsing
Troubleshooting
Master's Thesis - Academic Year 2011-2012
Candidate: Salvatore Balzano (Student ID M63/220)
Advisor: Prof. Antonio Pescapé
Co-advisor: Prof. Ernst Biersack
Context and Contribution
Context
• Analysis of Network Problems
• Web Browsing Quality of Experience
Measurement
Contribution
• Design, implementation and development of
a Dynamic Distributed Architecture for Web
Browsing Troubleshooting
QoE of Web Browsing
An objective and subjective measure of customers' experience with interactive services that are based on HTTP and accessed via web browsers. Web browsing QoE gives a realistic picture of user-perceived web performance.
QoS is a set of technologies for managing network traffic in a cost-effective manner, to enhance user experience in home and enterprise environments.
A QoE system tries to measure the metrics that users directly perceive as quality parameters.
QoE can be described as an extension of traditional QoS.
"...the longer users have to wait for the web page to arrive (or transactions to complete), the more they tend to become dissatisfied with the service..."
QoE is related to, but differs from, QoS.
Troubleshooting Users' Network Connections
Different tools and methodologies for troubleshooting users' network connections have been proposed in the literature. They can be classified into three categories:
Tools for web page debugging or monitoring, such as Firebug and Google Chrome Developer Tools:
1. Inspect HTML and modify style and layout
2. Accurately analyze network usage and performance
3. Visualize CSS metrics
4. Debug JavaScript
Tools to measure web browsing performance from page properties (e.g. number of objects):
1. Can also include user participation during web page browsing
2. Allow investigation of user satisfaction
Tools for network troubleshooting based on analysis of the transmitted TCP packets:
1. Root cause diagnosis of TCP throughput limitations for long connections
Limitations
1. Lack of systematic troubleshooting models - Tools for web page debugging or monitoring lack a suitable diagnosis system for network troubleshooting, and they also introduce a significant execution overhead.
2. Single measurement point - Most network troubleshooting tools exploit one single measurement point.
3. TCP packet limitation - Web connections are often quite short in terms of the number of packets transmitted.
4. Static environment - Tools are NOT able to handle a dynamic end-user network (e.g. an end user leaving during the experiment).
5. Central database to diagnose problems.
Real-Time Network Diagnosis Architecture
Integration of mechanisms:
• Extensive network measurement and analysis based on quantitative metrics
• Real-time knowledge sharing in order to troubleshoot bad performance experiences
• Integration of the methodologies in a single architecture
• Complementarity of the previous mechanisms
Measurement:
• Intensive web page browsing from different clients in different locations
• Local storage of network parameters thanks to a Firefox plugin
• Real-time database updates
Analysis and Troubleshooting:
• Computing quantitative metrics from passively measured data
• Dynamic distributed K-means clustering through P2P communication
• Public relay server to forward messages to and from peers behind NAT
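As a minimal illustration of the "computing quantitative metrics from passively measured data" step, the sketch below aggregates per-object timestamps (as a browser-side plugin might record them) into per-page DNS, TCP and HTTP delay metrics. The field names and the averaging scheme are illustrative assumptions, not the thesis's actual schema.

```python
# Hypothetical sketch: deriving per-page quantitative metrics from
# per-object timestamps captured on the client side. Field names
# ("dns_start", "syn", ...) are illustrative, not the real schema.

def page_metrics(samples):
    """Aggregate per-object timings into mean per-page delays (seconds)."""
    dns = [s["dns_end"] - s["dns_start"] for s in samples]        # name resolution
    tcp = [s["syn_ack"] - s["syn"] for s in samples]              # connection setup
    http = [s["first_byte"] - s["request_sent"] for s in samples]  # server think time
    return {
        "dns_delay": sum(dns) / len(dns),
        "tcp_delay": sum(tcp) / len(tcp),
        "http_delay": sum(http) / len(http),
    }
```

Each page load then yields one tuple of delay metrics, which becomes an input sample for the clustering step.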
Dynamic Distributed K-Means Clustering Algorithm
- Distributed Approach -
Algorithm for K-means clustering on data distributed over a P2P network.
Dynamic - It easily adapts to a dynamic P2P network where existing nodes can drop out and new nodes can join during the execution of the algorithm.
Distributed - It takes a completely decentralized approach where peers (nodes) only synchronize with their neighbors in the underlying communication network.
Light - Compared to the centralized K-means clustering algorithm, it alleviates traffic pressure on the communication channel.
Smart - If data sources are distributed over a large-scale P2P network, collecting the data at a central location before clustering is not an attractive or practical option.
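The decentralized synchronization described above can be sketched as follows: after a local K-means update, each peer averages its centroids with those received from its direct neighbors, weighting by the number of points each centroid represents. The weighting scheme is one common choice and an assumption here; the thesis's exact merge rule may differ.

```python
# Hedged sketch of one synchronization round in decentralized K-means:
# each peer merges its centroids with its neighbors' centroids, weighted
# by cluster sizes, so that peers converge toward global centroids
# without any central coordinator.

def merge_with_neighbors(local_centroids, local_counts, neighbor_states):
    """Weighted average of this peer's centroids with its neighbors'.

    local_centroids : list of K centroid vectors (lists of floats)
    local_counts    : points assigned locally to each centroid
    neighbor_states : list of (centroids, counts) pairs from neighbors
    """
    k = len(local_centroids)
    dim = len(local_centroids[0])
    merged = []
    for j in range(k):
        total = local_counts[j]
        acc = [c * local_counts[j] for c in local_centroids[j]]
        for centroids, counts in neighbor_states:
            total += counts[j]
            for d in range(dim):
                acc[d] += centroids[j][d] * counts[j]
        if total == 0:
            merged.append(list(local_centroids[j]))  # empty cluster: keep as-is
        else:
            merged.append([a / total for a in acc])
    return merged
```

Because each peer only talks to its neighbors, a node dropping out simply disappears from `neighbor_states` in the next round, which is what makes the scheme tolerant of churn.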
- K-Means Clustering Overview -
K-means clustering partitions a collection of data tuples (X) into K disjoint, exhaustive groups (clusters), where K is a user-specified parameter. The goal is to find the clustering that minimizes the sum of the distances between each data tuple and the centroid of the cluster to which it is assigned. Thanks to this algorithm, we address the problem of carrying out K-means clustering when X is distributed over a large P2P network.
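The K-means loop just described can be sketched in its centralized form: assign each tuple to its nearest centroid, then recompute each centroid as the mean of its assigned tuples. Squared Euclidean distance and a fixed iteration count are simplifying assumptions for illustration.

```python
# Minimal centralized K-means sketch: alternate assignment and
# centroid-update steps. Distances are squared Euclidean.

def kmeans(X, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: each tuple joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for x in X:
            j = min(range(len(centroids)),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(x, centroids[j])))
            clusters[j].append(x)
        # Update step: new centroid = mean of the tuples assigned to it.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids
```

The distributed variant runs this same loop at each peer on local data, interleaved with neighbor synchronization rounds.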
Real World Deployments
and Experimentations
Four peers in four different locations
EURECOM - Eurecom Institute (Sophia
Antipolis, France)
HOME 1 - Private Flat (Nice, France)
HOME 2 - Student Residence (Sophia
Antipolis, France)
NAPOLI - University “Federico II” of Naples (Naples, Italy)
URLs Browsing List
Browsing from completely different locations allows us to compare the performance of accesses to the same web page, in order to identify the influence of area-specific problems:
• We wish to include popular web sites that most users visit.
• Much web user activity occurs on sites that are not as universally known, but rather reflect individual tastes.
• Random-word Google and Bing searches better reproduce real user browsing.
• Browsing web sites whose servers are far from our territory is an interesting topic.
System Evaluation
1. Cosine Similarity
Cosine similarity is a measure of similarity between two N-dimensional vectors, obtained from the cosine of the angle between them. In order to evaluate the clustering results, we compute the cosine similarity between each sample in the final cluster membership and the global cluster centroids.
2. Threshold and Convergence
The threshold is a user-specified parameter and affects the accuracy of the clustering: a larger threshold yields more precise results. Varying the threshold influences the number of iterations the algorithm needs to converge.
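Both evaluation ingredients can be sketched directly. The cosine-similarity formula is standard; the threshold-based stopping rule below (stop when no centroid moves more than the threshold between iterations) is one plausible reading of the convergence criterion, not necessarily the thesis's exact rule.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two N-dimensional vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0

def converged(old_centroids, new_centroids, threshold):
    """Illustrative stopping rule: no centroid moved more than `threshold`."""
    return all(math.dist(o, n) <= threshold
               for o, n in zip(old_centroids, new_centroids))
```

A sample whose cosine similarity to its assigned centroid is close to 1 is well explained by that cluster; values near 0 flag outliers worth manual inspection.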
Experimental Results (1/3)
Local Network Problems - Google case
Google works very hard to keep page download times as low as possible, by decreasing the number of elements in its web pages and by placing servers close to the users.
Clustering results show that HOME2 has high TCP and HTTP delays, especially in the first time slot.
The public shared wireless connection is frequently overloaded during the night, and this appears to be the main reason for the bad performance.
Experimental Results (2/3)
Non-Local Problems - ‘Baidu’ web page case
All peers suffer the same problems and need at least 10 seconds to reach the load event.
Around 50% of HTTP and TCP delays are larger than 1 second.
‘Server-side’ delay plays a critical role in the end-user performance of content delivery.
[Charts: DNS delays, HTTP delays, TCP delays]
Experimental Results (3/3)
DNS Resolver Problems - ‘Sina’ web page case
HOME2 has much larger TCP and HTTP delays compared to the other peers, and the bad performance does NOT change during the night.
HOME2’s samples are isolated: around 70% of the samples from NAPOLI, EURECOM and HOME1 are lower than 100 ms.
This dissimilarity may be explained by the use of a CDN server.
Conclusions and Future Works
Conclusions
• The distributed k-means clustering architecture implemented can deal with the dynamic behavior of end-users.
• In theory, the architecture can get rid of the central machine entirely, since a P2P system does not need any central point.
• A manual inspection of the results by an experienced user is still needed.
Future Works
• Because of limitations of the testing environment, tests with more than four nodes have not been done. Tests in a larger network should be done in order to enhance the robustness of the system.
• Up to now, the execution of the Python scripts keeps CPU usage high; further effort is required to examine and optimize the whole system.
• A user-friendly web interface to analyze the results is necessary.