Variable neighborhood Prediction of temporal collective profiles by Keun-Woo Lim, Telecom ParisTech
Temporal collective profiles generated by mobile network users can be used to predict network usage, which in turn can be used to improve network performance to meet user demand. This presentation describes a prediction method for temporal collective profiles that is suitable for online network management. Using a weighted graph representation, the target sample is observed during a given period to determine a set of neighboring profiles that are considered to behave similarly enough. The prediction of the target profile is based on the weighted average of its neighbors, where the optimal number of neighbors is selected through a form of variable neighborhood search. This method is applied to two datasets, one provided by a mobile network service provider and the other by a Wi-Fi service provider. The proposed prediction method can conveniently characterize user behavior via graph representation while outperforming existing prediction methods. Also, unlike existing methods that rely on categorization, it has low computational complexity, which makes it suitable for online network analysis.
1. VARIABLE NEIGHBORHOOD
PREDICTION OF TEMPORAL
COLLECTIVE PROFILES
Presentation for EuroIoTA ’16
Speaker: Keun-Woo Lim
Telecom ParisTech
24-11-2016
2. Brief Overview
What do we do in this work?
Analysis of temporal collective profiles (time-series)
Use of mobile datasets (cellular, Wi-Fi)
Real-time & lightweight prediction (online prediction)
What do we try to achieve?
Better prediction accuracy
Lower computational complexity
Better application & use case
5. Temporal collective profiles
Representation of data that aggregates the behavior of
a group of individuals over time
Can be categorized in various ways
“Daily Profiles”
6. What is collected?
Basic telephone calls and SMS?
However, we want to focus on more specific matters
Specific application data
Usage of Internet service
7. Why do we analyze these data?
For “online network analysis”
Real-time prediction of the near-future
Recognition of sudden changes/outliers
Timely adaptation
Use cases
Resource allocation
Traffic handling
Social behavior
9. Dataset
Cellular mobile dataset
1-week data from 90 LACs (location area codes) in Paris
More than 500 daily profiles
Wi-Fi cloud dataset
122 days (March 1st to June 30th, 2014)
60 million URL connection logs
(Top 20 mobile applications)
11. What should we do with daily profiles?
Daily profiles can be:
Very similar to each other (same day, location, etc.)
Very different too (outlier, events)
We use methods to calculate similarity
Cluster similar profiles
Distinguish different profiles
12. Previous work (offline analysis) [1]
Utilization of clustering methods (UPGMA)
With similarity comparison techniques (DTW, quantiles)
Not ideal for online data analysis
Clustering may take a long time ($O(M^2 N^3)$ with DTW)
[1] K. Lim, S. Secci, L. Tabourier, B. Tebbani, “Characterizing and predicting mobile application usage,”
https://hal.archives-ouvertes.fr/hal-01345824/document
13. Profile similarity
We use two examples of similarity measures
(M values in a time-series)
Euclidean distance (ED) = $\Theta(M)$
Dynamic time warping (DTW) = $\Theta(M^2)$
For a specific dataset containing N profiles,
ED = $\Theta(N^2 M)$
DTW = $\Theta(N^2 M^2)$
to compare all profiles with each other
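As an illustration of the two measures, here is a minimal Python sketch. The function names are hypothetical, and the DTW variant shown (absolute-difference local cost, no warping window) is an assumption, since the slides do not specify which DTW formulation was used.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length series: Theta(M) time."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw_distance(x, y):
    """Classic dynamic time warping: Theta(M^2) time for two series of length M."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # local cost (assumed: absolute difference)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

The nested loop in `dtw_distance` is where the extra factor of M comes from compared with ED.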
14. Weighted graph representation
Using similarity measures, we acquire a graph
structure of neighbors
E.g., if ED is used, lower value = more similar
15. Filtering paths
Filter neighbors with high distance
Depending on the value of α, the number of neighbors
changes for all profiles
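A rough sketch of the graph construction and filtering steps follows. The slides do not state the exact filtering rule, so the threshold used here (keep an edge if its distance is at most α times the distance to the node's closest neighbor) is an assumption modeled on the neighbor rule given later for VNP. `build_distance_graph` and `filter_graph` are hypothetical names.

```python
import math


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def build_distance_graph(profiles, distance=euclidean_distance):
    """Return {i: {j: d_ij}} for every pair i != j (all-pairs comparison)."""
    n = len(profiles)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(profiles[i], profiles[j])
            graph[i][j] = d  # lower value = more similar
            graph[j][i] = d
    return graph


def filter_graph(graph, alpha):
    """Keep, for each node, only edges within alpha times its smallest distance.

    Assumed rule: the slides only say that high-distance neighbors are filtered.
    """
    filtered = {}
    for node, edges in graph.items():
        if not edges:
            filtered[node] = {}
            continue
        d_min = min(edges.values())
        filtered[node] = {j: d for j, d in edges.items() if d <= alpha * d_min}
    return filtered
```

With this rule the filtered graph is directed: profile j can remain a neighbor of i even if i is filtered out of j's neighborhood.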
18. Principle of VNP
For a new day $x_n(t)$, we configure
$t_0 = 0$, $t_1 \in [0, 24]$, $t_2 = 24$ (hours)
Objective
Observation period = $x_n[t_0, t_1]$
Create a temporal profile to predict $x_n[t_1, t_2]$
Find $x$ from the observation period
The closest profile $x$, comparing $x[t_0, t_1]$ and $x_n[t_0, t_1]$
19. Find the neighbors
Using the closest neighbor $x$, we find the group of
neighbors $N_n$ to be used for prediction
For any other profile $y$ of the training set,
$y \in N_n$ iff
$s(x_n[t_0, t_1], y[t_0, t_1]) \le a \cdot s(x_n[t_0, t_1], x[t_0, t_1])$
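The neighbor rule can be sketched in Python as below. This is an illustration under assumptions: `find_neighbors` is a hypothetical name, profiles are plain lists sampled at a fixed rate, and Euclidean distance stands in for the similarity measure s.

```python
import math


def euclidean_distance(x, y):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def find_neighbors(target_observed, training_profiles, a, distance=euclidean_distance):
    """Return [(index, s)] for the training profiles inside the neighborhood N_n.

    target_observed: samples of the new day on [t0, t1] (length k).
    training_profiles: full-day profiles; only their first k samples are compared.
    """
    k = len(target_observed)
    scores = [distance(target_observed, y[:k]) for y in training_profiles]
    s_closest = min(scores)  # s(x_n, x) for the closest profile x
    return [(i, s) for i, s in enumerate(scores) if s <= a * s_closest]
```

With a = 1 only the closest profile (and any exact ties) is returned; larger values of a widen the neighborhood.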
20. Creating the prediction profile
Using $N_n$, formulate the prediction
$\hat{x}_n(t) = \frac{\sum_{y \in N_n} s(x_n, y) \cdot y(t)}{\sum_{y \in N_n} s(x_n, y)}$
Simply put, it is the weighted average over the profiles
of its neighborhood
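A minimal sketch of the weighted average, assuming profiles are lists indexed by sample. The slides weight each neighbor by s(x_n, y); the sketch takes the weights as an explicit argument so that it makes no assumption about how s is derived from a distance. `predict_profile` is a hypothetical name.

```python
def predict_profile(neighbors, weights, start, end):
    """Weighted average of neighbor profiles over sample indices [start, end)."""
    if len(neighbors) != len(weights):
        raise ValueError("one weight per neighbor is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [
        sum(w * y[t] for w, y in zip(weights, neighbors)) / total
        for t in range(start, end)
    ]
```

For example, two neighbors `[0, 0, 2, 4]` and `[0, 0, 4, 8]` with weights 1 and 3 give `[3.5, 7.0]` for the last two samples.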
21. Training Parameter 𝑎
𝑎 can be tuned to select the optimal number of
neighbors
Variable neighborhood search to find the 𝑎 that yields
the highest accuracy over time
E.g. 1.0 < 𝑎 < 10.0
Drawbacks
Increase in complexity (recalculate for each 𝑎)
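The tuning step could look roughly like the sketch below. It is not the authors' search procedure: it swaps in a plain scan over a list of candidate values with leave-one-out evaluation on past days, and it assumes Euclidean distance for s and a ratio of squared sums for the error. All function names are hypothetical, and at least two past days are required.

```python
import math


def _distance(x, y):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def _predict(target_obs, training, a, k):
    """Predict samples k..end of the target from its a-neighborhood."""
    scores = [_distance(target_obs, y[:k]) for y in training]
    s_min = min(scores)
    chosen = [(s, y) for s, y in zip(scores, training) if s <= a * s_min]
    total = sum(s for s, _ in chosen)
    horizon = len(training[0])
    if total == 0:  # every chosen profile matches exactly: use a plain mean
        return [sum(y[t] for _, y in chosen) / len(chosen) for t in range(k, horizon)]
    return [sum(s * y[t] for s, y in chosen) / total for t in range(k, horizon)]


def _relative_error(actual, predicted):
    num = sum((x - p) ** 2 for x, p in zip(actual, predicted))
    den = sum(x ** 2 for x in actual)  # assumed non-zero for real traffic data
    return num / den


def tune_a(history, k, candidates):
    """Return (best_a, mean_error) over the candidate values of a."""
    best_a, best_err = None, float("inf")
    for a in candidates:  # the neighborhood is recomputed for each a
        errors = []
        for i, day in enumerate(history):
            training = history[:i] + history[i + 1:]  # leave the tested day out
            pred = _predict(day[:k], training, a, k)
            errors.append(_relative_error(day[k:], pred))
        mean_err = sum(errors) / len(errors)
        if mean_err < best_err:
            best_a, best_err = a, mean_err
    return best_a, best_err
```

The outer loop makes the drawback visible: every additional candidate value of a repeats the neighborhood selection and prediction for each evaluated day.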
22. Calculating multiple 𝑡1
For a more fine-grained prediction, multiple 𝑡1 can
be used in one day
Repetition of the VNP (e.g. per-hour analysis)
23. Handling Complexity - VNP
Cost of computing the neighborhood of the target
day, per value of 𝑎:
ED = $\Theta(NM)$
DTW = $\Theta(NM^2)$
Depending on N, this can be large in practice
Also, in the case of multiple 𝑡1 analysis, a large M can
increase the cost
24. Handling Complexity - Graph
Can be heavy
ED = $\Theta(N^2 M)$
DTW = $\Theta(N^2 M^2)$
Luckily, the graph representation is only updated once per day
However, it must be recomputed for each M in the case of multiple 𝑡1 analysis
Also, space partitioning can be used to reduce time
Via k-d tree
This can reduce the complexity of ED to $\Theta(\log(N) M)$
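To illustrate the space-partitioning idea, here is a small pure-Python k-d tree with an exact nearest-neighbor query under Euclidean distance. It is a sketch, not the implementation used in the talk; `build_kdtree` and `nearest` are hypothetical names. The logarithmic query cost is an average-case figure and tends to degrade when the number of dimensions M is large.

```python
def build_kdtree(items, depth=0):
    """Build a k-d tree from a list of (index, point) pairs.

    Each node is ((index, point), axis, left_subtree, right_subtree).
    """
    if not items:
        return None
    axis = depth % len(items[0][1])  # cycle through the M dimensions
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    return (
        items[mid],
        axis,
        build_kdtree(items[:mid], depth + 1),
        build_kdtree(items[mid + 1:], depth + 1),
    )


def nearest(node, query, best=None):
    """Return (squared_distance, index) of the stored point closest to query."""
    if node is None:
        return best
    (index, point), axis, left, right = node
    d2 = sum((p - q) ** 2 for p, q in zip(point, query))
    if best is None or d2 < best[0]:
        best = (d2, index)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < best[0]:
        best = nearest(far, query, best)
    return best
```

A tree built once per day with `build_kdtree(list(enumerate(profiles)))` can then answer closest-profile queries without scanning all N profiles.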
26. Prediction accuracy analysis
Prediction accuracy is measured through the relative error, defined as
$\varepsilon = \frac{\sum_{t=t_1}^{t_2} \left(x_n(t) - \hat{x}_n(t)\right)^2}{\sum_{t=t_1}^{t_2} x_n(t)^2}$
Comparison with closest neighbor (𝑎 = 1) and UPGMA
𝑡1 = 12
[Figure panels: cellular data - ED; cellular data - DTW]
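A short sketch of the error metric, assuming it is the ratio of the summed squared prediction error to the summed squared actual values over the prediction window [t1, t2]. `relative_error` is a hypothetical name.

```python
def relative_error(actual, predicted):
    """Relative error between the actual and predicted samples of one window."""
    if len(actual) != len(predicted):
        raise ValueError("both series must cover the same window")
    num = sum((x - p) ** 2 for x, p in zip(actual, predicted))
    den = sum(x ** 2 for x in actual)
    if den == 0:
        raise ValueError("actual profile is all zeros; relative error is undefined")
    return num / den
```

A perfect prediction yields 0, and predicting all zeros yields 1, which gives a simple baseline for reading the comparison.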
27. Effect of changing 𝑡1
Per-hour analysis
The length of the observation period may also affect the performance
of prediction
[Figure panels: cellular data - ED; cellular data - DTW]
28. Time consumption
The required time can be acceptable for both methods in a
per-hour analysis.
However, DTW requires caution when many profiles are used
[Figure panels: cellular data - ED; cellular data - DTW]
29. Distribution of α
The distribution of the optimal α is concentrated in the range [1, 2], allowing
us to easily limit the search range of α
The number of neighbors is heterogeneous, but most profiles have fewer than 20
31. Conclusion & Future work
We have proposed a methodology for online
prediction of mobile time-series datasets
Acceptable time for our current dataset
Can be used for other time-series datasets in various
IoT environments
Further studies include
Testing on a larger-scale dataset