
1

12 Dec 2019
Time Series Clustering
Yolande, Elena, & Beth

2

AGENDA
01 Background
02 Distance Measures
03 Prototypes
04 Data Preprocessing
05 Clustering Algorithms
06 Cluster Evaluation
07 Remarks & Conclusions

3

BACKGROUND
What is a time series?
What is time series clustering, and what are its applications?
01

4

Background
A time series is a collection of observations made sequentially in time.
Challenges:
• High dimensionality
• Irregular lengths
• Noise and time shifts
[Figure: example time series, variable vs. time (s)]

5

Time Series Inputs
[Figure: example input series, variable vs. time (s)]

6

Time Series Clustering Applications
• Customer Segmentation
• Chicken Segmentation

7

[Diagram: Time Series Data (x) → Distance Measure d(x, c) → Prototype p(x) → Clustering Algorithm → N clusters]

8

DISTANCE MEASURES
Euclidean
Manhattan
Correlation-Based
Dynamic Time Warping (DTW)
Others: Canberra, Binary, Minkowski, Maximum Norm, Hamming Distance
02

9

1. Euclidean distance
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
n = number of dimensions
$x_i$ and $y_i$ = the i-th attribute (component) of x and y
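As a quick illustration, a minimal NumPy sketch of this formula (the function name and toy values are our own):

```python
import numpy as np

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length series x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

# Example: two short series of equal length
print(euclidean_distance([1, 2, 3], [2, 4, 6]))  # sqrt(1 + 4 + 9) ≈ 3.742
```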

10

2. Manhattan distance
$$d_{man}(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
Also known as Manhattan length, rectilinear distance,
Minkowski's L1 distance, L1 norm, taxi cab metric, snake
distance, city block distance
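A matching NumPy sketch (again with illustrative values):

```python
import numpy as np

def manhattan_distance(x, y):
    """L1 (city block) distance between two equal-length series."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y))

print(manhattan_distance([1, 2, 3], [2, 4, 6]))  # |1| + |2| + |3| = 6.0
```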

11

3. Correlation-Based Distances
 Pearson
 Spearman
 Kendall

12

4. Dynamic Time Warping (DTW)
[Figure: distance matrix between Time Series 1 (rows) and Time Series 2 (columns)]
1. Create the matrix
2. Create the alignment

13

4. Dynamic Time Warping (DTW)
1. Create matrix 2. Create the alignment
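A from-scratch sketch of these two steps, using the standard dynamic-programming recurrence (the textbook formulation, not necessarily the exact implementation behind the slides):

```python
import numpy as np

def dtw(s, t):
    """Classic DTW: fill the accumulated-cost matrix, then backtrack the alignment."""
    n, m = len(s), len(t)
    # Step 1: accumulated-cost matrix with an "infinite" border
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Step 2: backtrack from (n, m) to recover the warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]

dist, path = dtw([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0])
print(dist)  # 0.0: the second series is a time-shifted copy of the first
```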

14

Difference between DTW and Euclidean

15

Which Distance Measure to Use
• Type of the data
• Research questions

Criteria                                  Euclidean   DTW
Supports series of different lengths      No          Yes
Supports time shifts                      No          Yes
Computational cost                        Low         High

16

[Diagram: Time Series Data (x) → Distance Measure d(x, c) → Prototype p(x) → Clustering Algorithm → N clusters]

17

PROTOTYPES
Mean / Median
Partition Around Medoids (PAM)
DTW Barycenter Averaging
Shape Extraction
03

18

1. Mean  2. Median
Mean prototype: $\mu_i^v = \frac{1}{N} \sum_{c \in C} x_{c,i}^v$
Median prototype (for even n, the average of the two middle order statistics, at positions (n-1)/2 and (n+1)/2): $m_i^v = \frac{x_{c,(n-1)/2}^v + x_{c,(n+1)/2}^v}{2}$
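A minimal sketch of both prototypes, assuming all cluster members have equal length so the mean and median can be taken per time step:

```python
import numpy as np

# Toy cluster: three equal-length series (one per row)
cluster = np.array([[1.0, 2.0, 3.0],
                    [2.0, 3.0, 4.0],
                    [6.0, 7.0, 8.0]])

mean_prototype = cluster.mean(axis=0)          # element-wise mean across members
median_prototype = np.median(cluster, axis=0)  # element-wise median
print(mean_prototype)    # [3. 4. 5.]
print(median_prototype)  # [2. 3. 4.]
```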

19

3. Partition Around Medoids (PAM)
$$f(x, c) = \sum_{i=1}^{N} \mathrm{dist}(x_i, c_i)$$
where $c_i$ is the medoid assigned to $x_i$; unlike the mean, each medoid is an actual member of the data set.
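A minimal sketch of the medoid computation, with Euclidean as an illustrative distance; the medoid is the cluster member with the smallest summed distance to all other members:

```python
import numpy as np

def medoid(cluster):
    """Return the member of `cluster` minimizing the summed distance
    to all other members (here with Euclidean distance)."""
    X = np.asarray(cluster, dtype=float)
    # Pairwise distance matrix between all members
    dists = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return X[dists.sum(axis=1).argmin()]

print(medoid([[1, 2, 3], [2, 3, 4], [6, 7, 8]]))  # [2. 3. 4.]
```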

20

4. Dynamic Time Warping Barycenter Averaging (DBA)
[Figure: average series P iteratively refined against cluster members $l_1, \dots, l_5$ via their DTW alignments $D_1, \dots, D_5$]
$$\mathrm{DBA} = \arg\min_P \sum_{i=1}^{N} \mathrm{DTW}(P, l_i)$$
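A sketch using the tslearn library's dtw_barycenter_averaging (this assumes tslearn is installed; the slides do not name a specific implementation):

```python
import numpy as np
# Assumes the tslearn library is available (pip install tslearn)
from tslearn.barycenters import dtw_barycenter_averaging

# Three series of different lengths; DBA iteratively refines an average
# series P that minimizes the sum of DTW distances to all members.
series = [np.array([0.0, 1.0, 2.0, 1.0, 0.0]),
          np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0]),
          np.array([0.0, 1.0, 2.0, 2.0, 1.0, 0.0])]
P = dtw_barycenter_averaging(series, barycenter_size=5)
print(P.ravel())
```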

21

5. Shape Extraction
$$SE(X, v_i) = \sum_{x_k \in X_i} \Delta(x_k - v_i)^2$$
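For a sense of shape extraction in practice, a sketch via tslearn's KShape, which pairs the SBD distance with shape extraction as its prototype step (assuming tslearn is installed; the random data is illustrative):

```python
import numpy as np
# Assumes tslearn is available; k-Shape requires z-normalized input
from tslearn.clustering import KShape
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50, 1))                      # 20 series of length 50
X = TimeSeriesScalerMeanVariance().fit_transform(X)   # z-normalize each series
model = KShape(n_clusters=2, random_state=0).fit(X)
print(model.labels_)
```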

22

[Diagram: Normalized Time Series Data (x) → Distance Measure d(x, c) → Prototype p(x) → Clustering Algorithm → N clusters]

23

DATA PREPROCESSING
Dimensionality Reduction
Normalization
04

24

Aggregating
[Figure: series before and after aggregation, variable vs. time (s)]

25

Downsampling
[Figure: series before and after downsampling, variable vs. time (s)]
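A minimal NumPy sketch of both reductions shown above (the window size of 3 is an arbitrary choice):

```python
import numpy as np

x = np.arange(12, dtype=float)  # toy series of length 12

# Aggregating: replace each window of 3 samples by its mean
aggregated = x.reshape(-1, 3).mean(axis=1)  # length 4

# Downsampling: keep every 3rd sample
downsampled = x[::3]                         # length 4

print(aggregated)   # [1. 4. 7. 10.]
print(downsampled)  # [0. 3. 6. 9.]
```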

26

Normalization
z-normalization rescales each series to mean μ = 0 and standard deviation σ = 1.
[Figure: input series vs. normalized output]
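A minimal z-normalization sketch:

```python
import numpy as np

def z_normalize(x):
    """Rescale a series to mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

z = z_normalize([2.0, 4.0, 6.0, 8.0])
print(z.mean(), z.std())  # ~0.0 and 1.0
```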

27

[Diagram: Normalized Time Series Data (x) → Distance Measure d(x, c) → Prototype p(x) → Clustering Algorithm → N clusters]

28

CLUSTERING ALGORITHMS
Partitional Clustering
Hierarchical Clustering
05

29

Clustering

Clustering algorithm               Distance measure        Prototype
Partitional: K-means / K-medoids   Euclidean / Manhattan   Mean / PAM
Partitional: TADPole               DTW                     DBA
Partitional: K-shape               SBD                     Shape Extraction
Hierarchical: Agglomerative        All                     All

30

Partitional Clustering
1. Random centroid initialization: pick k centroids at random.
2. Assign clusters: each point goes to its nearest centroid, min over (x, c) of d(x, c); e.g. d11 < d12 assigns x1 to c1.
(x = points, c = centroids, d = distance measure)

31

Partitional Clustering
2. (continued) Assign clusters: repeat the nearest-centroid assignment for every point (here x1, then x2), keeping min over (x, c) of d(x, c).
(x = points, c = centroids, d = distance measure)

32

Partitional Clustering
3. Adjust centroids: each centroid becomes the prototype of its assigned points, c_i = p(x).
Repeat the assign/adjust steps for n iterations.
(x = points, c = centroids, d = distance measure, p = prototype)
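Putting the three steps together, a from-scratch sketch of the loop (Euclidean distance and the mean prototype are illustrative choices; names and data are our own):

```python
import numpy as np

def k_means(X, k, n_iterations=10, seed=0):
    """Partitional clustering sketch: Euclidean distance + mean prototype."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iterations):
        # Assign: each series goes to the centroid minimizing d(x, c)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Adjust: each centroid becomes the prototype (mean) of its members
        for i in range(k):
            if np.any(labels == i):
                centroids[i] = X[labels == i].mean(axis=0)
    return labels, centroids

X = np.array([[0., 0., 1.], [0., 1., 1.], [5., 5., 6.], [5., 6., 6.]])
labels, centroids = k_means(X, k=2)
print(labels)  # two clusters: the two low series and the two high series
```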

33

Hierarchical Clustering
• Agglomerative: bottom-up
• Divisive: top-down

34

Dendrogram
Hierarchical Clustering

35

Hierarchical Clustering
[Dendrogram over the example characters Patty, Selma, Marge, Lisa, Edna, cut at levels 1, 2, 3]
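A sketch of agglomerative clustering with SciPy; the feature vectors for the slide's characters are invented for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Toy feature vectors for the five characters from the slide's example
names = ["Patty", "Selma", "Marge", "Lisa", "Edna"]
X = np.array([[1.0, 0.1], [1.0, 0.2], [0.8, 0.5], [0.6, 0.9], [0.1, 0.3]])

Z = linkage(X, method="average")                  # agglomerative (bottom-up) merges
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into 2 clusters
print(dict(zip(names, labels)))
# dendrogram(Z, labels=names) would draw the tree with matplotlib
```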

36

Partitional Clustering vs Hierarchical Clustering

                      Partitional Clustering   Hierarchical Clustering
Visualization         scatter plot (k = 3)     dendrogram
Cluster number        predefined               predefined threshold
Computational cost    low                      high

37

CLUSTER EVALUATION
Internal metrics
 Silhouette
 Davies-Bouldin (DB)
External metrics
 Soft Rand
 Jaccard Coefficient
06

38

Silhouette Index
$$SI = \frac{1}{N} \sum_{i=1}^{k} \sum_{x \in C_i} \frac{b(x, C_i) - a(x, C_i)}{\max\left(a(x, C_i),\, b(x, C_i)\right)}$$
Intra-cluster distance: $a(x, C_i) = \frac{1}{N_{C_i}} \sum_{y \in C_i} \mathrm{dist}(x, y)$
Inter-cluster distance: $b(x, C_i) = \min_{j \neq i} \frac{1}{N_{C_j}} \sum_{y \in C_j} \mathrm{dist}(x, y)$
N = total number of time series; $N_{C_i}$ = number of time series in cluster $C_i$
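A sketch using scikit-learn's silhouette_score, treating each series as a flat vector (assuming scikit-learn is installed; data and labels are toy values):

```python
import numpy as np
# Assumes scikit-learn is available
from sklearn.metrics import silhouette_score

X = np.array([[0., 0., 1.], [0., 1., 1.], [5., 5., 6.], [5., 6., 6.]])
labels = np.array([0, 0, 1, 1])
print(silhouette_score(X, labels))  # close to 1: tight, well-separated clusters
```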

39

Davies-Bouldin (DB)
$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{\alpha_i + \alpha_j}{\mathrm{dist}(c_i, c_j)}$$
$\alpha_i, \alpha_j$ = intra-cluster distances
$c_i, c_j$ = centroids
Lower DB indicates more compact, better-separated clusters.
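The same toy data scored with scikit-learn's davies_bouldin_score (again assuming scikit-learn is installed):

```python
import numpy as np
# Assumes scikit-learn is available
from sklearn.metrics import davies_bouldin_score

X = np.array([[0., 0., 1.], [0., 1., 1.], [5., 5., 6.], [5., 6., 6.]])
labels = np.array([0, 0, 1, 1])
print(davies_bouldin_score(X, labels))  # lower is better
```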

40

Soft Rand
$$RI = \frac{a + b}{a + b + c + d}$$
Example (Clustering vs Ground Truth):
a = 9, b = 3, c = 2, d = 2  →  RI = 0.75
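Reproducing the slide's example:

```python
# Rand index from the four pair counts on the slide
a, b, c, d = 9, 3, 2, 2
RI = (a + b) / (a + b + c + d)
print(RI)  # 0.75
```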

41

Jaccard Coefficient
$$J = \frac{a}{a + b + c}$$
Example (Clustering vs Ground Truth):
a = 9, b = 3, c = 2  →  J ≈ 0.643
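And the Jaccard example:

```python
# Jaccard coefficient from the slide's pair counts
a, b, c = 9, 3, 2
J = a / (a + b + c)
print(round(J, 3))  # 0.643
```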

42

REMARKS
Elbow method
Silhouette index
GAP Statistics
Discrete Fourier Transform (DFT)
07

43

CONCLUSIONS
What is the takeaway?
07

44

Conclusions
 DTW
 Right combination of distance measure & prototype

Clustering algorithm               Distance measure        Prototype
Partitional: K-means / K-medoids   Euclidean / Manhattan   Mean / PAM
Partitional: TADPole               DTW                     DBA
Partitional: K-shape               SBD                     Shape Extraction
Hierarchical: Agglomerative        All                     All

45

Time series clustering presentation


Editor's Notes

  1. Provide quantification for the dissimilarity between two time series. The classification of objects into clusters requires some method for measuring the distance or (dis)similarity between the objects. The term proximity is used to refer to either similarity or dissimilarity; frequently, the term distance is used as a synonym for dissimilarity.
  2. Recent years have seen a surge of interest in time series clustering. Data characteristics are evolving and traditional clustering algorithms are becoming less popular in time series clustering. The most commonly used distance measures are only defined for series of equal length and are sensitive to noise, scale and time shifts. Thus, many other distance measures tailored to time series have been developed to overcome these limitations, as well as other challenges associated with the structure of time series, such as multiple variables and serial correlation.
  3. The goal is to put them all together in clusters.
  4. Input in customer segmentation; mention the chicken segmentation example. Behavior based on purchases, bank transactions, energy and other utilities usage/consumption, social networks (who is connected to whom).
  5. Hierarchy of classes → dendrogram
  6. Provide quantification for the dissimilarity between two time series. The classification of objects into clusters requires some method for measuring the distance or (dis)similarity between the objects. The term proximity is used to refer to either similarity or dissimilarity; frequently, the term distance is used as a synonym for dissimilarity.
  7. https://en.wikipedia.org/wiki/Taxicab_geometry The distance between two points measured along axes at right angles. Also known as Manhattan length, rectilinear distance, Minkowski's L1 distance, L1 norm, taxi cab metric, snake distance, city block distance
  8. Correlation measures are only useful if/when the relationship between attributes is linear. So if the correlation is 0, then there is no linear relationship between the two data objects. http://cs.tsu.edu/ghemri/CS497/ClassNotes/ML/Similarity%20Measures.pdf Be ready to explain Pearson and Spearman.
  9. When time series have different lengths. One of the most widely used measures of similarity between two time series. Originally designed for automatic speech recognition. Finds the optimal global alignment between two time series, exploiting temporal distortions between them. Designed especially for time series analysis: ignores shifts in the time dimension and the differing speeds of the two series. How is it calculated?
  10. When time series have different lengths. One of the most widely used measures of similarity between two time series. Originally designed for automatic speech recognition. Finds the optimal global alignment between two time series, exploiting temporal distortions between them. Designed especially for time series analysis: ignores shifts in the time dimension and the differing speeds of the two series. How is it calculated?
  11. https://www.datanovia.com/en/lessons/clustering-distance-measures/ For example, correlation-based distance is often used in gene expression data analysis. Correlation-based distance considers two objects to be similar if their features are highly correlated, even though the observed values may be far apart in terms of Euclidean distance. For most clustering packages, Euclidean is the default. If we want to identify clusters of observations with the same overall profiles regardless of their magnitudes, then use correlation-based distance. If using correlation, note that Pearson's correlation is quite sensitive to outliers. Commonly used in gene expression data analysis; in marketing, useful if we want to identify groups of shoppers with the same preferences in terms of items, regardless of the volume of items they bought.
  12. Hierarchy of classes → dendrogram
  13. Gamma is the optimization function. A is the alignment function
  14. Hierarchy of classes → dendrogram
  15. Hierarchy of classes → dendrogram
  16. Clusters are defined beforehand.
  17. Compute the distance between each point and the centroids and keep the minimum. Predict: for each data point, calculate the distance from both centroids; the data point is assigned to the cluster with the minimum distance. Move centroids to the mean of their assigned points so that they sit in the center of the cluster.
  18. Compute the distance between each point and the centroids and keep the minimum. Predict: for each data point, calculate the distance from both centroids; the data point is assigned to the cluster with the minimum distance. Move centroids to the mean of their assigned points so that they sit in the center of the cluster.
  19. Compute the distance between each point and the centroids and keep the minimum. Predict: for each data point, calculate the distance from both centroids; the data point is assigned to the cluster with the minimum distance. Move centroids to the mean of their assigned points so that they sit in the center of the cluster.
  20. Hierarchy of classes → dendrogram
  21. Each character starts in its own cluster. Input = genetic code. Selma + Patty → twins. Lisa + Marge → mother and daughter (less similarity because they share genetic code with Homer Simpson). Selma + Patty → sisters of Marge. Number of clusters and order of clustering.
  22. a: number of time series assigned to the same cluster and belonging to the same class. b: number of time series assigned to different clusters and belonging to different classes. c: number of time series assigned to different clusters but belonging to the same class. d: number of time series assigned to the same cluster but belonging to different classes.