12 Dec 2019
Time Series Clustering
Yolande, Elena, & Beth
AGENDA
01 Background
02 Distance Measures
03 Prototypes
04 Data Preprocessing
05 Clustering Algorithms
06 Cluster Evaluation
07 Remarks & Conclusions
01 BACKGROUND
What is a time series?
What is time series clustering, and what are its applications?
Background
• High dimensionality
• Irregular lengths
• Noise and time shifts
[Figure: a variable plotted against time (s)]
A time series is a collection of observations made sequentially in time.
Time Series Inputs
[Figure: time series inputs; x-axis: time (s)]
Time Series Clustering Applications
• Customer Segmentation
• Chicken Segmentation
[Diagram: Time Series Data x → Clustering Algorithm (Distance Measure d(x, c), Prototype p(x)) → N clusters]
02 DISTANCE MEASURES
• Euclidean
• Manhattan
• Correlation-Based
• Dynamic Time Warping (DTW)
• Others: Canberra, Binary, Minkowski, Maximum Norm, Hamming Distance
1. Euclidean distance
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
n = number of dimensions
x_i and y_i = the i-th attribute (component) of x and y
2. Manhattan distance
$$d_{man}(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
Also known as Manhattan length, rectilinear distance, Minkowski's L1 distance, L1 norm, taxicab metric, snake distance, city block distance.
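The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, assuming both series are plain lists of numbers with equal length; the function names `euclidean` and `manhattan` are chosen here for illustration and do not come from the slides.

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def manhattan(x, y):
    """Manhattan (L1) distance between two equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

a = [0.0, 3.0, 0.0]
b = [4.0, 0.0, 0.0]
print(euclidean(a, b))  # 5.0
print(manhattan(a, b))  # 7.0
```

Both functions reject series of different lengths, which matches the limitation the deck attributes to these measures.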
3. Correlation-Based Distances
• Pearson
• Spearman
• Kendall
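As a rough sketch of a correlation-based distance, the snippet below turns the Pearson coefficient r into a distance using 1 - r. That conversion is one common convention and is an assumption here, since the slides do not give a formula; the names `pearson_r` and `correlation_distance` are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    spread_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    # A constant series has zero spread, so r is undefined for it.
    return cov / (spread_x * spread_y)

def correlation_distance(x, y):
    """Assumed convention: distance = 1 - r, so r = 1 gives distance 0."""
    return 1.0 - pearson_r(x, y)

low = [1.0, 2.0, 3.0, 4.0]
high = [10.0, 20.0, 30.0, 40.0]
# Same profile at a different magnitude: the distance is close to 0,
# even though the Euclidean distance between the two series is large.
print(correlation_distance(low, high))
```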
4. Dynamic Time Warping (DTW)
1. Create the distance matrix between Time Series 1 and Time Series 2
2. Create the alignment
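The two DTW steps can be sketched as a small dynamic program. This is a minimal illustration, not an optimized implementation: it assumes the absolute difference as the local cost, applies no warping-window constraint, and the function name `dtw` is chosen here for illustration.

```python
def dtw(s, t):
    """Return (DTW distance, alignment path) for two numeric series."""
    n, m = len(s), len(t)
    inf = float("inf")
    # Step 1: accumulated cost matrix; cost[i][j] is the cheapest way
    # to align the first i points of s with the first j points of t.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in s only
                                     cost[i][j - 1],      # step in t only
                                     cost[i - 1][j - 1])  # step in both
    # Step 2: backtrack from the last cell to recover the alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(moves)
    path.reverse()
    return cost[n][m], path

distance, path = dtw([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
print(distance)  # 0.0
print(path)
```

The two series have different lengths, yet the distance is 0 because the repeated value in the second series is matched to a single point in the first. The nested loops also show why the cost grows with the product of the two lengths.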
Difference between DTW and Euclidean
Which Distance Measure to Use
• Type of the data
• Research questions

Criteria                                  Euclidean   DTW
Supports time series length differences   No          Yes
Supports time series time shifts          No          Yes
Computational costs                       Low         High
03 PROTOTYPES
• Mean - Median
• Partition Around Medoids (PAM) Methods
• DTW Barycenter Averaging
• Shape Extraction
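1. Mean
$$\mu_i^v = \frac{1}{N} \sum_{c \in C} x_{c,i}^v$$
2. Median
$$m_i^v = \operatorname{median}_{c \in C} \; x_{c,i}^v$$
For n sorted values, the median is the value at position (n+1)/2 when n is odd, and the average of the values at positions n/2 and n/2 + 1 when n is even.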
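A minimal sketch of the mean and median prototypes, assuming every series in the cluster has the same length and that the prototype is computed point by point across the series. The function names are illustrative.

```python
from statistics import mean, median

def mean_prototype(cluster):
    """Point-by-point mean across all series in the cluster."""
    return [mean(values) for values in zip(*cluster)]

def median_prototype(cluster):
    """Point-by-point median across all series in the cluster."""
    return [median(values) for values in zip(*cluster)]

cluster = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [9.0, 6.0, 3.0],
]
print(mean_prototype(cluster))    # [4.0, 4.0, 4.0]
print(median_prototype(cluster))  # [2.0, 4.0, 3.0]
```

The first time point shows the practical difference: the outlying value 9.0 pulls the mean to 4.0, while the median stays at 2.0.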
3. Partition Around Medoids (PAM)
$$f(x, c) = \sum_{i=1}^{N} \mathrm{dist}(x_i - c_i)$$
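One common reading of the PAM prototype is that the medoid is the member of the cluster with the smallest summed distance to all other members. The sketch below follows that reading; the Manhattan distance and the function names are assumptions made for illustration, and any other distance function could be passed in.

```python
def manhattan(x, y):
    """L1 distance, used here only as an example distance measure."""
    return sum(abs(a - b) for a, b in zip(x, y))

def medoid(cluster, dist):
    """Return the cluster member with the smallest total distance to the rest."""
    return min(cluster, key=lambda c: sum(dist(x, c) for x in cluster))

cluster = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
print(medoid(cluster, manhattan))  # [1.0, 1.0]
```

Unlike the mean, the medoid is always one of the observed series, so it works with any distance measure, including ones for which averaging is not well defined.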
4. Dynamic Time Warping Barycenter Averaging (DBA)
[Figure: series l1 to l5 and a prototype P, with DTW distances D1 to D5 between P and each series]
$$DBA = \arg\min_{P} \sum_{i=1}^{N} DTW(P, l_i)$$
5. Shape Extraction
$$SE(X, v_i) = \sum_{x_k \in X_i} \Delta(x_k - v_i)^2$$
04 DATA PREPROCESSING
• Dimensionality Reduction
• Normalization
Aggregating
[Figure: variable vs. time (s), before and after aggregating]
Downsampling
[Figure: variable vs. time (s), before and after downsampling]
Normalization
[Figure: input series and normalized output with μ = 0, σ = 1]
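The three preprocessing steps can be sketched as follows. This is an illustration under stated assumptions: aggregation is taken to mean averaging over fixed windows, downsampling to mean keeping every `step`-th point, and normalization to mean z-normalization to zero mean and unit standard deviation. The function names and parameters are hypothetical.

```python
from statistics import mean, pstdev

def aggregate(series, window):
    """Replace each block of `window` points by its mean."""
    return [mean(series[i:i + window]) for i in range(0, len(series), window)]

def downsample(series, step):
    """Keep every `step`-th point and drop the rest."""
    return series[::step]

def z_normalize(series):
    """Rescale a series to mean 0 and standard deviation 1."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        # A constant series has no spread; map it to all zeros.
        return [0.0 for _ in series]
    return [(value - mu) / sigma for value in series]

series = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(aggregate(series, 2))   # [3.0, 4.0, 5.0, 8.0]
print(downsample(series, 2))  # [2.0, 4.0, 5.0, 7.0]
print(z_normalize(series))
```

Aggregation and downsampling both halve the length here, but aggregation smooths the series while downsampling simply discards points.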
05 CLUSTERING ALGORITHMS
• Partitional Clustering
• Hierarchical Clustering
Clustering
Type           Clustering algorithm    Distance measure        Prototype
Partitional    K-means / K-medoids     Euclidean / Manhattan   Mean / PAM
Partitional    TADPole                 DTW                     DBA
Partitional    K-Shape                 SBD                     Shape Extraction
Hierarchical   Agglomerative           All                     All
Partitional Clustering
x = points, c = centroids, d = distance measure, p = prototype
1. Random centroid initialization: choose k centroids.
2. Assign clusters: compute d(x, c) for every point and centroid, and assign each point to its nearest centroid, min_c d(x, c). Example: x_1 joins c_1 when d_11 < d_12.
3. Adjust centroids: recompute each centroid from its assigned points, c_i = p(x).
4. Repeat steps 2 and 3 for n iterations.
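The partitional loop described on these slides can be sketched as below. It is a simplified illustration, not a library implementation: the distance `d` and prototype `p` are passed in as functions, the defaults (Euclidean distance, mean prototype) correspond to K-means, and the names, the fixed iteration count, and the `seed` parameter are assumptions.

```python
import math
import random

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_prototype(members):
    return [sum(values) / len(values) for values in zip(*members)]

def partitional_clustering(series, k, d=euclidean, p=mean_prototype,
                           n_iterations=10, seed=0):
    rng = random.Random(seed)
    # Random centroid initialization: pick k of the series as centroids.
    centroids = [list(s) for s in rng.sample(series, k)]
    labels = [0] * len(series)
    for _ in range(n_iterations):
        # Assign clusters: each series goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: d(x, centroids[c]))
                  for x in series]
        # Adjust centroids: recompute the prototype of each cluster.
        for c in range(k):
            members = [x for x, label in zip(series, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = p(members)
    return labels, centroids

series = [
    [0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0],
    [10.0, 10.0, 10.0], [10.0, 11.0, 10.0], [11.0, 10.0, 11.0],
]
labels, centroids = partitional_clustering(series, k=2)
# The first three series share one label and the last three share the other.
print(labels)
```

Swapping `d` and `p` for another compatible pair (for example a DTW distance with a DBA prototype) changes the algorithm without changing the loop, which is the point of the distance-measure and prototype table in this deck.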
Hierarchical Clustering
• Agglomerative: bottom-up
• Divisive: top-down
Dendrogram
[Figure: dendrogram with leaves Patty, Selma, Marge, Lisa, Edna; labels 1, 2, 3]
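A bottom-up (agglomerative) pass can be sketched as repeatedly merging the two closest clusters. The slides do not name a linkage rule, so single linkage (closest pair of members) is an assumption here, as are the function name and the toy one-dimensional data.

```python
def agglomerative(items, dist, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(items))]  # every item starts alone
    merges = []                                  # history for a dendrogram
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                link = min(dist(items[i], items[j])
                           for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        link, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), link))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

values = [1.0, 1.2, 5.0, 5.1, 9.0]
clusters, merges = agglomerative(values, lambda x, y: abs(x - y), n_clusters=3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

The `merges` list records which clusters were joined and at what distance, which is the information a dendrogram draws. Cutting the merging process at a distance threshold, instead of a fixed `n_clusters`, gives the threshold-based variant.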
Partitional Clustering vs Hierarchical Clustering
                     Partitional Clustering   Hierarchical Clustering
Visualization        (figure)                 (figure)
Cluster number       predefined               predefined threshold
Computational cost   low                      high
[Figure label: k = 3]
06 CLUSTER EVALUATION
Internal metric
• Silhouette
• Davies-Bouldin (DB)
External metric
• Soft Rand
• Jaccard Coefficient
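Silhouette Index
$$SI = \frac{1}{N} \sum_{i=1}^{k} \sum_{x \in C_i} \frac{b(x, C_i) - a(x, C_i)}{\max(a(x, C_i),\, b(x, C_i))}$$
Intra-cluster distance:
$$a(x, C_i) = \frac{1}{N_{C_i}} \sum_{y \in C_i} \mathrm{dist}(x, y)$$
Inter-cluster distance:
$$b(x, C_i) = \min_{j \neq i} \frac{1}{N_{C_j}} \sum_{y \in C_j} \mathrm{dist}(x, y)$$
N = total number of time series; N_{C_i} = number of time series in cluster C_i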
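A minimal sketch of the silhouette index. It follows the slide's definition of the intra-cluster distance, which averages over all members of the cluster including x itself (many library implementations exclude x), and it assumes at least two clusters. The function name and toy data are illustrative.

```python
def silhouette_index(series, labels, dist):
    """Average silhouette over all series; requires at least two clusters."""
    clusters = {}
    for x, label in zip(series, labels):
        clusters.setdefault(label, []).append(x)
    total = 0.0
    for x, label in zip(series, labels):
        own = clusters[label]
        # a: mean distance to the members of x's own cluster.
        a = sum(dist(x, y) for y in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(x, y) for y in members) / len(members)
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(series)

points = [0.0, 1.0, 10.0, 11.0]
absolute = lambda x, y: abs(x - y)
print(silhouette_index(points, [0, 0, 1, 1], absolute))  # close to 1: compact, well separated
print(silhouette_index(points, [0, 1, 0, 1], absolute))  # close to 0: clusters overlap
```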
Davies-Bouldin (DB)
$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{\alpha_i + \alpha_j}{\mathrm{dist}(c_i, c_j)}$$
α_i, α_j = intra-cluster distance
c_i, c_j = centroid
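A minimal sketch of the Davies-Bouldin index. It uses the usual definition, which takes for each cluster the worst-case (maximum) ratio over the other clusters, and it assumes that the intra-cluster distance α is the mean distance of the members to their centroid. The function names and the scalar toy data are illustrative.

```python
def davies_bouldin(clusters, dist, prototype):
    """Lower values indicate compact, well-separated clusters."""
    centroids = [prototype(members) for members in clusters]
    # alpha_i: mean distance of the members of cluster i to its centroid.
    alphas = [sum(dist(x, centroid) for x in members) / len(members)
              for members, centroid in zip(clusters, centroids)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((alphas[i] + alphas[j]) / dist(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k

scalar_mean = lambda members: sum(members) / len(members)
absolute = lambda x, y: abs(x - y)
clusters = [[0.0, 2.0], [10.0, 12.0]]
print(davies_bouldin(clusters, absolute, scalar_mean))  # 0.2
```

In the example both clusters have α = 1 and their centroids are 10 apart, so each ratio is 2 / 10.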
Soft Rand
$$RI = \frac{a + b}{a + b + c + d}$$
Clustering vs. Ground Truth
a   b   c   d   RI
9   3   2   2   0.75
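The counts a, b, c, and d can be obtained by comparing every pair of series, using the definitions given in the editor's notes (same or different cluster versus same or different class). The sketch below assumes plain label lists; the function names and the small label example are illustrative.

```python
from itertools import combinations

def pair_counts(clusters, classes):
    """Count pairs by agreement between cluster labels and class labels."""
    a = b = c = d = 0
    for i, j in combinations(range(len(clusters)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = classes[i] == classes[j]
        if same_cluster and same_class:
            a += 1  # same cluster, same class
        elif not same_cluster and not same_class:
            b += 1  # different cluster, different class
        elif not same_cluster and same_class:
            c += 1  # different cluster, same class
        else:
            d += 1  # same cluster, different class
    return a, b, c, d

def rand_index(a, b, c, d):
    return (a + b) / (a + b + c + d)

print(rand_index(9, 3, 2, 2))                    # 0.75, the slide's example
print(pair_counts([0, 0, 1, 1], [0, 0, 0, 1]))   # (1, 2, 2, 1)
```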
Jaccard Coefficient
$$J = \frac{a}{a + b + c}$$
Clustering vs. Ground Truth
a   b   c   J
9   3   2   0.643
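As a quick check, substituting the table's counts into the formula as written on the slide reproduces the reported value:

```latex
J = \frac{a}{a + b + c} = \frac{9}{9 + 3 + 2} = \frac{9}{14} \approx 0.643
```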
07 REMARKS
• Elbow method
• Silhouette index
• Gap statistic
• Discrete Fourier Transform (DFT)
07 CONCLUSIONS
What is the takeaway?
• DTW
• Right combination of distance measure & prototype
Conclusions
Type           Clustering algorithm    Distance measure        Prototype
Partitional    K-means / K-medoids     Euclidean / Manhattan   Mean / PAM
Partitional    TADPole                 DTW                     DBA
Partitional    K-Shape                 SBD                     Shape Extraction
Hierarchical   Agglomerative           All                     All

Editor's Notes

  1. Provide quantification for the dissimilarity between two time series. Classifying objects into clusters requires a method for measuring the distance, or (dis)similarity, between the objects. The term proximity refers to either similarity or dissimilarity. Frequently, the term distance is used as a synonym for dissimilarity.
  2. Recent years have seen a surge of interest in time series clustering. Data characteristics are evolving, and traditional clustering algorithms are becoming less popular for time series clustering. The most commonly used distance measures are only defined for series of equal length and are sensitive to noise, scale, and time shifts. Thus, many other distance measures tailored to time series have been developed to overcome these limitations. Other challenges are associated with the structure of time series, such as multiple variables and serial correlation.
  3. The goal is to put them all together in clusters.
  4. Input in customer segmentation. Mention chicken segmentation. Behavior based on purchases, bank transactions, energy and other utilities usage/consumption, and social networks (who is connected to whom).
  5. Hierarchy of classes → dendrogram
  6. Provide quantification for the dissimilarity between two time series. Classifying objects into clusters requires a method for measuring the distance, or (dis)similarity, between the objects. The term proximity refers to either similarity or dissimilarity. Frequently, the term distance is used as a synonym for dissimilarity.
  7. https://en.wikipedia.org/wiki/Taxicab_geometry The distance between two points measured along axes at right angles. Also known as Manhattan length, rectilinear distance, Minkowski's L1 distance, L1 norm, taxicab metric, snake distance, city block distance.
  8. Correlation measures are only useful when the relationship between attributes is linear. If the correlation is 0, there is no linear relationship between the two data objects. http://cs.tsu.edu/ghemri/CS497/ClassNotes/ML/Similarity%20Measures.pdf Be ready to explain Pearson and Spearman.
  9. Used when time series have different lengths. One of the most used measures of similarity between two time series. Originally designed for automatic speech recognition. Finds the optimal global alignment between two time series, exploiting temporal distortions between them. Designed especially for time series analysis. Ignores shifts in the time dimension. Ignores differences in speed between the two time series. How is it calculated?
  10. Used when time series have different lengths. One of the most used measures of similarity between two time series. Originally designed for automatic speech recognition. Finds the optimal global alignment between two time series, exploiting temporal distortions between them. Designed especially for time series analysis. Ignores shifts in the time dimension. Ignores differences in speed between the two time series. How is it calculated?
  11. https://www.datanovia.com/en/lessons/clustering-distance-measures/ Correlation-based distance considers two objects to be similar if their features are highly correlated, even though the observed values may be far apart in terms of Euclidean distance. For most clustering packages, Euclidean is the default. If we want to identify clusters of observations with the same overall profiles regardless of their magnitudes, then we should use correlation-based distance. Note that Pearson's correlation is quite sensitive to outliers. Correlation-based distance is commonly used in gene expression data analysis and in marketing, where we want to identify groups of shoppers with the same preferences in terms of items, regardless of the volume of items they bought.
  12. Hierarchy of classes → dendrogram
  13. Gamma is the optimization function. A is the alignment function.
  14. Hierarchy of classes → dendrogram
  15. Hierarchy of classes → dendrogram
  16. Clusters are defined beforehand.
  17. Compute the distance between each point and the centroids and keep the minimum. Predict: for each data point, calculate the distance from both centroids and assign the data point to the cluster with the minimum distance. Move each centroid to the mean of its assigned points so that it sits at the center of the cluster.
  18. Compute the distance between each point and the centroids and keep the minimum. Predict: for each data point, calculate the distance from both centroids and assign the data point to the cluster with the minimum distance. Move each centroid to the mean of its assigned points so that it sits at the center of the cluster.
  19. Compute the distance between each point and the centroids and keep the minimum. Predict: for each data point, calculate the distance from both centroids and assign the data point to the cluster with the minimum distance. Move each centroid to the mean of its assigned points so that it sits at the center of the cluster.
  20. Hierarchy of classes → dendrogram
  21. Each character starts as its own cluster. Input = genetic code. Selma + Patty → twins. Lisa + Marge → mother and daughter (less similarity because Lisa shares genetic code with Homer Simpson). Selma + Patty are sisters of Marge. Number of clusters and order of clustering.
  22. A: number of pairs of time series assigned to the same cluster that belong to the same class. B: number of pairs assigned to different clusters that belong to different classes. C: number of pairs assigned to different clusters that belong to the same class. D: number of pairs assigned to the same cluster that belong to different classes.