TECHNIQUES OF CLUSTERING (a short review for students) Mikhail Alexandrov 1,2 , Pavel Makagonov 3 1  Autonomous University of Barcelona, Spain 2  Social Network Research Center with UCE, Slovakia 3  Mixtec Technological University, Mexico dyner1950@mail.ru, mpp@mixteco.utm.mx Petersburg 2008
Introduction   Definitions  Clustering  Discussion Open Problems CONTENTS 
Prof. Dr. Benno Stein  Dr. Sven Meyer zu Eissen  Weimar University,  Germany Ideas,  Materials,  Collaboration - Structuring and Indexing - AI search  in IR  - Semi-Supervised Learning Dr. Xiaojin Zhu University of Wisconsin, Madison ,  USA
Subject of Grouping: TEXTUAL DATA and NON TEXTUAL DATA. Local Terminology: the source of the data, textual or non textual, is not important. Data: work in the space of numerical parameters. Texts: work in the space of words. Example: a typical dialog between a passenger (US) and railway directory inquiries (DI)
TEXTS  (‘indexed’) Presentation of Textual Data  Vector model (‘parameterized’) Local Terminology Indexed   texts  are only parameterized texts in the  space of words
TEXTS  (‘indexed’) Presentation of Textual Data  TEXTS (‘parameterized’)  Local Terminology  Indexed   texts  are only parameterized texts in the  space of themes   = >  Category /Context  Vector Models... Example :  manually parameterized dialogs in the  space of parameters  (transport service and passenger needs)
Introduction   Definitions   Clustering  Discussion Open Problems CONTENTS 
Types of Grouping. Unsupervised Learning: we know nothing about the data structure. Supervised Learning: we know the data structure well. Semi-Supervised Learning: we know something about the data structure.
Types of Grouping
Clustering. Characteristics: absence of patterns or descriptions of classes, so the results are defined by the nature of the data themselves (N > 1). Synonyms: classification without teacher, unsupervised learning. Number of clusters: [ ] is known exactly, [x] is known approximately, [ ] is not known => searching.
Classification. Characteristics: presence of patterns or descriptions of classes, so the results are defined by the user (N >= 1). Synonyms: classification with teacher, supervised learning. Special terms: Categorization (of documents), Diagnostics (engineering, medicine), Recognition (engineering, science).
Types of Grouping
"Semi Clustering/Classification". Characteristics: presence of a limited number of patterns, so the results are defined both by the user and by the data themselves (N > 1). Synonyms: semi-classification, semi-supervised learning. Number of clusters/categories: [ ] is known exactly, [x] is known approximately, [ ] is not known => searching.
Classification. Characteristics: presence of patterns or descriptions of classes, so the results are defined by the user (N >= 1). Synonyms: classification with teacher, supervised learning. Special terms: Categorization (of documents), Diagnostics (engineering, medicine), Recognition (engineering, science).
Objectives of Grouping 1. Organization  (structuring) of an object set  Process is named   data structuring   2. Searching interesting patterns  Process is named   navigation 3. Grouping for other applications: -  Knowledge   discovery (clustering) -  Summarization   of documents   Note:  Do not mix  the   type   of grouping  and its  objective
Classification of methods Based on belonging to  cluster/category Exclusive methods Every object belongs only to one cluster/category. Methods are named  hard  grouping methods Non-exclusive methods Every object can belong to several clusters/categories. Methods are named  soft  grouping methods.  Based on data presentation Methods oriented on  free metric space Every object is presented as a point in a free space Methods oriented on graphs Every object is presented as an element on graph
Fuzzy Grouping. Hard grouping: hard clustering, hard categorization. Soft grouping: soft clustering, soft categorization. Example: the distribution of letters from Muscovites to the Government is a soft categorization (the numbers in the table reflect the relative weight of each theme).
General Scheme of Clustering Process - I. Preprocessing <= Processing. Principal idea: to transform texts to numerical form in order to use mathematical tools. Remember: our problem is grouping textual documents, not understanding them. Here: both the rough and the refined matrices are Object/Attribute matrices.
General Scheme of Clustering Process - II Preprocessing  Processing  <= Here: matrix   Attribute/Attribute  can be used instead of matrix  Object/Object
Matrices to Be Considered
Clustering for Categorization Colour matrix “words-words” before clustering Matrix contains the value of word co-occurrences in texts.  Red :  if value more than some threshold. White :  if less.
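The "words-words" matrix described on this slide can be sketched in a few lines. This is only an illustration: the function names `cooccurrence` and `colour`, the whitespace tokenization, and counting one co-occurrence per text are assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(texts):
    """Count, for every pair of words, the number of texts containing both."""
    counts = Counter()
    for text in texts:
        words = sorted(set(text.lower().split()))  # naive tokenization (assumption)
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
    return counts

def colour(counts, threshold):
    """Mark a pair red if its value exceeds the threshold, white otherwise."""
    return {pair: "red" if value > threshold else "white"
            for pair, value in counts.items()}

texts = ["train ticket price", "train ticket", "bus price"]
counts = cooccurrence(texts)
cells = colour(counts, threshold=1)
```

Pairs are stored with the words in alphabetical order, so `("ticket", "train")` is the key for the two words that appear together in two texts.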
Clustering for Categorization. Colour matrix "words-words" after clustering. Words are grouped. Cluster => Subdictionary. Absence of blocks means absence of subthemes.
Importance of Preprocessing  ( it takes   60%-90% of efforts)
Introduction   Definitions  Clustering   Discussion Open Problems CONTENTS 
Definitions. Def. 1: "Let V be a set of objects. A clustering C = { Ci | Ci ⊆ V } of V is a division of V into subsets such that ∪i Ci = V and Ci ∩ Cj = ∅ for i ≠ j." Def. 2: "Let V be a set of nodes, E a set of arcs, and φ a weight function that reflects the distance between objects, so that G = { V, E, φ } is a weighted graph. In this case C is called a clustering of G." Within the second definition every Ci induces a subgraph G(Ci). Both the subsets Ci and the subgraphs G(Ci) are called clusters. Illustrated cases: Graph, Set, Clique.
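Def. 1 can be checked mechanically. The sketch below tests the two conditions (the clusters cover V and are pairwise disjoint). The function name `is_clustering` is an illustrative choice.

```python
from itertools import combinations

def is_clustering(V, C):
    """True if C is a clustering of V in the sense of Def. 1."""
    clusters = [set(c) for c in C]
    covers = set().union(*clusters) == set(V)           # union of all Ci equals V
    disjoint = all(a.isdisjoint(b)                      # Ci and Cj share no object
                   for a, b in combinations(clusters, 2))
    return covers and disjoint

V = {1, 2, 3, 4}
ok = is_clustering(V, [{1, 2}, {3, 4}])
overlapping = is_clustering(V, [{1, 2}, {2, 3, 4}])
incomplete = is_clustering(V, [{1, 2}, {3}])
```

Note that the check says nothing about whether the clusters are good or how many there should be, only that they form a valid division of V.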
Definitions. Principal note: both definitions SAY NOTHING about the quality of clusters or about the number of clusters. Reason for the difficulties: at present there is no general agreement on any universal definition of the term 'cluster'. What does it mean for a clustering to be good? 1. The closeness between objects inside clusters is essentially greater than the closeness between the clusters themselves. 2. The constructed clusters correspond to the intuitive expectations of users (they are natural clusters).
Classification of methods, based on the way of grouping. 1. Hierarchy based methods (any neighbors): N is not given. 2. Exemplar based methods (K-means): N is given. 3. Density based methods (MajorClust): N is calculated automatically.
Hierarchy based methods. Neighbors. General algorithm: initially every object is one cluster. A series of steps is performed. At every step the two closest clusters are merged. At the end we have one cluster.
Hierarchy  based  methods Nearest neighbor  method   (NN)
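A minimal sketch of the general hierarchical algorithm with the nearest neighbor (single-link) distance between clusters. It uses Euclidean distance and a brute-force search for the closest pair, both of which are simplifying assumptions. The `n_clusters` stopping parameter is also an addition, so that the merging can be stopped before everything collapses into one cluster.

```python
import math

def single_linkage(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain.

    Returns the clusters (lists of point indices) and the merge history.
    """
    clusters = [[i] for i in range(len(points))]  # every object is one cluster
    history = []

    def gap(a, b):
        # nearest neighbor distance between clusters a and b
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # find the closest pair of clusters
        p, q = min(
            ((p, q) for p in range(len(clusters))
                    for q in range(p + 1, len(clusters))),
            key=lambda pq: gap(clusters[pq[0]], clusters[pq[1]]),
        )
        history.append((clusters[p], clusters[q], gap(clusters[p], clusters[q])))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return clusters, history

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
two, _ = single_linkage(points, n_clusters=2)
one, merges = single_linkage(points)
```

The merge history plays the role of the dendrogram: each entry records which two clusters were joined and at what distance.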
Exemplar based methods K - means,  centroid
Exemplar based methods. Method K-means. General algorithm: initially K centers are selected at random. A series of steps is performed. At every step the objects are distributed among the centers according to the criterion of the nearest center. Then all centers are recalculated. The process ends when the centers no longer change.
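The K-means steps above can be sketched as follows. Euclidean distance, the `seed` and `max_iter` parameters, and keeping the old center when a group becomes empty are assumptions added to make the example deterministic and safe to run.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    """Plain K-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)             # K centers selected at random
    for _ in range(max_iter):
        # distribute objects by the criterion of the nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # recalculate every center as the mean of its group
        new_centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
        if new_centers == centers:              # centers did not change: stop
            break
        centers = new_centers
    return centers, groups

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, groups = k_means(points, k=2)
```

On well separated data such as this the result does not depend on the random start. In general it does, which is one reason to rerun the method with different seeds.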
Exemplar based methods. Method X-means (Dan Pelleg, Andrew Moore). Approach: using evaluation of the object distribution; selection of the most likely points. Advantages: it is faster; the number of clusters is not fixed (in all cases it tends to be smaller).
Density based methods. MajorClust method. Principal idea: the total closeness of an object to the objects of its own cluster exceeds its closeness to any other cluster. Suboptimal solution: only part of the neighbors are considered at every step (to save time, to avoid merging).
Density based methods. MajorClust method. General algorithm: initially every object is one cluster and joins its nearest neighbor. Every object evaluates its total closeness to its own cluster and, separately, to every other cluster. After this evaluation the object changes its membership and moves to the closest cluster. The search ends when the clusters no longer change. Preprocessing for MajorClust: many weak links together can outweigh a few strong ones, which distorts the results. So weak links should be eliminated before clustering.
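A simplified sketch of the MajorClust idea on a symmetric similarity matrix. It is not the reference implementation: ties are broken deterministically (keep the current cluster, otherwise take the smallest label), objects are visited in index order, and weak links are dropped with a `threshold` parameter. All of these are assumptions made for the illustration.

```python
def major_clust(sim, threshold=0.0, max_iter=100):
    """Return one cluster label per object for similarity matrix `sim`."""
    n = len(sim)
    labels = list(range(n))                  # every object is one cluster
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            # total closeness of object i to each cluster, weak links ignored
            attraction = {}
            for j in range(n):
                if j != i and sim[i][j] > threshold:
                    attraction[labels[j]] = attraction.get(labels[j], 0.0) + sim[i][j]
            if not attraction:
                continue
            best = max(attraction.values())
            # move only if some other cluster is strictly more attractive
            if attraction.get(labels[i], 0.0) < best:
                labels[i] = min(l for l, a in attraction.items() if a == best)
                changed = True
        if not changed:                      # clusters do not change: stop
            break
    return labels

sim = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.8, 0.0],
]
labels = major_clust(sim, threshold=0.2)
```

The number of clusters is not passed in. It follows from the structure of the similarity matrix, which is the property the slides attribute to density based methods.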
Cluster Validity. Definition: it reflects cluster separability and formally depends on the scatter inside clusters and the separation between clusters. Indexes: these are formal characteristics of the structure. Dunn index; Davies-Bouldin index; Hypervolume criterion (Andre Hardy); Density expected measure DEM (Benno Stein). Dunn index (to be max)
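As an illustration of a validity index, here is a sketch of the Dunn index in its classical form: the smallest distance between points of different clusters divided by the largest cluster diameter. The slide's own formula is not preserved in this transcript, so the exact variant used by the authors is an assumption.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of coordinate tuples.

    Needs at least two clusters and at least one cluster with two distinct points.
    """
    # separation: smallest distance between points of different clusters
    separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a for q in b
    )
    # scatter: largest distance between two points of the same cluster
    diameter = max(
        math.dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return separation / diameter

good = dunn_index([[(0, 0), (0, 1)], [(5, 0), (5, 1)]])
bad = dunn_index([[(0, 0), (5, 0)], [(0, 1), (5, 1)]])
```

A larger value indicates compact, well separated clusters. Because both terms are extreme values, a single outlier can change the index sharply.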
Cluster Validity. Number of clusters. Geometrical approach, two variants: optimum (min, max) of the curve; jump of the curve. The Dunn index (to be max) is too sensitive to extreme cases.
Cluster Usability. Definition: it reflects the user's opinion and formally expresses the difference between the classes selected manually by a user and the clusters constructed by a given method. Cluster F-measure (Benno Stein). Diagram labels: Data, Expert, Method. Here: i, j are the indexes of classes and clusters; C*i, Cj are classes and clusters; prec(i,j), rec(i,j) are precision and recall.
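The formula itself is not preserved in this transcript. The sketch below assumes the commonly used form of the cluster F-measure: for every class C*i take the best harmonic mean of prec(i,j) and rec(i,j) over all clusters Cj, and average these values weighted by class size.

```python
def cluster_f_measure(classes, clusters):
    """F-measure of `clusters` against the expert's `classes` (collections of ids)."""
    classes = [set(c) for c in classes]
    clusters = [set(c) for c in clusters]
    total = sum(len(c) for c in classes)
    score = 0.0
    for ci in classes:
        best = 0.0
        for cj in clusters:
            common = len(ci & cj)
            if common:
                prec = common / len(cj)      # share of the cluster that is in the class
                rec = common / len(ci)       # share of the class that is in the cluster
                best = max(best, 2 * prec * rec / (prec + rec))
        score += len(ci) / total * best      # weight by class size
    return score

classes = [{1, 2}, {3, 4}]
perfect = cluster_f_measure(classes, [{1, 2}, {3, 4}])
merged = cluster_f_measure(classes, [{1, 2, 3, 4}])
```

A value of 1 means the method reproduced the expert's classes exactly. Putting everything into one cluster keeps recall at 1 but halves precision here, so the score drops to 2/3.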
Validity and Usability. Conclusion: the density expected measure corresponds to the F-measure, which reflects the expert's opinion. So DEM can be an indicator of expert opinion.
Technologies of Clustering. Meta methods: they construct separated data sets using optimization criteria and limitations: neither too many nor too few clusters, neither too large nor too small clusters, etc. Visual methods: they present visual images to a user so that the clusters can be selected manually. Using different methods; comparing results.
Meta Methods. Algorithm (example). Notation: N is the number of objects in a given cluster; D is the diagonal of a given cluster. Initially N0 and the centers Ci are given. Steps: 1. The K-medoid method (or any other one) is performed. 2. If N > Nmax or D > Dmax (in any cluster), then this cluster is divided into 2 parts. Go to step 1. 3. If N < Nmin or D < Dmin (in any cluster), then this cluster and the closest cluster are joined. Go to step 1. 4. When the number of iterations I > Imax, stop. Otherwise go to step 1.
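The decision rule in steps 2 and 3 can be sketched as a small function. Only the size and diagonal tests are shown. The clustering, splitting and joining themselves are omitted, and the function name and return values are illustrative assumptions.

```python
def check_cluster(n, d, n_min, n_max, d_min, d_max):
    """Decide what the meta method should do with one cluster.

    n is the number of objects in the cluster, d is its diagonal.
    """
    if n > n_max or d > d_max:
        return "split"    # step 2: divide the cluster into 2 parts
    if n < n_min or d < d_min:
        return "join"     # step 3: join with the closest cluster
    return "keep"

limits = dict(n_min=3, n_max=50, d_min=0.5, d_max=10.0)
too_big = check_cluster(n=80, d=4.0, **limits)
too_small = check_cluster(n=2, d=4.0, **limits)
fine = check_cluster(n=20, d=4.0, **limits)
```

In a full loop this check would run for every cluster after each clustering pass, with an iteration counter enforcing the I > Imax stop condition.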
Visual Clustering Clustering on dendrite  Clustering in space of factors
Authorship. Problem: authorship of Molière's dramatic works (comedies, dramas, ...): Corneille and/or Molière? Approach: style based indexing (NooJ can be used); clustering of all dramatic works; well-known dramatic works should be marked. Style: formal style estimations; informal style estimations. Formal style indicators: text complexity; text harmonicity. References: Labbé C., Labbé D. Inter-textual distance and authorship attribution: Corneille and Molière. Journal of Quantitative Linguistics, 2001, Vol. 8, No. 3, pp. 213-231.
Clustering. Authorship. Results: 1) 18 comedies of Molière should be attributed to Corneille. 2) 15 comedies of Molière are weakly connected with all his other works, so they may have been written by two authors. 3) 2 comedies of Corneille are now considered to be works of Molière. etc. Note: for a certain time Molière and Corneille were friends.
Special and universal packages with clustering algorithms: 1. ClustAn (Scotland), www.clustan.com, Clustan Graphics-7 (2006). 2. MatLab: descriptions are available on the Internet. 3. Statistica: descriptions are available on the Internet. Learning. Journals and congresses about clustering: 1. "Journal of Classification", Springer. 2. IFCS - International Federation of Classification Societies, conferences. 3. CSNA - Classification Society of North America, seminars, workshops.
Introduction   Definitions  Clustering  Discussion Open Problems CONTENTS 
Certain Observations. The number of methods for grouping data is a little larger than the number of researchers working in this area. The problem is not to find the best method for all cases. The problem is to find the method that is relevant for your data. Only you know which methods are the best for your own data. The principal problems lie in choosing indexes (parameters) and a measure of closeness that are adequate to the given problem and the given data. Frequently the results are bad because of bad indexes and a bad measure, not because of a bad method!
Certain Observations. Antipodal methods: to be sure that the results are really good and do not depend on the method used, one should test these results using antipodal methods. Solomon G, 1977: "The most antipodal methods are the NN-method and K-means." Sensitivity: to be sure that the results do not depend essentially on the method's parameters, one should perform a sensitivity analysis by changing the adjustment parameters.
Introduction   Definitions  Clustering  Conclusions Open Problems CONTENTS 
Some Problems. Question 1: how to reveal alien objects? Solution (idea): revealing a stable structure on different sets of objects, which are subsets of the given set. The object distribution reflects: real structure (nature) + noise (alien objects).
Some Problems. Question 2: how to accelerate classification? Solution (idea): filtering out the objects that make a minimum contribution to the decision function; using representative objects of each cluster.
CONTACT  INFORMATION Mikhail Alexandrov 1,2 , Pavel Makagonov 3 1  Autonomous University of Barcelona, Spain 2  Social Network Research Center with UCE, Slovakia 3  Mixtec Technological University, Mexico dyner1950@mail.ru, mpp@mixteco.utm.mx Petersburg 2008

More Related Content

What's hot

K MEANS CLUSTERING
K MEANS CLUSTERINGK MEANS CLUSTERING
K MEANS CLUSTERING
singh7599
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Simplilearn
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
Krish_ver2
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
Rashid Ansari
 
Unsupervised learning: Clustering
Unsupervised learning: ClusteringUnsupervised learning: Clustering
Unsupervised learning: Clustering
Deepak George
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
CosmoAIMS Bassett
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
Neha Kulkarni
 
Linear regression
Linear regressionLinear regression
Linear regression
MartinHogg9
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
Mohammad Junaid Khan
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
Knoldus Inc.
 
Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)
Fellowship at Vodafone FutureLab
 
Classification Algorithm.
Classification Algorithm.Classification Algorithm.
Classification Algorithm.
Megha Sharma
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
Mohit Rajput
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
Pabna University of Science & Technology
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
Pravinkumar Landge
 
K means Clustering Algorithm
K means Clustering AlgorithmK means Clustering Algorithm
K means Clustering Algorithm
Kasun Ranga Wijeweera
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
nextlib
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
Gopal Sakarkar
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data science
MaryamRehman6
 
Counter propagation Network
Counter propagation NetworkCounter propagation Network
Counter propagation Network
Akshay Dhole
 

What's hot (20)

K MEANS CLUSTERING
K MEANS CLUSTERINGK MEANS CLUSTERING
K MEANS CLUSTERING
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
 
Unsupervised learning: Clustering
Unsupervised learning: ClusteringUnsupervised learning: Clustering
Unsupervised learning: Clustering
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)
 
Classification Algorithm.
Classification Algorithm.Classification Algorithm.
Classification Algorithm.
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
 
K means Clustering Algorithm
K means Clustering AlgorithmK means Clustering Algorithm
K means Clustering Algorithm
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data science
 
Counter propagation Network
Counter propagation NetworkCounter propagation Network
Counter propagation Network
 

Similar to Clustering

Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basic
Houw Liong The
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
Houw Liong The
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10
mqasimsheikh5
 
DM UNIT_4 PPT for btech final year students
DM UNIT_4 PPT for btech final year studentsDM UNIT_4 PPT for btech final year students
DM UNIT_4 PPT for btech final year students
sriharipatilin
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
engrasi
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
NaveenKumar5162
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
NaveenKumar5162
 
CLUSTERING
CLUSTERINGCLUSTERING
CLUSTERING
Aman Jatain
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Subrata Kumer Paul
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Salah Amean
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
JoonyoungJayGwak
 
Paper id 26201478
Paper id 26201478Paper id 26201478
Paper id 26201478
IJRAT
 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdf
VIKASGUPTA127897
 
47 292-298
47 292-29847 292-298
47 292-298
idescitation
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
vikassingh569137
 
My8clst
My8clstMy8clst
My8clst
ketan533
 
Chapter 10.1,2,3.pptx
Chapter 10.1,2,3.pptxChapter 10.1,2,3.pptx
Chapter 10.1,2,3.pptx
Amy Aung
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
refedey275
 
20IT501_DWDM_PPT_Unit_IV.ppt
20IT501_DWDM_PPT_Unit_IV.ppt20IT501_DWDM_PPT_Unit_IV.ppt
20IT501_DWDM_PPT_Unit_IV.ppt
SamPrem3
 
20IT501_DWDM_PPT_Unit_IV.ppt
20IT501_DWDM_PPT_Unit_IV.ppt20IT501_DWDM_PPT_Unit_IV.ppt
20IT501_DWDM_PPT_Unit_IV.ppt
PalaniKumarR2
 

Similar to Clustering (20)

Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basic
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10
 
DM UNIT_4 PPT for btech final year students
DM UNIT_4 PPT for btech final year studentsDM UNIT_4 PPT for btech final year students
DM UNIT_4 PPT for btech final year students
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 
CLUSTERING
CLUSTERINGCLUSTERING
CLUSTERING
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
Paper id 26201478
Paper id 26201478Paper id 26201478
Paper id 26201478
 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdf
 
47 292-298
47 292-29847 292-298
47 292-298
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
My8clst
My8clstMy8clst
My8clst
 
Chapter 10.1,2,3.pptx
Chapter 10.1,2,3.pptxChapter 10.1,2,3.pptx
Chapter 10.1,2,3.pptx
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
 
20IT501_DWDM_PPT_Unit_IV.ppt
20IT501_DWDM_PPT_Unit_IV.ppt20IT501_DWDM_PPT_Unit_IV.ppt
20IT501_DWDM_PPT_Unit_IV.ppt
 
20IT501_DWDM_PPT_Unit_IV.ppt
20IT501_DWDM_PPT_Unit_IV.ppt20IT501_DWDM_PPT_Unit_IV.ppt
20IT501_DWDM_PPT_Unit_IV.ppt
 

More from NLPseminar

[ИТ-лекторий ФКН ВШЭ]: Диалоговые системы. Татьяна Ландо
[ИТ-лекторий ФКН ВШЭ]: Диалоговые системы. Татьяна Ландо[ИТ-лекторий ФКН ВШЭ]: Диалоговые системы. Татьяна Ландо
[ИТ-лекторий ФКН ВШЭ]: Диалоговые системы. Татьяна Ландо
NLPseminar
 
Events
EventsEvents
Events
NLPseminar
 
клышинский
клышинскийклышинский
клышинский
NLPseminar
 
конф ии и ея гаврилова
конф ии и ея  гавриловаконф ии и ея  гаврилова
конф ии и ея гаврилова
NLPseminar
 
кудрявцев V3
кудрявцев V3кудрявцев V3
кудрявцев V3
NLPseminar
 
rubashkin
rubashkinrubashkin
rubashkin
NLPseminar
 
Vlasova
VlasovaVlasova
Vlasova
NLPseminar
 
Ageev
AgeevAgeev
Ageev
NLPseminar
 
Khomitsevich
Khomitsevich Khomitsevich
Khomitsevich
NLPseminar
 
акинина осмоловская
акинина осмоловскаяакинина осмоловская
акинина осмоловская
NLPseminar
 
Serebryakov
SerebryakovSerebryakov
Serebryakov
NLPseminar
 
потапов
потаповпотапов
потапов
NLPseminar
 
molchanov(promt)
molchanov(promt)molchanov(promt)
molchanov(promt)
NLPseminar
 
белканова
белкановабелканова
белканова
NLPseminar
 
Skatov
SkatovSkatov
Skatov
NLPseminar
 
гвоздикин
гвоздикингвоздикин
гвоздикин
NLPseminar
 
веселов
веселоввеселов
веселов
NLPseminar
 

More from NLPseminar (20)

[ИТ-лекторий ФКН ВШЭ]: Диалоговые системы. Татьяна Ландо
[ИТ-лекторий ФКН ВШЭ]: Диалоговые системы. Татьяна Ландо[ИТ-лекторий ФКН ВШЭ]: Диалоговые системы. Татьяна Ландо
[ИТ-лекторий ФКН ВШЭ]: Диалоговые системы. Татьяна Ландо
 
Events
EventsEvents
Events
 
Tomita
TomitaTomita
Tomita
 
бетин
бетинбетин
бетин
 
Andreev
AndreevAndreev
Andreev
 
клышинский
клышинскийклышинский
клышинский
 
конф ии и ея гаврилова
конф ии и ея  гавриловаконф ии и ея  гаврилова
конф ии и ея гаврилова
 
кудрявцев V3
кудрявцев V3кудрявцев V3
кудрявцев V3
 
rubashkin
rubashkinrubashkin
rubashkin
 
Vlasova
VlasovaVlasova
Vlasova
 
Ageev
AgeevAgeev
Ageev
 
Khomitsevich
Khomitsevich Khomitsevich
Khomitsevich
 
акинина осмоловская
акинина осмоловскаяакинина осмоловская
акинина осмоловская
 
Serebryakov
SerebryakovSerebryakov
Serebryakov
 
потапов
потаповпотапов
потапов
 
molchanov(promt)
molchanov(promt)molchanov(promt)
molchanov(promt)
 
белканова
белкановабелканова
белканова
 
Skatov
SkatovSkatov
Skatov
 
гвоздикин
гвоздикингвоздикин
гвоздикин
 
веселов
веселоввеселов
веселов
 

Recently uploaded

Brigada Eskwela editable Certificate.pptx
Brigada Eskwela editable Certificate.pptxBrigada Eskwela editable Certificate.pptx
Brigada Eskwela editable Certificate.pptx
aiofits06
 
How to install python packages from Pycharm
How to install python packages from PycharmHow to install python packages from Pycharm
How to install python packages from Pycharm
Celine George
 
How to Restrict Price Modification to Managers in Odoo 17 POS
How to Restrict Price Modification to Managers in Odoo 17 POSHow to Restrict Price Modification to Managers in Odoo 17 POS
How to Restrict Price Modification to Managers in Odoo 17 POS
Celine George
 
Tale of a Scholar and a Boatman ~ A Story with Life Lessons (Eng. & Chi.).pptx
Tale of a Scholar and a Boatman ~ A Story with Life Lessons (Eng. & Chi.).pptxTale of a Scholar and a Boatman ~ A Story with Life Lessons (Eng. & Chi.).pptx
Tale of a Scholar and a Boatman ~ A Story with Life Lessons (Eng. & Chi.).pptx
OH TEIK BIN
 
Celebrating 25th Year SATURDAY, 27th JULY, 2024
Celebrating 25th Year SATURDAY, 27th JULY, 2024Celebrating 25th Year SATURDAY, 27th JULY, 2024
Celebrating 25th Year SATURDAY, 27th JULY, 2024
APEC Melmaruvathur
 
Personality Development , Dr. SAROJ KUMAR DATTA
Personality Development , Dr. SAROJ KUMAR DATTAPersonality Development , Dr. SAROJ KUMAR DATTA
Personality Development , Dr. SAROJ KUMAR DATTA
CallplanetsDeveloper
 
english 9 Quarter 1 Week 1 Modals and its Uses
english 9 Quarter 1 Week 1 Modals and its Usesenglish 9 Quarter 1 Week 1 Modals and its Uses
english 9 Quarter 1 Week 1 Modals and its Uses
EjNoveno
 
Angular Roadmap For Beginner PDF By ScholarHat.pdf
Angular Roadmap For Beginner PDF By ScholarHat.pdfAngular Roadmap For Beginner PDF By ScholarHat.pdf
Angular Roadmap For Beginner PDF By ScholarHat.pdf
Scholarhat
 
How to Manage Advanced Pricelist in Odoo 17
How to Manage Advanced Pricelist in Odoo 17How to Manage Advanced Pricelist in Odoo 17
How to Manage Advanced Pricelist in Odoo 17
Celine George
 
Replacing the Whole Capitalist Stack.pdf
Replacing the Whole Capitalist Stack.pdfReplacing the Whole Capitalist Stack.pdf
Replacing the Whole Capitalist Stack.pdf
StefanMz
 
How to Configure Extra Steps During Checkout in Odoo 17 Website App
How to Configure Extra Steps During Checkout in Odoo 17 Website AppHow to Configure Extra Steps During Checkout in Odoo 17 Website App
How to Configure Extra Steps During Checkout in Odoo 17 Website App
Celine George
 
Odoo 17 Project Module : New Features - Odoo 17 Slides
Odoo 17 Project Module : New Features - Odoo 17 SlidesOdoo 17 Project Module : New Features - Odoo 17 Slides
Odoo 17 Project Module : New Features - Odoo 17 Slides
Celine George
 
Java Developer Roadmap PDF By ScholarHat
Java Developer Roadmap PDF By ScholarHatJava Developer Roadmap PDF By ScholarHat
Java Developer Roadmap PDF By ScholarHat
Scholarhat
 
Types of Diode and its working principle.pptx
Types of Diode and its working principle.pptxTypes of Diode and its working principle.pptx
Types of Diode and its working principle.pptx
nitugatkal
 
DO5s2024-Orientation-Material.pptx. This is a presentation of DepEd Order No....
DO5s2024-Orientation-Material.pptx. This is a presentation of DepEd Order No....DO5s2024-Orientation-Material.pptx. This is a presentation of DepEd Order No....
DO5s2024-Orientation-Material.pptx. This is a presentation of DepEd Order No....
mariateresabadilla2
 
Q1_LE_English 7_Lesson 1_Week 1 wordfile.docx
Q1_LE_English 7_Lesson 1_Week 1 wordfile.docxQ1_LE_English 7_Lesson 1_Week 1 wordfile.docx
Q1_LE_English 7_Lesson 1_Week 1 wordfile.docx
SANDRAMEMBRERE1
 
Powerpoint on Classroom Orientation2024-2025
Powerpoint on Classroom Orientation2024-2025Powerpoint on Classroom Orientation2024-2025
Powerpoint on Classroom Orientation2024-2025
MarynolMagbanuaJimer
 
SD_Instructional-Design-Frameworkzz.pptx
SD_Instructional-Design-Frameworkzz.pptxSD_Instructional-Design-Frameworkzz.pptx
SD_Instructional-Design-Frameworkzz.pptx
MarkKennethBellen1
 
How to Load Custom Field to POS in Odoo 17 - Odoo 17 Slides
How to Load Custom Field to POS in Odoo 17 - Odoo 17 SlidesHow to Load Custom Field to POS in Odoo 17 - Odoo 17 Slides
How to Load Custom Field to POS in Odoo 17 - Odoo 17 Slides
Celine George
 
Module 5 Bone, Joints & Muscle Injuries.ppt
Module 5 Bone, Joints & Muscle Injuries.pptModule 5 Bone, Joints & Muscle Injuries.ppt
Module 5 Bone, Joints & Muscle Injuries.ppt
KIPAIZAGABAWA1
 

Clustering

  • 1. TECHNIQUES OF CLUSTERING (a short review for students) Mikhail Alexandrov 1,2 , Pavel Makagonov 3 1 Autonomous University of Barcelona, Spain 2 Social Network Research Center with UCE, Slovakia 3 Mixtec Technological University, Mexico dyner1950@mail.ru, mpp@mixteco.utm.mx Petersburg 2008
  • 2. Introduction Definitions Clustering Discussion Open Problems CONTENTS 
  • 3. Ideas, Materials, Collaboration. Prof. Dr. Benno Stein and Dr. Sven Meyer zu Eissen (Weimar University, Germany): Structuring and Indexing; AI search in IR. Dr. Xiaojin Zhu (University of Wisconsin, Madison, USA): Semi-Supervised Learning.
  • 4. Subject of Grouping: TEXTUAL DATA and NON-TEXTUAL DATA. The source of the data, textual or non-textual, is not important. Local terminology. Data: work in the space of numerical parameters. Texts: work in the space of words. Example: a typical dialog between a passenger (US) and railway directory inquiries (DI).
  • 5. Presentation of Textual Data: TEXTS (‘indexed’) => Vector model (‘parameterized’). Local terminology: indexed texts are simply texts parameterized in the space of words.
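A minimal sketch of the vector model mentioned above, assuming the simplest possible parameterization: each text becomes a vector of word counts over a shared vocabulary. The sample texts and the function names `build_vocabulary` and `to_vector` are illustrative assumptions, not part of the slides.

```python
from collections import Counter


def build_vocabulary(texts):
    """Return the sorted list of distinct lowercase words over all texts."""
    return sorted({word for text in texts for word in text.lower().split()})


def to_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in one text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical toy dialog fragments, not taken from the slides.
texts = ["train to Madrid", "train to Barcelona", "ticket to Madrid"]
vocabulary = build_vocabulary(texts)

# Object/Attribute matrix: one row per text, one column per word.
matrix = [to_vector(text, vocabulary) for text in texts]
```

Real preprocessing would usually also remove stop words and weight the counts; this sketch leaves both out.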
  • 6. Presentation of Textual Data: TEXTS (‘indexed’) => TEXTS (‘parameterized’). Local terminology: indexed texts are simply texts parameterized in the space of themes => Category/Context Vector Models... Example: manually parameterized dialogs in the space of parameters (transport service and passenger needs).
  • 7. Introduction Definitions Clustering Discussion Open Problems CONTENTS 
  • 8. Types of Grouping. Unsupervised Learning: we know nothing about the data structure. Supervised Learning: we know the data structure well. Semi-Supervised Learning: we know something about the data structure.
  • 9. Types of Grouping. Clustering. Characteristics: absence of patterns or descriptions of classes, so the results are defined by the nature of the data themselves (N > 1). Synonyms: classification without teacher, unsupervised learning. Number of clusters: [ ] is known exactly; [x] is known approximately; [ ] is not known => searching. Classification. Characteristics: presence of patterns or descriptions of classes, so the results are defined by the user (N >= 1). Synonyms: classification with teacher, supervised learning. Special terms: Categorization (of documents), Diagnostics (technics, medicine), Recognition (technics, science).
  • 10. Types of Grouping. “Semi Clustering/Classification”. Characteristics: presence of a limited number of patterns, so the results are defined both by the user and by the data themselves (N > 1). Synonyms: semi-classification, semi-supervised learning. Number of clusters/categories: [ ] is known exactly; [x] is known approximately; [ ] is not known => searching. Classification. Characteristics: presence of patterns or descriptions of classes, so the results are defined by the user (N >= 1). Synonyms: classification with teacher, supervised learning. Special terms: Categorization (of documents), Diagnostics (technics, medicine), Recognition (technics, science).
  • 11. Objectives of Grouping. 1. Organization (structuring) of an object set; the process is called data structuring. 2. Searching for interesting patterns; the process is called navigation. 3. Grouping for other applications: knowledge discovery (clustering), summarization of documents. Note: do not confuse the type of grouping with its objective.
  • 12. Classification of methods. Based on belonging to a cluster/category. Exclusive methods: every object belongs to only one cluster/category; these are called hard grouping methods. Non-exclusive methods: every object can belong to several clusters/categories; these are called soft grouping methods. Based on data presentation. Methods oriented to a free metric space: every object is presented as a point in a free space. Methods oriented to graphs: every object is presented as an element of a graph.
  • 13. Fuzzy Grouping. Hard grouping: hard clustering, hard categorization. Soft grouping: soft clustering, soft categorization. Example: the distribution of letters from Muscovites to the Government is a soft categorization (the numbers in the table reflect the relative weight of each theme).
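The relation between soft and hard grouping on this slide can be illustrated with a small sketch. The letters, themes and weights below are hypothetical (the table from the slide is not available in the transcript), and taking the theme with the largest weight is only one possible way to turn a soft categorization into a hard one.

```python
def harden(soft_assignment):
    """Map each object to the single category with the largest weight."""
    return {obj: max(weights, key=weights.get)
            for obj, weights in soft_assignment.items()}


# Hypothetical relative weights of themes for two letters (soft categorization).
soft = {
    "letter_1": {"housing": 0.7, "transport": 0.3},
    "letter_2": {"housing": 0.2, "transport": 0.8},
}

# Hard categorization: every letter belongs to exactly one theme.
hard = harden(soft)
```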
  • 14. General Scheme of Clustering Process - I. Preprocessing <= Processing. Principal idea: to transform texts into numerical form in order to use mathematical tools. Remember: our problem is grouping textual documents, not understanding them. Here: both the rough and the good matrices are Object/Attribute matrices.
  • 15. General Scheme of Clustering Process - II. Preprocessing, Processing <=. Here: an Attribute/Attribute matrix can be used instead of an Object/Object matrix.
  • 16. Matrices to Be Considered
  • 17. Clustering for Categorization. Colour matrix “words-words” before clustering. The matrix contains the values of word co-occurrences in texts. Red: if the value is greater than some threshold. White: if it is less.
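The “words-words” matrix described above can be sketched as follows. This is an illustration under stated assumptions: co-occurrence is counted as the number of texts that contain both words, "R" and "W" stand for red and white cells, the diagonal is marked ".", and the sample texts and the name `colour_matrix` are hypothetical.

```python
from itertools import combinations


def colour_matrix(texts, threshold):
    """Build a thresholded word-word co-occurrence matrix."""
    word_sets = [set(text.lower().split()) for text in texts]
    vocabulary = sorted(set().union(*word_sets))

    # Number of texts in which each (alphabetically ordered) word pair occurs.
    counts = {}
    for words in word_sets:
        for a, b in combinations(sorted(words), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1

    def cell(a, b):
        if a == b:
            return "."
        value = counts.get((min(a, b), max(a, b)), 0)
        return "R" if value > threshold else "W"

    return vocabulary, [[cell(a, b) for b in vocabulary] for a in vocabulary]


# Hypothetical texts; only "ticket" and "train" co-occur in more than one text.
texts = ["train ticket price", "train ticket time", "hotel price"]
vocabulary, matrix = colour_matrix(texts, threshold=1)
```

Reordering the rows and columns so that the red cells form blocks along the diagonal is the job of the clustering step itself and is not shown here.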
  • 18. Clustering for Categorization. Colour matrix “words-words” after clustering. Words are grouped. Cluster => Subdictionary. Absence of blocks means absence of subthemes.
  • 19. Importance of Preprocessing (it takes 60%-90% of the effort)
  • 20. Introduction Definitions Clustering Discussion Open Problems CONTENTS 
  • 21. Definitions. Def. 1: “Let V be a set of objects. A clustering C = { Ci | Ci ⊆ V } of V is a division of V into subsets for which ∪i Ci = V and Ci ∩ Cj = ∅ for i ≠ j.” Def. 2: “Let V be a set of nodes, E a set of arcs, and φ a weight function that reflects the distance between objects, so that we have a weighted graph G = { V, E, φ }. In this case C is called a clustering of G.” In the framework of the second definition every Ci induces a subgraph G(Ci). Both the subsets Ci and the subgraphs G(Ci) are called clusters. Figure labels: Graph, Set, Clique.
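Def. 1 can be checked mechanically. The sketch below tests the two conditions (the clusters cover V and are pairwise disjoint); the function name `is_clustering` is an assumption made for illustration.

```python
def is_clustering(clusters, objects):
    """Check Def. 1: the union of the clusters is V and they do not overlap."""
    clusters = [set(c) for c in clusters]
    covered = set().union(*clusters)
    # The sizes add up to the size of the union only if no object is shared.
    disjoint = sum(len(c) for c in clusters) == len(covered)
    return covered == set(objects) and disjoint


V = {1, 2, 3}
valid = is_clustering([{1, 2}, {3}], V)
overlapping = is_clustering([{1, 2}, {2, 3}], V)
incomplete = is_clustering([{1}, {3}], V)
```

As the next slide stresses, passing this check says nothing about the quality or the number of clusters.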
  • 22. Definitions. Principal note: both definitions SAY NOTHING about the quality of clusters or about the number of clusters. Reason for the difficulties: at present there is no general agreement on any universal definition of the term ‘cluster’. What does it mean that a clustering is good? 1. The closeness between objects inside clusters is substantially greater than the closeness between the clusters themselves. 2. The constructed clusters correspond to the intuitive expectations of users (they are natural clusters).
  • 23. Classification of methods (based on the way of grouping). 1. Hierarchy based methods (any neighbors method): N = ? N is not given. 2. Exemplar based methods (K-means): N = ? N is given. 3. Density based methods (MajorClust): N = ? N is calculated automatically.
  • 24. Hierarchy based methods. Neighbors: every object is a cluster. General algorithm: initially every object is one cluster. A series of steps is performed. At every step the pair of closest clusters is merged. At the end we have one cluster.
  • 25. Hierarchy based methods Nearest neighbor method (NN)
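The general hierarchical algorithm with the nearest neighbor criterion can be sketched as below. Two assumptions are made for illustration: the closeness between clusters is the smallest Euclidean distance between their members, and merging stops at a requested number of clusters `n_clusters` (the slides let the merging run until one cluster remains).

```python
from math import dist


def single_linkage(points, n_clusters):
    """Repeatedly merge the two closest clusters of point indices."""
    clusters = [[i] for i in range(len(points))]   # every object is a cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Nearest neighbor distance between clusters i and j.
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
three = single_linkage(points, 3)
one = single_linkage(points, 1)
```

Recording the distance at each merge would give the dendrogram from which a user can pick the number of clusters afterwards.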
  • 26. Exemplar based methods K - means, centroid
  • 27. Exemplar based methods. Method K-means. General algorithm: initially K centers are selected at random. A series of steps is performed. At every step the objects are distributed among the centers according to the criterion of the nearest center. Then all centers are recalculated. The process ends when the centers no longer change.
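The steps above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance, initial centers drawn from the data points, and a fixed random seed so the run is repeatable; the names `k_means`, `sq_dist` and `mean` are not from the slides.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(group) for coord in zip(*group))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # K centers selected at random
    groups = []
    for _ in range(max_iter):
        # Distribute the objects according to the nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Recalculate all centers; an empty group keeps its old center.
        new_centers = [mean(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:           # the centers did not change
            break
        centers = new_centers
    return centers, groups


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, groups = k_means(points, 2)
```

On less well separated data the result depends on the random start, which is one reason to repeat the run with several seeds.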
  • 28. Exemplar based methods. Method X-means (Dan Pelleg, Andrew Moore). Approach: using an evaluation of the object distribution; selection of the most likely points. Advantages: it is faster; the number of clusters is not fixed (in all cases it tends to be smaller).
  • 29. Density based methods. MajorClust method. Principal idea: the total closeness of an object to the objects of its own cluster exceeds its closeness to any other cluster. Suboptimal solution: only part of the neighbors are considered at every step (to save time and to avoid merging).
  • 30. Density based methods. MajorClust method. General algorithm: initially every object is one cluster and it joins its nearest neighbor. Every object evaluates its total closeness to its own cluster and, separately, to every other cluster. After this evaluation the objects change their membership and move to the closest cluster. The search ends when the clusters no longer change. Preprocessing for MajorClust: many weak links together can outweigh a few strong ones, which distorts the results. So weak links should be eliminated before clustering.
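A simplified sketch of the MajorClust idea follows. It is not the authors' implementation: ties are resolved deterministically (an object keeps its cluster on a tie, otherwise the smallest label wins), objects are visited in a fixed order, and the parameter `min_weight` for eliminating weak links is an assumption introduced for illustration.

```python
def major_clust(weights, n, min_weight=0.0, max_iter=100):
    """weights: dict {(i, j): closeness} over n objects numbered 0..n-1."""
    neighbors = {i: {} for i in range(n)}
    for (i, j), w in weights.items():
        if w > min_weight:                 # weak links are eliminated first
            neighbors[i][j] = w
            neighbors[j][i] = w

    label = list(range(n))                 # initially every object is a cluster
    for _ in range(max_iter):
        changed = False
        for v in range(n):
            # Total closeness of v to each cluster among its neighbors.
            totals = {}
            for u, w in neighbors[v].items():
                totals[label[u]] = totals.get(label[u], 0.0) + w
            if not totals:
                continue
            best = max(totals.values())
            if totals.get(label[v], 0.0) < best:
                label[v] = min(l for l, t in totals.items() if t == best)
                changed = True
        if not changed:                    # clusters no longer change
            break
    return label


# Two triangles of strong links joined by one weak link (hypothetical data).
weights = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
           (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
           (2, 3): 0.1}
labels = major_clust(weights, 6)
```

Note that the number of clusters is not passed in; it follows from the link structure, as slide 23 states.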
  • 31. Cluster Validity. Definition: it reflects cluster separability and formally depends on the scatter inside clusters and the separation between clusters. Indexes (formal characteristics of structure): Dunn index; Davies-Bouldin index; Hypervolume criterion (Andre Hardy); Density expected measure DEM (Benno Stein). Dunn index (to be maximized).
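The formula of the Dunn index is not preserved in this transcript. The sketch below uses its common textbook form, the smallest distance between objects of different clusters divided by the largest cluster diameter, with Euclidean distance assumed.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """clusters: list of lists of points; needs at least two clusters
    and at least one cluster with two or more points."""
    # Separation: smallest distance between points of different clusters.
    separation = min(dist(p, q)
                     for a, b in combinations(clusters, 2)
                     for p in a for q in b)
    # Scatter: largest distance between two points of the same cluster.
    diameter = max(dist(p, q)
                   for c in clusters
                   for p, q in combinations(c, 2))
    return separation / diameter


good = dunn_index([[(0, 0), (0, 1)], [(3, 0), (3, 1)]])
bad = dunn_index([[(0, 0), (3, 0)], [(0, 1), (3, 1)]])
```

Because both the numerator and the denominator are extreme values, a single outlying object can change the index a lot.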
  • 32. Cluster Validity. Number of clusters. Geometrical approach, two variants: the optimum (min, max) of the curve; a jump of the curve. The Dunn index (to be maximized) is too sensitive to extreme cases.
  • 33. Cluster Usability. Definition: it reflects the user’s opinion and formally expresses the difference between the classes selected manually by a user and the clusters constructed by a given method. Cluster F-measure (Benno Stein). Data, Expert, Method. Here: i, j are the indexes of classes and clusters; C*i, Cj are classes and clusters; prec(i,j), rec(i,j) are precision and recall.
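The formula itself is not preserved in the transcript. The sketch below assumes the commonly used form of the cluster F-measure: for every class C*i the best harmonic mean of prec(i,j) and rec(i,j) over all clusters Cj is taken, and these values are averaged with weights proportional to the class sizes.

```python
def cluster_f_measure(classes, clusters):
    """classes: sets chosen by an expert; clusters: sets built by a method."""
    total = sum(len(c) for c in classes)
    score = 0.0
    for cls in classes:
        best = 0.0
        for clu in clusters:
            common = len(set(cls) & set(clu))
            if common == 0:
                continue
            prec = common / len(clu)       # share of the cluster in the class
            rec = common / len(cls)        # share of the class in the cluster
            best = max(best, 2 * prec * rec / (prec + rec))
        score += len(cls) / total * best
    return score


expert = [{1, 2}, {3, 4}]
perfect = cluster_f_measure(expert, [{1, 2}, {3, 4}])
merged = cluster_f_measure(expert, [{1, 2, 3, 4}])
```

A value of 1 means the clusters reproduce the expert classes exactly; lumping everything into one cluster lowers precision and therefore the score.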
  • 34. Validity and Usability. Conclusion: the density expected measure corresponds to the F-measure, which reflects the expert’s opinion. So DEM can be an indicator of expert opinion.
  • 35. Technologies of Clustering. Meta methods: they construct separated data sets using optimization criteria and limitations: neither too many nor too few clusters, neither too large nor too small clusters, etc. Visual methods: they present visual images to a user so that the clusters can be selected manually. Using different methods, comparing results.
  • 36. Meta Methods. Algorithm (example). Notation: N is the number of objects in a given cluster; D is the diagonal of a given cluster. Initially N0 clusters and their centers Ci are given. Steps: 1. The K-medoid method (or any other one) is performed. 2. If N > Nmax or D > Dmax (in any cluster), then this cluster is divided into 2 parts; go to step 1. 3. If N < Nmin or D < Dmin (in any cluster), then this cluster and the closest one are joined; go to step 1. 4. When the number of iterations I > Imax, stop; otherwise go to step 1.
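The split-and-join loop above can be sketched as follows. Several choices here are assumptions made only for illustration: the diagonal D is taken as the largest pairwise distance in a cluster, a cluster is split by using its two most distant points as new centers, and joining is approximated by dropping the center of the small cluster so that the next K-medoid pass reassigns its objects. Points are assumed to be distinct tuples and the initial centers are assumed to be data points.

```python
from itertools import combinations
from math import dist


def medoid(cluster):
    """Point of the cluster with the smallest total distance to the others."""
    return min(cluster, key=lambda p: sum(dist(p, q) for q in cluster))


def diagonal(cluster):
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def k_medoids(points, centers, max_iter=50):
    groups = {}
    for _ in range(max_iter):
        groups = {c: [] for c in centers}
        for p in points:
            groups[min(centers, key=lambda c: dist(p, c))].append(p)
        new_centers = [medoid(g) for g in groups.values() if g]
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups.values() if g]


def meta_cluster(points, centers, n_min, n_max, d_min, d_max, i_max=20):
    clusters = k_medoids(points, centers)                      # step 1
    for _ in range(i_max):                                     # step 4
        centers = [medoid(c) for c in clusters]
        too_big = [c for c in clusters
                   if len(c) > n_max or diagonal(c) > d_max]
        too_small = [c for c in clusters
                     if len(c) < n_min or diagonal(c) < d_min]
        if too_big:                                            # step 2
            c = too_big[0]
            centers.remove(medoid(c))
            centers.extend(max(combinations(c, 2),
                               key=lambda pq: dist(*pq)))
        elif too_small and len(clusters) > 1:                  # step 3
            centers.remove(medoid(too_small[0]))
        else:
            break
        clusters = k_medoids(points, centers)                  # go to step 1
    return clusters


points = [(0,), (1,), (2,), (10,), (11,), (12,)]
result = meta_cluster(points, [(0,)], n_min=2, n_max=4, d_min=0.5, d_max=5)
```

With conflicting limits the loop can alternate between splitting and joining, which is why the iteration cap Imax from step 4 is needed.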
  • 37. Visual Clustering Clustering on dendrite Clustering in space of factors
  • 38. Authorship. Problem: authorship of Molière's dramatic works (comedies, dramas, ...): Corneille and/or Molière? Approach: style-based indexing (NooJ can be used); clustering of all dramatic works; well-known dramatic works should be marked. Style: formal style estimations and informal style estimations. Formal style indicators: text complexity, text harmonicity. References: Labbé C., Labbé D. Inter-textual distance and authorship attribution: Corneille and Molière. Journal of Quantitative Linguistics, 2001, Vol. 8, No. 3, pp. 213-231.
  • 39. Clustering. Authorship. Results: 1) 18 comedies of Molière should be attributed to Corneille. 2) 15 comedies of Molière are weakly connected with all his other works, so they may have been written by two authors. 3) 2 comedies of Corneille are now considered to be works of Molière. Etc. Note: for a certain time Molière and Corneille were friends.
  • 40. Special and Universal Packages with Clustering Algorithms. 1. ClustAn (Scotland), www.clustan.com, Clustan Graphics-7 (2006). 2. MatLab: descriptions are on the Internet. 3. Statistica: descriptions are on the Internet. Learning: Journals and Congresses about Clustering. 1. “Journal of Classification”, Springer. 2. IFCS - International Federation of Classification Societies, conferences. 3. CSNA - Classification Society of North America, seminars, workshops.
  • 41. Introduction Definitions Clustering Discussion Open Problems CONTENTS 
  • 42. Certain Observations. The number of methods for grouping data is slightly larger than the number of researchers working in this area. The problem is not to find the best method for all cases. The problem is to find the method that is relevant for your data. Only you know which methods are best for your own data. The principal problems lie in choosing the indexes (parameters) and the measure of closeness so that they are adequate to the given problem and the given data. Frequently the results are bad because of bad indexes and a bad measure, not because of a bad method!
  • 43. Certain Observations. Antipodal methods: to be sure that the results are really good and do not depend on the method used, one should test these results with antipodal methods. Solomon G, 1977: “The most antipodal are the NN-method and K-means”. Sensitivity: to be sure that the results do not depend essentially on the method’s parameters, one should perform a sensitivity analysis by changing the adjustment parameters.
  • 44. Introduction Definitions Clustering Conclusions Open Problems CONTENTS 
  • 45. Some Problems Question 1 How to reveal alien objects? Solution (idea) Revealing a stable structure on different sets of objects. They are subsets of a given set. Object distribution reflects: real structure ( nature ) + noise ( alien objects )
  • 46. Some Problems Question 2 How to accelerate classification? Solution (idea) Filtering objects, which give a minimum contribution to decisive function Representative objects of each cluster
  • 47. CONTACT INFORMATION Mikhail Alexandrov 1,2 , Pavel Makagonov 3 1 Autonomous University of Barcelona, Spain 2 Social Network Research Center with UCE, Slovakia 3 Mixtec Technological University, Mexico dyner1950@mail.ru, mpp@mixteco.utm.mx Petersburg 2008