This document provides an overview of clustering techniques and similarity measures. It introduces clustering as an unsupervised classification technique where data is grouped without predefined classes. Various similarity and dissimilarity measures are discussed for calculating proximity between data points defined by single or multiple attributes. Measures like symmetric binary coefficient and Jaccard coefficient are explained for computing similarity between objects described by symmetric and asymmetric binary attributes respectively. Examples are also provided to demonstrate calculating these similarity measures.
2. Topics to be covered…
Introduction to clustering
Similarity and dissimilarity measures
Clustering techniques
Partitioning algorithms
Hierarchical algorithms
Density-based algorithm
3. Introduction to Clustering
Classification consists of assigning a class label to a set of unclassified cases.
Supervised classification: the set of possible classes is known in advance.
Unsupervised classification: the set of possible classes is not known in advance; after the grouping is done, we can try to assign a name to each class.
Unsupervised classification is called clustering.
6. Introduction to Clustering
Clustering is somewhat related to classification in the sense that in both cases data are grouped.
However, there is a major difference between the two techniques. To understand it, consider a sample dataset containing marks obtained by a set of students and the corresponding grades, as shown in Table 12.1.
7. Introduction to Clustering
Table 12.1: Tabulation of Marks

Roll No   Mark   Grade
1         80     A
2         70     A
3         55     C
4         91     EX
5         65     B
6         35     D
7         76     A
8         40     D
9         50     C
10        85     EX
11        25     F
12        60     B
13        45     D
14        95     EX
15        63     B
16        88     A

Figure 12.1: Group representation of the dataset in Table 12.1 (roll numbers grouped by grade: A = {1, 2, 7, 16}, B = {5, 12, 15}, C = {3, 9}, D = {6, 8, 13}, EX = {4, 10, 14}, F = {11}).
8. Introduction to Clustering
It is evident that there is a simple mapping between Table 12.1 and Figure 12.1: the groups in Figure 12.1 are already predefined in Table 12.1. This is similar to classification, where we are given a dataset in which the groups of data are predefined.
Consider another situation, where 'Grade' is not known but we still have to form groups: put marks into the same group only if no mark in the group differs from another by 5 or more.
This is similar to the "relative grading" concept, and the grades may range from A to Z.
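As an illustrative sketch (not from the slides), one simple reading of this rule sorts the marks and starts a new group whenever a mark is 5 or more above the smallest mark of the current group:

```python
def relative_grade_groups(marks, gap=5):
    """Group sorted marks so that, within a group, no mark is `gap`
    or more above the group's smallest mark (one reading of the rule)."""
    groups = []
    for m in sorted(marks):
        if groups and m - groups[-1][0] < gap:
            groups[-1].append(m)       # close enough: same group
        else:
            groups.append([m])         # too far: start a new group
    return groups

# the marks from Table 12.1
marks = [80, 70, 55, 91, 65, 35, 76, 40, 50, 85, 25, 60, 45, 95, 63, 88]
print(relative_grade_groups(marks))
```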
9. Introduction to Clustering
Figure 12.2 shows another grouping obtained by another simple mapping, but with a difference: this mapping is not based on predefined classes.
In other words, the grouping is accomplished by finding similarities between data according to characteristics found in the actual data.
Such group making is called clustering.
10. Introduction to Clustering
Example 12.1: The task of clustering
To elaborate the clustering task, consider the following dataset.

Table 12.2: Life Insurance database

Marital Status   Age   Income   Education        Number of children
Single           35    25000    Under Graduate   3
Married          25    15000    Graduate         1
Single           40    20000    Under Graduate   0
Divorced         20    30000    Post-Graduate    0
Divorced         25    20000    Under Graduate   3
Married          60    70000    Graduate         0
Married          30    90000    Post-Graduate    0
Married          45    60000    Graduate         5
Divorced         50    80000    Under Graduate   2

With a certain similarity (or likeness) measure defined, we can group the records on one or more attributes (the mapping thus being non-trivial).
11. Introduction to Clustering
Clustering has been used in many application domains:
Image analysis
Document retrieval
Machine learning, etc.
When clustering is applied to a real-world database, many problems may arise.
1. The (best) number of clusters is not known.
• There is no single correct answer to a clustering problem; in fact, many answers may be found.
• The exact number of clusters required is not easy to determine.
12. Introduction to Clustering
2. There may not be any a priori knowledge concerning the clusters.
• This raises the issue of what data should be used for clustering.
• Unlike classification, in clustering we have no supervised learning to aid the process.
• Clustering can be viewed as similar to unsupervised learning.
3. Interpreting the semantic meaning of each cluster may be difficult.
• With classification, the labeling of classes is known ahead of time. In contrast, with clustering, this may not be the case.
• Thus, when the clustering process yields a set of clusters, the exact meaning of each cluster may not be obvious.
13. Definition of Clustering Problem
Definition 12.1: Clustering
Given a database $D = \{t_1, t_2, \ldots, t_n\}$ of $n$ tuples, the clustering problem is to define a mapping $f : D \rightarrow C$, where each $t_i \in D$ is assigned to one cluster $c_i \in C$. Here, $C = \{c_1, c_2, \ldots, c_k\}$ denotes a set of clusters.
• A solution to a clustering problem is devising such a mapping formulation.
• The idea behind the mapping is to establish that a tuple within one cluster is more like the tuples within that cluster than tuples outside it.
14. Definition of Clustering Problem
• Hence, the mapping function $f$ in Definition 12.1 may be explicitly stated as

$$f : D \rightarrow \{c_1, c_2, \ldots, c_k\}$$

where (i) each $t_i \in D$ is assigned to one cluster $c_i \in C$, and (ii) for each cluster $c_i \in C$, for all $t_{ip}, t_{iq} \in c_i$ and any $t_j \notin c_i$,

$$\text{similarity}(t_{ip}, t_{iq}) > \text{similarity}(t_{ip}, t_j) \quad \text{and} \quad \text{similarity}(t_{ip}, t_{iq}) > \text{similarity}(t_{iq}, t_j)$$

• In the field of cluster analysis, this similarity plays an important part.
• Next, we shall learn how the similarity (alternatively judged as "dissimilarity") between any two data objects can be measured.
16. Similarity and Dissimilarity Measures
• In clustering techniques, similarity (or dissimilarity) is an important measurement.
• Informally, the similarity between two objects (e.g., two images, two documents, two records) is a numerical measure of the degree to which the two objects are alike.
• Dissimilarity, on the other hand, is the alternative (opposite) measure of the degree to which two objects are different.
• Both similarity and dissimilarity are also termed proximity.
• Usually, similarity and dissimilarity are non-negative numbers: a similarity may range from zero (completely dissimilar) up to some finite or infinite value (completely similar), and a dissimilarity the other way around.
Note:
• Frequently, the term distance is used as a synonym for dissimilarity.
• In fact, distance refers to a special case of dissimilarity.
17. Proximity Measures: Single-Attribute
• Consider an object defined by a single attribute A (e.g., length), where A has n distinct values $a_1, a_2, \ldots, a_n$.
• A data structure called the "dissimilarity matrix" is used to store the collection of proximities for all pairs of the n attribute values.
• In other words, the dissimilarity matrix for an attribute A with n values is represented by an $n \times n$ matrix, as shown below:

$$\begin{bmatrix} 0 & & & & \\ p(2,1) & 0 & & & \\ p(3,1) & p(3,2) & 0 & & \\ \vdots & \vdots & \vdots & \ddots & \\ p(n,1) & p(n,2) & \cdots & \cdots & 0 \end{bmatrix}_{n \times n}$$

• Here, $p(i,j)$ denotes the proximity measure between two objects with attribute values $a_i$ and $a_j$.
• Note: the proximity measure is symmetric, that is, $p(i,j) = p(j,i)$.
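A minimal sketch (not from the slides) of building such a matrix, assuming a user-supplied pairwise dissimilarity function d:

```python
import numpy as np

def dissimilarity_matrix(values, d):
    """Build the symmetric n x n dissimilarity matrix with a zero
    diagonal, computing only the lower triangle and mirroring it."""
    n = len(values)
    p = np.zeros((n, n))
    for i in range(n):
        for j in range(i):
            p[i, j] = p[j, i] = d(values[i], values[j])
    return p

# example: a numeric attribute with absolute difference as dissimilarity
print(dissimilarity_matrix([1.0, 3.5, 2.0], lambda a, b: abs(a - b)))
```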
18. Proximity Calculation
The calculation of $p(i,j)$ differs for different types of attributes, according to the NOIR (Nominal, Ordinal, Interval, Ratio) typology.

Proximity calculation for nominal attributes:
• For example, for the binary attribute Gender = {Male, Female}, Male may be encoded as binary 1 and Female as binary 0.
• The similarity value is 1 if the two objects contain the same attribute value, and 0 if the objects are not at all similar.

Object    Gender
Ram       Male
Sita      Female
Laxman    Male

• Letting $p$ denote the similarity value between two objects, here
$p(\text{Ram}, \text{Sita}) = 0$
$p(\text{Ram}, \text{Laxman}) = 1$

Note: if $q$ denotes the dissimilarity between two objects $i$ and $j$ with a single binary attribute, then $q(i,j) = 1 - p(i,j)$.
19. Proximity Calculation
• Now, let us focus on how to calculate proximity measures between objects defined by two or more binary attributes.
• Suppose the number of attributes is $b$. We can define a contingency table summarizing the matches and mismatches between any two objects $x$ and $y$, as follows.

Table 12.3: Contingency table with binary attributes

                  Object y = 1    Object y = 0
Object x = 1         f11             f10
Object x = 0         f01             f00

Here, $f_{11}$ = the number of attributes where x = 1 and y = 1,
$f_{10}$ = the number of attributes where x = 1 and y = 0,
$f_{01}$ = the number of attributes where x = 0 and y = 1,
$f_{00}$ = the number of attributes where x = 0 and y = 0.

Note: $f_{00} + f_{01} + f_{10} + f_{11} = b$, the total number of binary attributes.
Two cases may arise: symmetric and asymmetric binary attributes.
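These four counts are easy to compute directly. A small sketch (illustrative only), representing each object as an equal-length 0/1 vector:

```python
def binary_counts(x, y):
    """Return (f11, f10, f01, f00) for two equal-length 0/1 vectors;
    the four counts sum to b, the number of binary attributes."""
    f11 = sum(1 for a, b in zip(x, y) if (a, b) == (1, 1))
    f10 = sum(1 for a, b in zip(x, y) if (a, b) == (1, 0))
    f01 = sum(1 for a, b in zip(x, y) if (a, b) == (0, 1))
    f00 = sum(1 for a, b in zip(x, y) if (a, b) == (0, 0))
    return f11, f10, f01, f00

print(binary_counts([1, 0, 1, 1], [1, 1, 0, 1]))  # (2, 1, 1, 0)
```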
20. Similarity Measure with Symmetric Binary
• The similarity between two objects defined by symmetric binary attributes is measured using the symmetric binary coefficient, denoted $\mathcal{S}$ and defined as

$$\mathcal{S} = \frac{\text{Number of matching attribute values}}{\text{Total number of attributes}} = \frac{f_{00} + f_{11}}{f_{00} + f_{01} + f_{10} + f_{11}}$$

The dissimilarity measure, likewise, can be denoted $\mathcal{D}$ and defined as

$$\mathcal{D} = \frac{\text{Number of mismatched attribute values}}{\text{Total number of attributes}} = \frac{f_{01} + f_{10}}{f_{00} + f_{01} + f_{10} + f_{11}}$$

Note that $\mathcal{D} = 1 - \mathcal{S}$.
21. Similarity Measure with Symmetric Binary
Example 12.2: Proximity measures with symmetric binary attributes
Consider the following dataset, where objects are defined by symmetric binary attributes:
Gender = {M, F}, Food = {V, N}, Caste = {H, M}, Education = {L, I}, Hobby = {T, C}, Job = {Y, N}

Object   Gender   Food   Caste   Education   Hobby   Job
Hari     M        V      M       L           C       N
Ram      M        N      M       I           T       N
Tomi     F        N      H       L           C       Y

$$\mathcal{S}(\text{Hari}, \text{Ram}) = \frac{1 + 2}{1 + 2 + 1 + 2} = 0.5$$
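A sketch of this calculation (the 0/1 encoding, with the first listed value of each attribute mapped to 1, is our assumption):

```python
def symmetric_binary_similarity(x, y):
    """S = (f00 + f11) / b = fraction of matching attribute values."""
    matches = sum(1 for a, b in zip(x, y) if a == b)  # f00 + f11
    return matches / len(x)

hari = [1, 1, 0, 1, 0, 0]   # M, V, M, L, C, N
ram  = [1, 0, 0, 0, 1, 0]   # M, N, M, I, T, N
s = symmetric_binary_similarity(hari, ram)
print(s, 1 - s)             # 0.5 0.5  (S, and D = 1 - S)
```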
22. Proximity Measure with Asymmetric Binary
• The similarity between two objects defined by asymmetric binary attributes is measured by the Jaccard coefficient, often symbolized $\mathcal{J}$ and given by the following equation:

$$\mathcal{J} = \frac{\text{Number of matching presences}}{\text{Number of attributes not involved in 0-0 matching}} = \frac{f_{11}}{f_{01} + f_{10} + f_{11}}$$
23. Proximity Measure with Asymmetric Binary
Example 12.3: Jaccard coefficient
Consider the following dataset:
Gender = {M, F}, Food = {V, N}, Caste = {H, M}, Education = {L, I}, Hobby = {T, C}, Job = {Y, N}

Object   Gender   Food   Caste   Education   Hobby   Job
Hari     M        V      M       L           C       N
Ram      M        N      M       I           T       N
Tomi     F        N      H       L           C       Y

Calculate the Jaccard coefficient between Ram and Hari, assuming that all binary attributes are asymmetric and that, for each pair of values of an attribute, the first one is more frequent than the second.

$$\mathcal{J}(\text{Hari}, \text{Ram}) = \frac{1}{2 + 1 + 1} = 0.25$$

Note: $\mathcal{J}(\text{Ram}, \text{Tomi}) = 0$ and $\mathcal{J}(\text{Hari}, \text{Ram}) = \mathcal{J}(\text{Ram}, \text{Hari})$, etc.
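A corresponding sketch for the Jaccard coefficient (again assuming a 0/1 encoding with the first listed value of each attribute mapped to 1):

```python
def jaccard(x, y):
    """J = f11 / (f01 + f10 + f11): 0-0 matches are ignored."""
    f11 = sum(1 for a, b in zip(x, y) if (a, b) == (1, 1))
    f10 = sum(1 for a, b in zip(x, y) if (a, b) == (1, 0))
    f01 = sum(1 for a, b in zip(x, y) if (a, b) == (0, 1))
    return f11 / (f01 + f10 + f11)

hari = [1, 1, 0, 1, 0, 0]   # M, V, M, L, C, N
ram  = [1, 0, 0, 0, 1, 0]   # M, N, M, I, T, N
print(jaccard(hari, ram))   # 1 / (1 + 2 + 1) = 0.25
```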
24. Proximity Measure with Asymmetric Binary
Example 12.4:
Consider the following dataset:
Gender = {M, F}, Food = {V, N}, Caste = {H, M}, Education = {L, I}, Hobby = {T, C}, Job = {Y, N}

Object   Gender   Food   Caste   Education   Hobby   Job
Hari     M        V      M       L           C       N
Ram      M        N      M       I           T       N
Tomi     F        N      H       L           C       Y

How can you calculate similarity if Gender, Hobby and Job are symmetric binary attributes while Food, Caste and Education are asymmetric binary attributes?
Exercise: obtain the similarity matrix of the objects using the Jaccard coefficient for the above.
25. Proximity Measure with Categorical Attribute
• A binary attribute is a special kind of nominal attribute with only two states.
• A categorical attribute, on the other hand, is a nominal attribute with three or more states (e.g., Color = {Red, Green, Blue}).
• If $s(x, y)$ denotes the similarity between two objects $x$ and $y$, then

$$s(x, y) = \frac{\text{Number of matches}}{\text{Total number of attributes}}, \qquad d(x, y) = \frac{\text{Number of mismatches}}{\text{Total number of attributes}}$$

• If $m$ is the number of matches and $a$ the number of categorical attributes with which the objects are defined, then

$$s(x, y) = \frac{m}{a} \qquad \text{and} \qquad d(x, y) = \frac{a - m}{a}$$
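A sketch of this match-based similarity (illustrative only):

```python
def categorical_similarity(x, y):
    """s = matches / total attributes; the dissimilarity is 1 - s."""
    m = sum(1 for a, b in zip(x, y) if a == b)
    return m / len(x)

# two objects described by (Color, Position), as in Example 12.4 below
print(categorical_similarity(['R', 'L'], ['B', 'C']))  # 0.0
print(categorical_similarity(['R', 'L'], ['R', 'L']))  # 1.0
```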
26. Proximity Measure with Categorical Attribute
Example 12.4:

Object   Color   Position   Distance
1        R       L          L
2        B       C          M
3        G       R          M
4        R       L          H

The similarity matrix considering only the Color attribute is shown below (entry (i, j) is 1 if objects i and j have the same color, else 0):

      1   2   3   4
1     1   0   0   1
2     0   1   0   0
3     0   0   1   0
4     1   0   0   1

Dissimilarity matrix, $d$ = ?
Exercise: obtain the dissimilarity matrix considering both categorical attributes (i.e., Color and Position).
27. Proximity Measure with Ordinal Attribute
• An ordinal attribute is a special kind of categorical attribute in which the values follow a sequence (ordering), e.g., Grade = {Ex, A, B, C} where Ex > A > B > C.
• Suppose A is an ordinal attribute with the set of values $\{a_1, a_2, \ldots, a_n\}$, ordered in ascending order $a_1 < a_2 < \ldots < a_n$, and let the i-th value $a_i$ be ranked $i$, for $i = 1, 2, \ldots, n$.
• The normalized value of $a_i$ can then be expressed as

$$a_i' = \frac{i - 1}{n - 1}$$

• Thus, the normalized values lie in the range [0, 1].
• Since $a_i'$ is a numerical value, the proximity can then be calculated using any measurement method for numerical attributes.
• For example, the proximity between two objects $x$ and $y$ with attribute values $a_i$ and $a_j$ can be expressed as the squared difference

$$d(x, y) = (a_i' - a_j')^2$$

where $a_i'$ and $a_j'$ are the normalized values of $a_i$ and $a_j$, respectively.
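A sketch of the rank normalization (illustrative only):

```python
def normalize_ordinal(value, ordered_values):
    """Map the value ranked i among n ordered values to (i - 1) / (n - 1)."""
    i = ordered_values.index(value) + 1   # 1-based rank
    return (i - 1) / (len(ordered_values) - 1)

size = ['S', 'M', 'L']                    # S < M < L
quality = ['C', 'B', 'A', 'Ex']           # C < B < A < Ex
print(normalize_ordinal('M', size))                # 0.5
print(round(normalize_ordinal('A', quality), 2))   # 0.67
```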
28. Proximity Measure with Ordinal Attribute
Example 12.5:
Consider the following set of records, where each record is defined by two ordinal attributes, Size = {S, M, L} and Quality = {Ex, A, B, C}, such that S < M < L and Ex > A > B > C.

Object   Size      Quality
A        S (0.0)   A (0.66)
B        L (1.0)   Ex (1.0)
C        L (1.0)   C (0.0)
D        M (0.5)   B (0.33)

• Normalized values are shown in brackets.
• The pairwise proximity values can be arranged in a 4 × 4 matrix over the objects A, B, C, D, computed from the normalized values.
Exercise: find the dissimilarity matrix when each object is defined by only one ordinal attribute, say Size (or Quality).
29. Proximity Measure with Interval Scale
• A measure called distance is usually used to estimate the proximity between two objects defined with interval-scaled attributes.
• We first present a generic formula for the distance d between two objects $x$ and $y$ in n-dimensional space. Let $x_i$ and $y_i$ denote the values of the i-th attribute of objects $x$ and $y$, respectively. Then

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{r} \right)^{1/r}$$

• Here, $r$ is a parameter ($r \geq 1$ for a proper metric).
• This distance is most popularly known as the Minkowski metric.
• The measure follows some well-known properties, mentioned on the next slide.
30. Proximity Measure with Interval Scale
Properties of the Minkowski metric:
1. Non-negativity:
a. $d(x, y) \geq 0$ for all $x$ and $y$.
b. $d(x, y) = 0$ only if $x = y$. This is also called the identity condition.
2. Symmetry:
$d(x, y) = d(y, x)$ for all $x$ and $y$.
This condition ensures that the order in which objects are considered is not important.
3. Triangle inequality:
$d(x, z) \leq d(x, y) + d(y, z)$ for all $x$, $y$ and $z$.
• The interpretation is that the direct distance $d(x, z)$ between objects $x$ and $z$ never exceeds the sum of the distances between $x$ and $y$, and between $y$ and $z$.
31. Proximity Measure with Interval Scale
Depending on the value of $r$, the distance measure is named accordingly.

1. Manhattan distance (L1 norm: r = 1)
The Manhattan distance is expressed as

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

where $|\cdot|$ denotes the absolute value. This metric is also alternatively termed the taxicab metric or city-block metric.

Example: for x = [7, 3, 5] and y = [3, 2, 6], the Manhattan distance is |7 − 3| + |3 − 2| + |5 − 6| = 6.

• As a special instance, the Manhattan distance between objects whose attribute values are binary (∈ {0, 1}) is called the Hamming distance.
• Alternatively, the Hamming distance is the number of bits that differ between two binary vectors.
32. Proximity Measure with Interval Scale
2. Euclidean distance (L2 norm: r = 2)
This is the usual Euclidean distance between two points $x$ and $y$ in $\mathbb{R}^n$:

$$d(x, y) = \sqrt{ \sum_{i=1}^{n} (x_i - y_i)^2 }$$

Example: for x = [7, 3, 5] and y = [3, 2, 6], the Euclidean distance is

$$d(x, y) = \sqrt{(7-3)^2 + (3-2)^2 + (5-6)^2} = \sqrt{18} \approx 4.243$$
33. Proximity Measure with Interval Scale
3. Chebychev distance ($L_\infty$ norm: $r \to \infty$)
This metric is defined as

$$d(x, y) = \max_{i} |x_i - y_i|$$

• Note the difference between the Chebychev metric and the Manhattan distance: instead of summing the absolute differences (Manhattan), we simply take the maximum of the absolute differences (Chebychev). Hence $L_\infty \leq L_1$.

Example: for x = [7, 3, 5] and y = [3, 2, 6],
the Manhattan distance = |7 − 3| + |3 − 2| + |5 − 6| = 6;
the Chebychev distance = max{|7 − 3|, |3 − 2|, |5 − 6|} = 4.
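A single sketch covering the whole family (illustrative only): r = 1 gives Manhattan, r = 2 gives Euclidean, and r = infinity gives Chebychev.

```python
def minkowski(x, y, r):
    """Minkowski distance between two numeric vectors."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if r == float('inf'):
        return max(diffs)                  # Chebychev (L-infinity)
    return sum(d ** r for d in diffs) ** (1 / r)

x, y = [7, 3, 5], [3, 2, 6]
print(minkowski(x, y, 1))                  # 6 (Manhattan)
print(round(minkowski(x, y, 2), 3))        # 4.243 (Euclidean)
print(minkowski(x, y, float('inf')))       # 4 (Chebychev)
```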
34. Proximity Measure with Interval Scale
4. Other metrics:

a. Canberra metric:

$$d(x, y) = \sum_{i=1}^{n} \left( \frac{|x_i - y_i|}{x_i + y_i} \right)^{q}$$

• where q is a real number, usually q = 1. Because the numerator of each ratio is always ≤ its denominator, each ratio is ≤ 1; hence the sum is always bounded and small.
• If q ≠ 1, it is called the fractional Canberra metric.
• If q > 1, the opposite relationship holds.

b. Hellinger metric:

$$d(x, y) = \sqrt{ \sum_{i=1}^{n} \left( \sqrt{x_i} - \sqrt{y_i} \right)^2 }$$

This metric is used either squared or transformed via the correlation coefficient $r(x, y) \in [-1, +1]$ using the following transformations:
(i) $d(x, y) = \sqrt{2\,(1 - r(x, y))}$
(ii) $d(x, y) = 1 - r(x, y)$
where $r(x, y)$ is the correlation coefficient between $x$ and $y$.

Note: not every dissimilarity measurement is a distance measurement.
35. Proximity Measure for Ratio-Scale
The proximity between objects with ratio-scaled variables can be computed with the following steps:
1. Apply an appropriate transformation to bring the data onto a linear scale (e.g., a logarithmic transformation for data of the form $X = A e^{Bt}$).
2. The transformed values can then be treated as interval-scaled values, and any distance measure discussed for interval-scaled variables can be applied.

Note: there are two further concerns with proximity measures:
• normalization of the measured values, and
• intra-transformation from similarity to dissimilarity measures, and vice versa.
36. Proximity Measure for Ratio-Scale
Normalization:
• A major problem when using similarity (or dissimilarity) measures such as Euclidean distance is that large attribute values frequently swamp small ones.
• For example, consider records with three cost attributes where the values of Cost 1 are much larger than those of Cost 2 and Cost 3. The contribution of Cost 2 and Cost 3 is then insignificant compared to Cost 1 as far as Euclidean distance is concerned.
• This problem can be avoided if we consider the normalized values of all numerical attributes.
• Another normalization is to map the measured values into a normalized range, say [0, 1]. Note that if a measure $s$ varies in the range $[0, \infty)$, it can be normalized as

$$s' = \frac{1}{1 + s}$$
37. Proximity Measure for Ratio-Scale
Intra-transformation:
• Transforming similarities to dissimilarities, and vice versa, is relatively straightforward.
• If the similarity (or dissimilarity) falls in the interval [0, 1], the dissimilarity (or similarity) can be obtained as

$$d = 1 - s \qquad \text{or} \qquad s = 1 - d$$

• Another approach is to define the similarity as the negative of the dissimilarity (or vice versa).
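A tiny sketch of both operations (illustrative only):

```python
def normalize(s):
    """Map a measure s in [0, inf) into (0, 1] via s' = 1 / (1 + s)."""
    return 1 / (1 + s)

def flip(v):
    """Convert a similarity in [0, 1] to a dissimilarity, or vice versa."""
    return 1 - v

print(normalize(3))   # 0.25
print(flip(0.25))     # 0.75
```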
38. Proximity Measure with Mixed Attributes
• The previous similarity measures assume that all attributes are of the same type. A general approach is therefore needed when the attributes are of different types.
• One straightforward approach is to compute the similarity for each attribute separately and then combine these attribute similarities using a method that yields an overall similarity between 0 and 1.
• Typically, the overall similarity is defined as the average of all the individual attribute similarities.
• See the algorithm on the next slide for doing this.
39. Similarity Measure with Mixed Attributes
Suppose the objects are defined by attributes $A_1, A_2, \ldots, A_n$.
1. For the k-th attribute (k = 1, 2, ..., n), compute the similarity $s_k(x, y)$ in the range [0, 1].
2. Compute the overall similarity between the two objects as

$$\text{similarity}(x, y) = \frac{\sum_{k=1}^{n} s_k(x, y)}{n}$$

3. This formula can be modified by weighting the contribution of each attribute. If $w_k$ is the weight of the k-th attribute, with $\sum_{k=1}^{n} w_k = 1$, then

$$\text{w\_similarity}(x, y) = \sum_{k=1}^{n} w_k\, s_k(x, y)$$

4. The definition of the Minkowski distance can likewise be modified:

$$d(x, y) = \left( \sum_{k=1}^{n} w_k\, |x_k - y_k|^{r} \right)^{1/r}$$

Each symbol has its usual meaning.
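A sketch of steps 1-3 (illustrative only), assuming the per-attribute similarities have already been computed on a [0, 1] scale:

```python
def overall_similarity(sims, weights=None):
    """Average the per-attribute similarities, or take a weighted sum
    when weights summing to 1 are supplied."""
    if weights is None:
        return sum(sims) / len(sims)
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * s for w, s in zip(weights, sims))

sims = [1.0, 0.0, 0.5]                       # s_k(x, y) for three attributes
print(overall_similarity(sims))              # 0.5
print(overall_similarity(sims, [0.5, 0.25, 0.25]))  # 0.625
```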
40. Similarity Measure with Mixed Attributes
Example 12.6:
Consider the following set of objects and obtain the similarity matrix.
[For C: X > A > B > C]

Object   A (Binary)   B (Categorical)   C (Ordinal)   D (Numeric)   E (Numeric)
1        Y            R                 X             475           10^8
2        N            R                 A             10            10^-2
3        N            B                 C             1000          10^5
4        Y            G                 B             500           10^3
5        Y            B                 A             80            1

How can cosine similarity be applied to this?
41. Non-Metric Similarity
• In many applications (such as information retrieval) objects are complex and contain a large number of symbolic entities (such as keywords, phrases, etc.).
• To measure the distance between such complex objects, it is often desirable to introduce a non-metric similarity function.
• Here we discuss a few such non-metric similarity measurements.

Cosine similarity
Suppose x and y denote two vectors representing two complex objects. The cosine similarity, denoted cos(x, y), is defined as

$$\cos(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|}$$

• where $x \cdot y$ denotes the vector dot product, $x \cdot y = \sum_{i=1}^{n} x_i y_i$, with $x = [x_1, x_2, \ldots, x_n]$ and $y = [y_1, y_2, \ldots, y_n]$.
• $\|x\|$ and $\|y\|$ denote the Euclidean norms of vectors x and y (essentially the lengths of the vectors), that is,

$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} \qquad \text{and} \qquad \|y\| = \sqrt{y_1^2 + y_2^2 + \cdots + y_n^2}$$
42. Cosine Similarity
• Cosine similarity is essentially a measure of the (cosine of the) angle between x and y.
• Thus, if the cosine similarity is 1, the angle between x and y is 0°, and x and y are the same except for magnitude.
• On the other hand, if the cosine similarity is 0, the angle between x and y is 90°, and they do not share any terms.
• Accordingly, the cosine similarity can be written equivalently as

$$\cos(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|} = \frac{x}{\|x\|} \cdot \frac{y}{\|y\|} = \hat{x} \cdot \hat{y}$$

where $\hat{x} = x / \|x\|$ and $\hat{y} = y / \|y\|$. This means that cosine similarity does not take the magnitudes of the two vectors into account when computing similarity.
• It is thus, in a sense, a normalized measurement.
43. Non-Metric Similarity
Example 12.7: Cosine similarity
Suppose we are given two documents, each described by counts of 10 words, shown as vectors x and y below:
x = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0] and y = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
Thus, $x \cdot y$ = 3·1 + 2·0 + 0·0 + 5·0 + 0·0 + 0·0 + 0·0 + 2·1 + 0·0 + 0·2 = 5
$\|x\| = \sqrt{3^2 + 2^2 + 0 + 5^2 + 0 + 0 + 0 + 2^2 + 0 + 0} = \sqrt{42} \approx 6.48$
$\|y\| = \sqrt{1^2 + 0 + 0 + 0 + 0 + 0 + 0 + 1^2 + 0 + 2^2} = \sqrt{6} \approx 2.45$
∴ $\cos(x, y) = 5 / (6.48 \times 2.45) \approx 0.31$

Extended Jaccard coefficient
The extended Jaccard coefficient, denoted $EJ$, is defined as

$$EJ = \frac{x \cdot y}{\|x\|^2 + \|y\|^2 - x \cdot y}$$

• This is also alternatively termed the Tanimoto coefficient and can be used to measure, for example, document similarity.
Exercise: compute the extended Jaccard coefficient ($EJ$) for Example 12.7 above.
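A sketch of both measures, reproducing the numbers of Example 12.7 (illustrative only):

```python
import math

def cosine(x, y):
    """cos(x, y) = x.y / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) *
                  math.sqrt(sum(b * b for b in y)))

def extended_jaccard(x, y):
    """Tanimoto: x.y / (||x||^2 + ||y||^2 - x.y)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sum(a * a for a in x) + sum(b * b for b in y) - dot)

x = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
y = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(round(cosine(x, y), 2))            # 0.31
print(round(extended_jaccard(x, y), 2))  # 5 / (42 + 6 - 5) = 0.12
```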
44. Pearson’s Correlation
• The correlation between two objects x and y gives a measure of the linear relationship between the attributes of the objects.
• More precisely, Pearson's correlation coefficient between two objects x and y is defined as

$$P(x, y) = \frac{S_{xy}}{S_x \cdot S_y}$$

where

$$S_{xy} = \text{covariance}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

$$S_x = \text{standard deviation}(x) = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad S_y = \text{standard deviation}(y) = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2}$$

$$\bar{x} = \text{mean}(x) = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \text{mean}(y) = \frac{1}{n} \sum_{i=1}^{n} y_i$$

and n is the number of attributes in x and y.
45. Pearson's Correlation
Note 1: correlation always lies in the range −1 to 1. A correlation of 1 (−1) means that x and y have a perfect positive (negative) linear relationship, that is, $x_i = a \cdot y_i + b$ for some constants a and b.

Example 12.8: Pearson's correlation
Calculate Pearson's correlation of the two vectors x and y given below:
x = [3, 6, 0, 3, 6]
y = [1, 2, 0, 1, 2]
(Here x = 3y, a perfect positive linear relationship, so P(x, y) = 1.)
Note: vector components can be negative values as well.

Note 2: if the correlation is 0, there is no linear relationship between the attributes of the objects.

Example 12.9: Non-linear correlation
Verify that there is no linear relationship among the attributes of the objects x and y given below:
x = [−3, −2, −1, 0, 1, 2, 3]
y = [9, 4, 1, 0, 1, 4, 9]
P(x, y) = 0, yet $y_i = x_i^2$ for all attributes here.
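A sketch that reproduces Examples 12.8 and 12.9 (illustrative only; it uses the sample, n − 1, conventions of the formulas above):

```python
import statistics

def pearson(x, y):
    """P(x, y) = covariance(x, y) / (stdev(x) * stdev(y))."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

print(pearson([3, 6, 0, 3, 6], [1, 2, 0, 1, 2]))                 # 1.0 (up to float rounding)
print(pearson([-3, -2, -1, 0, 1, 2, 3], [9, 4, 1, 0, 1, 4, 9]))  # 0.0
```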
46. Mahalanobis Distance
• A related issue with distance measurement is how to handle attributes that do not have the same range of values.
• For example, consider a record with two attributes, Age and Income; the two attributes have very different scales, so Euclidean distance is not a suitable measure in such a situation.
• Relatedly, how do we compute distance when there is correlation between some of the attributes, perhaps in addition to the difference in the ranges of values?
• A generalization of Euclidean distance, the Mahalanobis distance, is useful when attributes are (partially) correlated and/or have different ranges of values.
• The Mahalanobis distance between two objects (vectors) x and y is defined as

$$M(x, y) = (x - y)\, \Sigma^{-1} (x - y)^{T}$$

where $\Sigma^{-1}$ is the inverse of the covariance matrix.
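A sketch using NumPy (illustrative only; the Age/Income records are hypothetical, and the covariance matrix is estimated from the data):

```python
import numpy as np

def mahalanobis(x, y, data):
    """M(x, y) = (x - y) Sigma^-1 (x - y)^T, with Sigma estimated from
    `data` (rows = objects, columns = attributes); some texts take the
    square root of this quantity."""
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = np.asarray(x, float) - np.asarray(y, float)
    return float(diff @ cov_inv @ diff)

# hypothetical Age/Income records on very different scales
data = np.array([[25, 15000], [35, 25000], [45, 60000], [60, 70000]])
print(mahalanobis([25, 15000], [35, 25000], data))
```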
47. Set Difference and Time Difference
Set difference
• Another non-metric dissimilarity measurement is the set difference.
• Given two sets A and B, A − B is the set of elements of A that are not in B. Thus, if A = {1, 2, 3, 4} and B = {2, 3, 4}, then A − B = {1} and B − A = ∅.
• We can define the dissimilarity d between two sets A and B as

$$d(A, B) = |A - B|$$

where $|A|$ denotes the size of set A.
Note: this measure does not satisfy the identity part of non-negativity, nor symmetry, nor the triangle inequality.
• A modified definition, however, does satisfy them:

$$d(A, B) = |A - B| + |B - A|$$

Time difference
• This defines the distance between two times of the day as

$$d(t_1, t_2) = \begin{cases} t_2 - t_1 & \text{if } t_1 \leq t_2 \\ 24 + (t_2 - t_1) & \text{if } t_1 > t_2 \end{cases}$$

• Example: d(1 pm, 2 pm) = 1 hour; d(2 pm, 1 pm) = 23 hours.
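A sketch of both dissimilarities (illustrative only; times are represented as hours on a 24-hour clock):

```python
def set_distance(a, b):
    """d(A, B) = |A - B|; note this is not symmetric."""
    return len(a - b)

def symmetric_set_distance(a, b):
    """|A - B| + |B - A|: the modified, symmetric definition."""
    return len(a - b) + len(b - a)

def time_distance(t1, t2):
    """Distance between two times of day, in hours."""
    return t2 - t1 if t1 <= t2 else 24 + (t2 - t1)

A, B = {1, 2, 3, 4}, {2, 3, 4}
print(set_distance(A, B), set_distance(B, A))        # 1 0
print(symmetric_set_distance(A, B))                  # 1
print(time_distance(13, 14), time_distance(14, 13))  # 1 23
```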