TDA in NLP

Author Name
1 Definition and Intuition

Topology is a branch of mathematics that studies the properties of spaces that
are preserved under continuous deformations, such as stretching and bending,
but not tearing or gluing. The central idea is to focus on the qualitative
aspects of geometric structures, rather than their specific measurements like
distance or angle.
At its core, topology deals with open sets, which form the building blocks
of a topological space. A topological space is defined as a set X together
with a collection of subsets T , called a topology, such that:
• The empty set ∅ and the entire set X are in T .
• The union of any collection of sets in T is also in T .
• The intersection of any finite collection of sets in T is also in T .
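On a finite set the three axioms can be checked mechanically. The following is a minimal sketch, assuming subsets are represented as Python sets; the function name `is_topology` is hypothetical and not part of the original text.

```python
from itertools import combinations

def is_topology(X, T):
    """Check the three axioms for a collection T of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    # Axiom 1: the empty set and X itself belong to T.
    if frozenset() not in T or X not in T:
        return False
    # Every member of T must be a subset of X.
    if any(not U <= X for U in T):
        return False
    # Axioms 2 and 3: for a finite collection, closure under pairwise unions
    # and pairwise intersections implies closure under all unions and all
    # finite intersections.
    for U, V in combinations(T, 2):
        if (U | V) not in T or (U & V) not in T:
            return False
    return True

X = {1, 2, 3}
print(is_topology(X, [set(), {1}, {1, 2}, X]))  # True
print(is_topology(X, [set(), {1}, {2}, X]))     # False: {1} | {2} is missing
```

The pairwise check is only sufficient because the collection is finite; infinite collections need the full "any union" condition.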
Intuitively, topology generalizes the notion of continuity from calculus to
more abstract settings. For example, while two shapes might look very
different geometrically, topology considers them equivalent if one can be
continuously transformed into the other. This leads to a popular phrase in
topology:
“a coffee cup is topologically equivalent to a doughnut,” because both have
one hole and can be deformed into each other without cutting or attaching
new parts.
1.1 Intuition Behind Topology

Consider the difference between the Euclidean plane, a sphere, and a torus.
While they have different shapes in the conventional sense, from a topological
viewpoint, the focus is on properties that remain invariant under continuous
transformations. These include concepts like connectedness and compactness,
which we will explore later.
—
2 Historical Background
The development of topology can be traced back to the 18th century, with
the work of mathematicians like Leonhard Euler. Euler’s solution to the
Königsberg bridge problem in 1736 is often regarded as one of the first results
in topology, although it was not recognized as such at the time. The problem
asked whether it was possible to walk through the city of Königsberg, crossing
each of its seven bridges exactly once. Euler proved that it was impossible
and, in doing so, laid the foundation for graph theory and the notion of
connectivity, a key concept in topology.
Topology, as a formal mathematical discipline, began to emerge in the
late 19th and early 20th centuries, primarily through the work of Henri
Poincaré, who is often credited as the founder of algebraic topology. Poincaré
introduced concepts like homotopy and homology, which relate topological
spaces to algebraic structures, allowing mathematicians to study spaces us-
ing algebraic tools.
In the early 20th century, mathematicians such as Felix Hausdorff,
Maurice Fréchet, and L.E.J. Brouwer further formalized the field. Haus-
dorff’s introduction of Hausdorff spaces and Fréchet’s work on metric spaces
played a significant role in the development of general topology, also known
as point-set topology.
—
3 Real-World Examples of Topological Concepts
Topology has far-reaching applications beyond abstract mathematics. Here
are a few real-world examples where topological ideas come into play:
3.1 Knot Theory
Knot theory, a subfield of topology, studies mathematical knots, which are
loops in 3D space that do not intersect themselves. While this might seem
esoteric, knot theory has applications in understanding the structure of DNA
molecules, where the long strands of DNA often form knots.
3.2 Sensor Networks

In engineering and computer science, topology is used to analyze the connec-
tivity of sensor networks. For instance, ensuring that sensors are sufficiently
connected to cover a certain area can be understood using topological meth-
ods. The Vietoris-Rips complex and Čech complex are tools from topological
data analysis (TDA) that help model and analyze such networks.
3.3 Data Science: Topological Data Analysis (TDA)

In recent years, topology has found applications in data science through
Topological Data Analysis (TDA). TDA provides tools to study the shape
of high-dimensional data, identifying clusters, holes, and other patterns. One
such tool is persistent homology, which allows for the analysis of data across
multiple scales.
3.4 Cosmology
Topology is also used in cosmology to study the shape and structure of the
universe. Cosmologists use topological methods to explore whether the uni-
verse is finite or infinite, and whether it has a complex structure, such as
being doughnut-shaped or connected in higher dimensions.
—
4 Conclusion
Topology provides a powerful framework for understanding the intrinsic prop-
erties of spaces that remain unchanged under continuous deformations. While
it originated from purely mathematical questions, its applications now span
fields ranging from biology and chemistry to engineering and data science. In
the next sections, we will dive deeper into the fundamental concepts of topo-
logical spaces, open sets, and continuous maps, setting the stage for more
advanced topics like algebraic topology and topological data analysis.
5 Sets, Open and Closed Sets

In topology, we frequently work with sets and certain special types of subsets.
Understanding open and closed sets is fundamental to the study of topological
spaces.
5.1 Sets
A set is simply a collection of distinct objects, often called elements. For
example, the set X = {1, 2, 3, 4} contains the elements 1, 2, 3, and 4. The
collection of all subsets of X is called the power set of X, denoted by $2^X$.
5.2 Open Sets

In topology, a subset U of a space X is called open if it satisfies certain
conditions depending on the topology defined on X. The notion of openness
generalizes the idea of open intervals in R.
For example, in the standard topology on R, a set U ⊂ R is open if, for
every point x ∈ U , there exists an ϵ > 0 such that the interval (x − ϵ, x + ϵ)
is entirely contained in U .
5.3 Closed Sets

A subset C ⊆ X is called closed if its complement, X \ C, is open in X.
Therefore, closed sets are the complements of open sets. In R, closed intervals
like [a, b] are examples of closed sets.
An important result is that finite intersections of open sets are open, and
finite unions of closed sets are closed. Additionally, every topological space
has both the empty set ∅ and the entire space X as open and closed sets,
known as trivial open/closed sets.
6 Basis for a Topology
The concept of a basis is a convenient way to describe a topology. A collec-
tion B of subsets of a space X is called a basis for a topology if:
• Every element of B is an open set.
• For every x ∈ X and every open set U containing x, there exists a basis
element B ∈ B such that x ∈ B ⊆ U .
Once we have a basis, we can define the topology on X as the collection
of all unions of sets from B. For example, the collection of open intervals in
R forms a basis for the standard topology on R.
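For a finite basis, "all unions of sets from B" can be enumerated directly. This is a small sketch under that finiteness assumption; `topology_from_basis` is a hypothetical helper name.

```python
from itertools import combinations

def topology_from_basis(basis):
    """Return every union of basis elements, including the empty union."""
    basis = [frozenset(B) for B in basis]
    opens = {frozenset()}  # the empty union gives the empty set
    for r in range(1, len(basis) + 1):
        for family in combinations(basis, r):
            opens.add(frozenset().union(*family))
    return opens

# A basis on X = {1, 2, 3}: it covers X and is closed under intersection.
basis = [{1}, {2}, {1, 2, 3}]
T = topology_from_basis(basis)
print(sorted(sorted(U) for U in T))
# [[], [1], [1, 2], [1, 2, 3], [2]]
```

The open set {1, 2} is not itself a basis element; it appears only as the union {1} ∪ {2}.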
7 Interior, Closure, and Boundary

7.1 Interior
The interior of a subset A of a topological space X, denoted by int(A), is
the largest open set contained in A. In other words, it is the union of all open
sets that are contained within A. If A is already an open set, its interior is
simply A.
For example, if A = [0, 1] ⊂ R, then int(A) = (0, 1).
7.2 Closure
The closure of a subset A of a topological space X, denoted by $\overline{A}$,
is the smallest closed set containing A. In other words, it is the
intersection of all closed sets that contain A.
Alternatively, $\overline{A}$ consists of all points in A together with any
limit points of A. For example, if A = (0, 1) in R, then $\overline{A} = [0, 1]$.
7.3 Boundary
The boundary of a subset A of a topological space X, denoted by ∂A, is
the set of points that are neither in the interior nor in the exterior of A.
Formally, the boundary is the closure of A with its interior removed:
$\partial A = \overline{A} \setminus \operatorname{int}(A)$.
For example, if A = (0, 1) in R, then ∂A = {0, 1}.
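In a finite topological space the three operators can be computed straight from the list of open sets. The sketch below assumes open sets are given as frozensets and uses the standard identity that the closure of A is the complement of the interior of the complement of A; the function names are illustrative.

```python
def interior(A, opens):
    """Union of all open sets contained in A."""
    A = frozenset(A)
    result = frozenset()
    for U in opens:
        if U <= A:
            result |= U
    return result

def closure(A, X, opens):
    """Complement of the interior of the complement of A."""
    X = frozenset(X)
    return X - interior(X - frozenset(A), opens)

def boundary(A, X, opens):
    """Closure of A with the interior of A removed."""
    return closure(A, X, opens) - interior(A, opens)

X = frozenset({1, 2, 3})
opens = [frozenset(), frozenset({1}), frozenset({1, 2}), X]
A = {2}
print(sorted(interior(A, opens)))     # []
print(sorted(closure(A, X, opens)))   # [2, 3]
print(sorted(boundary(A, X, opens)))  # [2, 3]
```

Here the closed sets are X, {2, 3}, {3} and ∅, so the smallest closed set containing {2} is {2, 3}, matching the computed closure.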
8 Continuous Functions and Homeomorphisms
8.1 Continuous Functions
A function f : X → Y between two topological spaces is said to be con-
tinuous if the preimage of every open set in Y is an open set in X. This
definition generalizes the familiar notion of continuity from calculus, where
the idea is to avoid “jumps” or “breaks” in the function.
Formally, f is continuous if for every open set V ⊆ Y , the set $f^{-1}(V)$ is
open in X.
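For maps between finite spaces, the preimage condition can be tested exhaustively. The following sketch assumes a function is stored as a dictionary from points of X to points of Y; `preimage` and `is_continuous` are hypothetical names.

```python
def preimage(f, V, domain):
    """Points of the domain that f sends into V."""
    return frozenset(x for x in domain if f[x] in V)

def is_continuous(f, domain, X_opens, Y_opens):
    """True if the preimage of every open set of Y is open in X."""
    X_opens = {frozenset(U) for U in X_opens}
    return all(preimage(f, V, domain) in X_opens for V in Y_opens)

X = {1, 2, 3}
X_opens = [set(), {1}, {1, 2}, X]
Y_opens = [set(), {"a"}, {"a", "b"}]

f = {1: "a", 2: "a", 3: "b"}   # preimage of {"a"} is {1, 2}, which is open
g = {1: "b", 2: "a", 3: "a"}   # preimage of {"a"} is {2, 3}, which is not open
print(is_continuous(f, X, X_opens, Y_opens))  # True
print(is_continuous(g, X, X_opens, Y_opens))  # False
```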
8.2 Homeomorphisms
A homeomorphism is a special type of continuous function that defines
an equivalence between two topological spaces. Specifically, a function f :
X → Y is a homeomorphism if it is a continuous bijection with a continuous
inverse. If such a function exists, we say that X and Y are homeomorphic,
meaning that they have the same topological structure.
For example, the unit circle $S^1$ and the boundary of the square
[0, 1] × [0, 1] are homeomorphic, since they can be continuously deformed
into one another without tearing or gluing. The filled square itself is not
homeomorphic to the circle.
Homeomorphisms are the isomorphisms of topology, meaning they are
the functions that preserve all topological properties.
9 Conclusion
In this chapter, we have explored some of the basic concepts in topology,
including open and closed sets, bases for topologies, and important opera-
tions such as interior, closure, and boundary. We also discussed continu-
ous functions and homeomorphisms, which are central to understanding how
topological spaces relate to one another. These foundational ideas will be
essential as we move forward in studying more advanced topics in topology
and its applications.
10 Discrete and Indiscrete Topologies
10.1 Discrete Topology
The discrete topology on a set X is the topology in which every subset of
X is open. In other words, the discrete topology is defined by the power set
of X, i.e., every possible subset is an open set:
$T_{\text{discrete}} = 2^X$.
This is the finest topology that can be given to a set, as it separates
every point from every other point. Consequently, every function from a
discrete space to any topological space is continuous: the preimage of any
open set is a subset of the discrete space, and every such subset is open.
10.2 Indiscrete Topology

The indiscrete topology, also called the trivial topology, on a set X is
the coarsest topology that can be defined on X. In this topology, the only
open sets are the empty set ∅ and the whole space X itself:
$T_{\text{indiscrete}} = \{\emptyset, X\}$.
The indiscrete topology makes every function from any topological space Y
into X trivially continuous, because the only open sets of X are ∅ and X,
and their preimages are ∅ and Y , which are always open in Y .
Both the discrete and indiscrete topologies represent extreme cases of
topologies, one being the finest possible and the other the coarsest.
11 Metric Topology
A metric space is a set X equipped with a distance function, or metric,
d : X × X → R, which satisfies the following properties for all x, y, z ∈ X:
• d(x, y) ≥ 0 (non-negativity),
• d(x, y) = 0 if and only if x = y (identity of indiscernibles),
• d(x, y) = d(y, x) (symmetry),
• d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).
Given a metric space (X, d), the metric topology on X is defined by
taking the open sets as those that can be expressed as unions of open balls.
An open ball of radius ϵ > 0 centered at a point x ∈ X is the set:
B(x, ϵ) = {y ∈ X | d(x, y) < ϵ}.
The collection of all open balls forms a basis for the metric topology.
Intuitively, a set is open in the metric topology if, around each of its
points, one can find an open ball that is entirely contained in the set.
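The metric axioms and the open-ball construction can be illustrated on a finite sample of points. This sketch assumes the Euclidean metric on the plane and only verifies the axioms on the sample, which is evidence rather than a proof; the helper names are hypothetical.

```python
import math
from itertools import product

def euclidean(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.dist(p, q)

def satisfies_metric_axioms(points, d, tol=1e-12):
    """Check the four metric axioms on every triple from a finite sample."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < 0:                          # non-negativity
            return False
        if (d(x, y) == 0) != (x == y):           # identity of indiscernibles
            return False
        if abs(d(x, y) - d(y, x)) > tol:         # symmetry
            return False
        if d(x, z) > d(x, y) + d(y, z) + tol:    # triangle inequality
            return False
    return True

def open_ball(center, eps, points, d):
    """Sample points lying strictly within distance eps of the center."""
    return [y for y in points if d(center, y) < eps]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
print(satisfies_metric_axioms(points, euclidean))       # True
print(open_ball((0.0, 0.0), 1.5, points, euclidean))
# [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
```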
12 Product Topology
Let (X, TX ) and (Y, TY ) be two topological spaces. The product topol-
ogy on the Cartesian product X × Y is the coarsest topology for which the
projection maps:
πX : X × Y → X and πY : X × Y → Y
are continuous. That is, a set U ⊆ X × Y is open if and only if it can be

written as a union of sets of the form UX × UY , where UX is open in X and
UY is open in Y .
More generally, if $\{(X_i, T_i)\}_{i \in I}$ is a collection of topological
spaces indexed by I, the product topology on the product $\prod_{i \in I} X_i$
is generated by the sets
$\prod_{i \in I} U_i$,
where each $U_i$ is open in $X_i$ and $U_i = X_i$ for all but finitely many i.

For example, the Euclidean space Rn is the product of n copies of R, each
with the standard topology.
The product topology has the property that a function f : Z → X × Y is
continuous if and only if both projections πX ◦ f and πY ◦ f are continuous.
13 Quotient Topology
Let (X, T ) be a topological space, and let ∼ be an equivalence relation on
X. The quotient space X/ ∼ is the set of equivalence classes under ∼.
The quotient topology on X/ ∼ is defined as follows: a subset U ⊆ X/ ∼
is open if and only if its preimage under the quotient map q : X → X/ ∼ is
open in X.
That is, U is open in X/ ∼ if and only if $q^{-1}(U)$ is open in X. The
quotient map q : X → X/ ∼ is continuous by construction.
13.1 Example
A classic example of a quotient space is the circle $S^1$. Consider the unit
interval [0, 1] ⊂ R, and define an equivalence relation that identifies 0 ∼ 1.
The quotient space [0, 1]/ ∼ is homeomorphic to the circle $S^1$.
The quotient topology is useful in constructing spaces by “gluing” points
together. For instance, one can form the Möbius strip, the Klein bottle, or
projective spaces using quotient topologies.
14 Connectedness and Disconnected Spaces

Definition 14.1. A topological space X is said to be connected if it cannot
be represented as the union of two nonempty disjoint open sets. In other
words, if X = U ∪ V , where U and V are disjoint open subsets of X, then
either U or V must be empty.
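Theorem 14.2. If X is connected and f : X → Y is a continuous surjection,
then Y is connected. (An arbitrary non-empty subset of a connected space
need not be connected: for example, {0, 1} ⊂ R is not.)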
Definition 14.3. A space X is disconnected if it can be expressed as
X = U ∪ V where U and V are nonempty, disjoint open sets.
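For a finite space, connectedness amounts to checking that no proper non-empty open set has an open complement. A minimal sketch, assuming open sets are given explicitly (the name `is_connected` is hypothetical):

```python
def is_connected(X, opens):
    """True if X has no proper non-empty open set whose complement is also open."""
    X = frozenset(X)
    opens = {frozenset(U) for U in opens}
    for U in opens:
        # U and X - U would be two non-empty disjoint open sets covering X.
        if U and U != X and (X - U) in opens:
            return False
    return True

X = {1, 2, 3}
print(is_connected(X, [set(), {1}, {1, 2}, X]))   # True
print(is_connected(X, [set(), {1}, {2, 3}, X]))   # False: X = {1} ∪ {2, 3}
```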
15 Compact Spaces
Definition 15.1. A topological space X is called compact if every open
cover of X has a finite subcover. That is, for every collection of open sets
$\{U_i\}_{i \in I}$ such that $X \subseteq \bigcup_{i \in I} U_i$, there
exists a finite subset J ⊆ I such that $X \subseteq \bigcup_{j \in J} U_j$.
Theorem 15.2 (Heine-Borel Theorem). In Rn , a subset is compact if and

only if it is closed and bounded.
16 Tychonoff ’s Theorem
Theorem 16.1 (Tychonoff’s Theorem). The product of any collection of
compact topological spaces is compact in the product topology.
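Proof sketch. Let $\{X_i\}_{i \in I}$ be a collection of compact spaces and
consider the product space $\prod_{i \in I} X_i$ with the product topology.
By the Alexander subbase theorem, it suffices to show that every cover of the
product by subbasic open sets $\pi_i^{-1}(U)$, with U open in $X_i$, has a
finite subcover. Given such a cover, let $\mathcal{U}_i$ be the collection of
open sets U ⊆ $X_i$ for which $\pi_i^{-1}(U)$ belongs to the cover. If some
$\mathcal{U}_i$ covers $X_i$, compactness of $X_i$ yields finitely many sets
in $\mathcal{U}_i$ that cover $X_i$, and their preimages under $\pi_i$ form a
finite subcover of the product. Otherwise, for every i we can choose a point
$x_i \in X_i$ not covered by $\mathcal{U}_i$ (this uses the axiom of choice),
and the point $(x_i)_{i \in I}$ lies in no member of the cover, which is a
contradiction.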
17 Heine-Borel Theorem
The Heine-Borel Theorem provides a characterization of compactness in Eu-
clidean spaces.
Theorem 17.1 (Heine-Borel Theorem). In Rn , a subset A is compact if and
only if it is closed and bounded.
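Proof. (⇒) Suppose A is compact. The open balls B(0, k), k = 1, 2, . . .,
cover $R^n$ and hence A. A finite subcover exists, so A ⊆ B(0, K) for the
largest radius K that occurs, and A is bounded. To see that A is closed, let
x be any point of $R^n$ not in A. The open sets
$U_k = \{y \in R^n \mid d(x, y) > 1/k\}$, k = 1, 2, . . ., cover
$R^n \setminus \{x\}$ and hence A. A finite subcover gives A ⊆ $U_K$ for some
K, so the ball B(x, 1/K) does not meet A. Thus the complement of A is open,
and A is closed.
(⇐) If A is closed and bounded, then A is contained in a closed cube
$[-K, K]^n$, which is compact. A closed subset of a compact space is compact,
so A is compact. Alternatively, the Bolzano-Weierstrass theorem shows that
every sequence in A has a subsequence converging to a point of A, and in a
metric space this sequential compactness is equivalent to compactness.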
18 Separation Axioms
In topology, separation axioms are conditions that define how distinct points
and sets can be separated by open sets. These axioms are used to classify
topological spaces based on their separation properties.
18.1 T0 Spaces
Definition 18.1. A topological space X is called a T0 (Kolmogorov) space
if for any two distinct points x, y ∈ X, there exists an open set U that
contains exactly one of them: either x ∈ U and y ∉ U , or y ∈ U and x ∉ U .
This means that at least one of the points can be "separated" from the other
by an open set.
18.2 T1 Spaces
Definition 18.2. A topological space X is called a T1 (Fréchet) space if
for any two distinct points x, y ∈ X, there exist open sets Ux and Uy such
that x ∈ Ux and y ∉ Ux , and y ∈ Uy and x ∉ Uy . Equivalently, in a T1 space
every singleton {x} is closed.
18.3 T2 Spaces (Hausdorff Spaces)

Definition 18.3. A topological space X is called a T2 (Hausdorff ) space
if for any two distinct points x, y ∈ X, there exist disjoint open sets U and
V such that x ∈ U and y ∈ V . This means that points can be separated by
neighborhoods that do not overlap.
Theorem 18.4. Every T2 space is also a T1 space, and every T1 space is a

T0 space.
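The three axioms can be compared on small finite spaces. The sketch below assumes open sets are frozensets and uses the Sierpiński space (two points, with exactly one of them forming an open singleton) as a standard example that is T0 but not T1; the function names are illustrative.

```python
from itertools import permutations

def is_T0(X, opens):
    """Some open set contains exactly one of any two distinct points."""
    return all(any((x in U) != (y in U) for U in opens)
               for x, y in permutations(X, 2))

def is_T1(X, opens):
    """For each ordered pair (x, y), some open set contains x but not y."""
    return all(any(x in U and y not in U for U in opens)
               for x, y in permutations(X, 2))

def is_T2(X, opens):
    """Any two distinct points have disjoint open neighborhoods."""
    return all(any(x in U and y in V and not (U & V)
                   for U in opens for V in opens)
               for x, y in permutations(X, 2))

X = frozenset({0, 1})
sierpinski = [frozenset(), frozenset({1}), X]
discrete = [frozenset(), frozenset({0}), frozenset({1}), X]

print(is_T0(X, sierpinski), is_T1(X, sierpinski), is_T2(X, sierpinski))
# True False False
print(is_T0(X, discrete), is_T1(X, discrete), is_T2(X, discrete))
# True True True
```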
18.4 Regular Spaces

Definition 18.5. A topological space X is called a regular space if it is T1
and, for every point x ∈ X and a closed set C not containing x, there exist
disjoint open sets U and V such that x ∈ U and C ⊆ V .
Theorem 18.6. Every regular space is a T2 space.
18.5 Normal Spaces

Definition 18.7. A topological space X is called a normal space if it is T1
and, for any two disjoint closed sets A and B in X, there exist disjoint open
sets U and V such that A ⊆ U and B ⊆ V .
Theorem 18.8. Every normal space is regular, and every regular space is
normal if it is also second-countable (has a countable base).
19 Homotopy
Definition 19.1. Let X and Y be topological spaces and let f, g : X → Y
be continuous functions. We say that f is homotopic to g (denoted f ≃ g)
if there exists a continuous function H : X × [0, 1] → Y such that:
• H(x, 0) = f (x) for all x ∈ X
• H(x, 1) = g(x) for all x ∈ X
The function H is called a homotopy between f and g.
Theorem 19.2. Homotopy is an equivalence relation on the set of continuous
functions from X to Y .
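As a worked instance of Definition 19.1, assume that Y is a convex subset of $R^n$ (an assumption made only for this example). Then any two continuous maps f, g : X → Y are homotopic through the straight-line homotopy, which stays inside Y by convexity and is continuous because f and g are:

```latex
\[
\begin{aligned}
H(x, t) &= (1 - t)\, f(x) + t\, g(x), \qquad x \in X,\ t \in [0, 1], \\
H(x, 0) &= f(x), \qquad H(x, 1) = g(x).
\end{aligned}
\]
```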
20 Homology Theory
Homology is a mathematical tool used to associate a sequence of abelian
groups or modules with a topological space, providing a way to analyze its
shape and structure.
Definition 20.1. Let X be a topological space. The singular homology
groups Hn (X) are defined using singular simplices, which are continuous
maps from the standard n-simplex ∆n to X. The n-th homology group is
defined as:
$H_n(X) = \ker(\partial_n) / \operatorname{im}(\partial_{n+1})$
where ∂n is the boundary map that sends n-simplices to their boundaries.
Theorem 20.2. The homology groups Hn (X) are topological invariants,
meaning that homeomorphic spaces have isomorphic homology groups.
21 Fundamental Groups and Covering Spaces

The fundamental group is a key concept in algebraic topology that captures
information about the shape of a topological space.
Definition 21.1. The fundamental group π1 (X, x0 ) of a space X based
at a point x0 is the group of equivalence classes of loops based at x0 under
the operation of path concatenation. Two loops are equivalent if one can be
continuously deformed into the other within the space while keeping the base
point x0 fixed.
Theorem 21.2. If X is path-connected, then the fundamental group π1 (X, x0 )
is independent of the choice of the base point x0 up to isomorphism.
Definition 21.3. A covering space of a topological space X is a space C
together with a continuous surjective map p : C → X such that for every
point x ∈ X, there exists an open neighborhood U of x such that $p^{-1}(U)$
is a disjoint union of open sets in C, each of which is mapped
homeomorphically onto U by p.
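Theorem 21.4. If p : C → X is a covering map with C simply connected (so
that C is the universal cover of X), and X is path-connected and locally
path-connected, then the fundamental group π1 (X, x0 ) is isomorphic to the
group of deck transformations of the covering space.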
22 Simplicial Complexes
Definition 22.1. A simplicial complex is a set K of simplices such that:
• Every face of a simplex in K is also in K.
• The intersection of any two simplices in K is either empty or a face of

both simplices.
A k-simplex is the convex hull of k + 1 affinely independent points (ver-

tices) in some Euclidean space. The vertices of the simplex are denoted by
{v0 , v1 , . . . , vk }, and the k-simplex is denoted by σ = [v0 , v1 , . . . , vk ].
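Combinatorially, a simplex can be stored as its set of vertices. The sketch below works with such abstract simplicial complexes, an assumption of this example: there the intersection condition follows automatically from closure under faces, so only the first condition is checked. The helper names are hypothetical.

```python
from itertools import combinations

def faces(simplex):
    """All non-empty faces of a simplex given by its vertices."""
    s = tuple(sorted(simplex))
    return {frozenset(c) for r in range(1, len(s) + 1)
            for c in combinations(s, r)}

def is_simplicial_complex(K):
    """True if every face of every simplex in K is also in K."""
    K = {frozenset(s) for s in K}
    return all(faces(s) <= K for s in K)

def closure_of(maximal_simplices):
    """Smallest complex containing the given simplices."""
    K = set()
    for s in maximal_simplices:
        K |= faces(s)
    return K

filled_triangle = closure_of([("a", "b", "c")])
print(len(filled_triangle))                      # 7: 3 vertices, 3 edges, 1 triangle
print(is_simplicial_complex(filled_triangle))    # True
print(is_simplicial_complex([("a", "b")]))       # False: vertices a and b are missing
```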
22.1 Examples of Simplicial Complexes

• A single point is a 0-simplex.
• A line segment is a 1-simplex.
• A filled triangle is a 2-simplex.
22.2 Geometric Realization

The geometric realization of a simplicial complex K is a topological space
|K| constructed by gluing together simplices according to the face relations
defined in K.
23 Nerve of a Cover
Definition 23.1. Let U = {Ui }i∈I be an open cover of a topological space X.
The nerve of the cover U is a simplicial complex N (U) whose vertices corre-
spond to the open sets Ui in the cover, and a finite subset {Ui0 , Ui1 , . . . , Uik }
forms a k-simplex if and only if the intersection Ui0 ∩ Ui1 ∩ . . . ∩ Uik ̸= ∅.
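The nerve construction can be sketched for a finite cover by testing every sub-collection for a common point. In this example the cover sets are finite sets of sample points standing in for open sets, which is an assumption of the sketch, and `nerve` is a hypothetical helper.

```python
from itertools import combinations

def nerve(cover):
    """Nerve of a finite cover given as a dict: name -> set of points.

    A collection of cover sets spans a simplex exactly when the sets
    have a common point.
    """
    names = sorted(cover)
    simplices = set()
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            if set.intersection(*(set(cover[n]) for n in combo)):
                simplices.add(frozenset(combo))
    return simplices

# Three arcs covering a six-point "circle": they overlap pairwise,
# but no point lies in all three.
cover = {"U1": {0, 1, 2}, "U2": {2, 3, 4}, "U3": {4, 5, 0}}
N = nerve(cover)
print(len(N))                                  # 6: three vertices and three edges
print(frozenset({"U1", "U2", "U3"}) in N)      # False: no 2-simplex
```

The result is a hollow triangle, which has the same loop structure as the circle being covered.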
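Theorem 23.2 (Nerve Theorem). Let X be a paracompact space and U an open
cover of X such that every non-empty finite intersection of sets in U is
contractible. Then the nerve N (U) is homotopy equivalent to X.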
24 Čech Nerve and Dowker’s Theorem

Definition 24.1. The Čech nerve Ň (U) of an open cover U = {Ui }i∈I is
a simplicial complex similar to the nerve of a cover, but with vertices corre-
sponding to the open sets and k-simplices formed by k +1 indices i0 , i1 , . . . , ik
such that the intersection of the corresponding sets Ui0 ∩ Ui1 ∩ · · · ∩ Uik is
non-empty.
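Theorem 24.2 (Dowker’s Theorem). Let R ⊆ X × Y be a relation between two
sets X and Y . Let $K_X$ be the simplicial complex whose simplices are the
finite subsets of X whose elements are all related to a common element of Y ,
and define $K_Y$ symmetrically. Then $K_X$ and $K_Y$ have isomorphic homology
groups, and their geometric realizations are homotopy equivalent.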
Theorem 24.3. If X is a compact Hausdorff space and U is a finite open cover
of X in which every non-empty intersection of cover sets is contractible,
then the nerve of the cover N (U) is homotopy equivalent to X.
25 Euler Characteristic
Definition 25.1. The Euler characteristic χ(X) of a finite simplicial com-
plex X is defined as:
χ(X) = V − E + F
where V is the number of vertices, E is the number of edges, and F is the
number of faces in X. For higher-dimensional complexes, this generalizes to:
$\chi(X) = \sum_{k=0}^{n} (-1)^k b_k$
where bk is the k-th Betti number.
Theorem 25.2. For any convex polyhedron, the Euler characteristic is given
by χ = 2.
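The formula χ = V − E + F can be checked on the five Platonic solids, whose vertex, edge and face counts are standard. This is only a numerical illustration of Theorem 25.2, not a proof.

```python
def euler_characteristic(V, E, F):
    """Euler characteristic of a polyhedral surface from its cell counts."""
    return V - E + F

# (vertices, edges, faces) for the five Platonic solids
solids = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}

for name, (V, E, F) in solids.items():
    print(name, euler_characteristic(V, E, F))  # each line ends in 2
```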
Theorem 25.3 (Euler-Poincaré Formula). For a compact, orientable mani-
fold M of dimension n,
$\chi(M) = \sum_{i=0}^{n} (-1)^i b_i$
where bi are the Betti numbers of M .
26 Betti Numbers
Definition 26.1. The Betti numbers bk of a topological space X are de-
fined as the rank of the k-th homology group Hk (X):
bk = rank(Hk (X))
These numbers provide a measure of the k-dimensional holes in the space.

Theorem 26.2. The first Betti number b1 counts the number of independent
loops in a space, while b0 counts the number of connected components.
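For a small simplicial complex, Betti numbers can be computed from the ranks of the boundary matrices. The sketch below works over the rationals (an assumption of this example, so torsion is ignored) and uses a hollow triangle, which has one connected component and one loop; the `rank` helper is written out by hand to keep the block dependency-free.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix over the rationals via Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Hollow triangle: vertices a, b, c; edges ab, bc, ac; no 2-simplices.
# Columns are edges, rows are vertices; the boundary of edge [u, v] is v - u.
d1 = [
    [-1,  0, -1],   # a
    [ 1, -1,  0],   # b
    [ 0,  1,  1],   # c
]
n_vertices, n_edges = 3, 3
rank_d1 = rank(d1)
rank_d2 = 0  # there are no 2-simplices, so the map d2 is zero

b0 = n_vertices - rank_d1            # dim ker(d0) - rank(d1), with d0 = 0
b1 = (n_edges - rank_d1) - rank_d2   # dim ker(d1) - rank(d2)
print(b0, b1)  # 1 1
```

Filling in the triangle would add one column to d2 of rank 1, which reduces b1 to 0 and removes the loop.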
Theorem 26.3. For a compact, connected, orientable surface of genus g
without boundary, the Betti numbers are given by:
b0 = 1, b1 = 2g, b2 = 1
27 Poincaré Conjecture
Definition 27.1. The Poincaré Conjecture states that every simply
connected, closed 3-manifold is homeomorphic to the 3-sphere $S^3$. This
conjecture was one of the most famous in topology and was proven by Grigori
Perelman in preprints posted in 2002 and 2003.
Theorem 27.2. If M is a compact, simply connected 3-manifold without
boundary, then M is homeomorphic to $S^3$.
Theorem 27.3. The Poincaré Conjecture implies that any two simply
connected, closed 3-manifolds are homotopy equivalent if and only if they are
homeomorphic.
