pykda (pronounced "pie-k-d-a") is a Python package for the Kemeny Decomposition Algorithm (KDA), which decomposes a Markov chain into clusters of states, where states within a cluster are relatively more connected to each other than to states outside the cluster. This is useful for analyzing influence graphs, such as social networks and internet networks. KDA was developed in Berkhout and Heidergott (2019) and uses the Kemeny constant as a connectivity measure.
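As background, the Kemeny constant can be computed from the non-unit eigenvalues of the transition matrix via the standard spectral formula K = Σ_{i≥2} 1/(1 − λ_i) (note that conventions in the literature differ by an additive constant of one). A minimal numpy sketch, independent of pykda:

```python
import numpy as np

def kemeny_constant(P):
    """Kemeny constant via the spectral formula K = sum_{i>=2} 1/(1 - lambda_i)."""
    eigenvalues = np.linalg.eigvals(np.asarray(P, dtype=float))
    # drop the single unit (Perron) eigenvalue, keep the rest
    rest = np.delete(eigenvalues, np.argmin(np.abs(eigenvalues - 1)))
    return float(np.sum(1.0 / (1.0 - rest)).real)

# two-state chain with identical rows mixes in one step, so K = 1
P = [[0.7, 0.3],
     [0.7, 0.3]]
print(kemeny_constant(P))  # close to 1.0
```

A smaller Kemeny constant indicates a better-connected chain, which is what makes it usable as a connectivity measure.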
The pykda package depends on `numpy`, `tarjan`, and `pyvis`.
Use the package manager pip to install pykda:

```shell
pip install pykda
```
The first step is to load a Markov chain as a `MarkovChain` object using a transition matrix `P`:
```python
from pykda.Markov_chain import MarkovChain

P = [[0, 0.3, 0.7, 0, 0],
     [0.7, 0.2, 0.09, 0, 0.01],
     [0.5, 0.25, 0.25, 0, 0],
     [0, 0, 0, 0.5, 0.5],
     [0.01, 0, 0, 0.74, 0.25]]  # artificial transition matrix

MC = MarkovChain(P)
```
We can study some properties of the Markov chain, such as the stationary distribution:
```python
print(MC.stationary_distribution.flatten().round(3))
```
This gives `[0.226 0.156 0.23 0.232 0.156]`. We can also plot the Markov chain:
```python
MC.plot(file_name="An artificial Markov chain")
```
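The stationary distribution can also be cross-checked with plain numpy (this does not use pykda): it is the left eigenvector of `P` for eigenvalue 1, rescaled to sum to one:

```python
import numpy as np

P = np.array([[0, 0.3, 0.7, 0, 0],
              [0.7, 0.2, 0.09, 0, 0.01],
              [0.5, 0.25, 0.25, 0, 0],
              [0, 0, 0, 0.5, 0.5],
              [0.01, 0, 0, 0.74, 0.25]])

# left eigenvector of P for the eigenvalue closest to 1
eigenvalues, eigenvectors = np.linalg.eig(P.T)
pi = np.real(eigenvectors[:, np.argmin(np.abs(eigenvalues - 1))])
pi = pi / pi.sum()  # rescale so the probabilities sum to one

print(pi.round(3))
assert np.allclose(pi @ P, pi)  # pi is indeed stationary
```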
Now, let us decompose the Markov chain into clusters using KDA. We start by initializing a `KDA` object with the Markov chain and the KDA settings (such as the number of clusters). For more details about the setting choices, see the KDA documentation or Berkhout and Heidergott (2019).
Here, we apply the default settings, which cut all edges with a negative Kemeny constant derivative and normalize the transition matrix afterward.
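The cut-and-normalize step itself can be sketched with plain numpy (a hypothetical illustration, not pykda's internals): cutting an edge sets its transition probability to zero, after which each row is rescaled so the matrix is stochastic again:

```python
import numpy as np

def cut_and_normalize(P, edges):
    """Set the given (i, j) transition probabilities to zero and
    renormalize each row so that it sums to one again."""
    P = np.asarray(P, dtype=float).copy()
    for i, j in edges:
        P[i, j] = 0.0
    return P / P.sum(axis=1, keepdims=True)

P = [[0.5, 0.25, 0.25],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]
Q = cut_and_normalize(P, [(0, 1)])
print(Q[0])  # row 0 renormalized: [2/3, 0, 1/3]
```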
```python
from pykda.KDA import KDA  # import KDA analogously to MarkovChain

kda = KDA(
    original_MC=MC, CO_A="CO_A_1(1)", CO_B="CO_B_3(0)", symmetric_cut=False
)
```
Now, let us run the KDA algorithm and visualize the results.
```python
kda.run()
kda.plot(file_name="An artificial Markov chain after KDA_A1_1_B3_0")
```
We can study the resulting Markov chain in more detail via the current Markov chain attribute `MC` of the `KDA` object:
```python
print(kda.MC)
```
This gives the following output:
```
MC with 5 states.
Ergodic classes: [[2, 0], [3]].
Transient classes: [[1], [4]].
```
So KDA led to a Markov multi-chain with two ergodic classes and two transient classes.
We can also study the edges that KDA cut via the `log` attribute of the `KDA` object:
```python
print(kda.log['edges cut'])
```
This gives the following output:
```
[[None], [(4, 0), (1, 4), (2, 1), (0, 1), (3, 4)]]
```
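As a cross-check with plain numpy (not pykda's internals), we can apply these cuts to the original matrix ourselves, compute which states communicate (mutually reach each other), and call a class ergodic if no probability leaves it; the ordering of states within a class may differ from pykda's output:

```python
import numpy as np

P = np.array([[0, 0.3, 0.7, 0, 0],
              [0.7, 0.2, 0.09, 0, 0.01],
              [0.5, 0.25, 0.25, 0, 0],
              [0, 0, 0, 0.5, 0.5],
              [0.01, 0, 0, 0.74, 0.25]])

# apply the cuts reported by KDA and renormalize the rows
for i, j in [(4, 0), (1, 4), (2, 1), (0, 1), (3, 4)]:
    P[i, j] = 0.0
P = P / P.sum(axis=1, keepdims=True)

# boolean reachability via a Warshall-style transitive closure
n = len(P)
reach = (P > 0) | np.eye(n, dtype=bool)
for k in range(n):
    for i in range(n):
        if reach[i, k]:
            reach[i] |= reach[k]

# states communicate iff they reach each other; group them into classes
classes = []
for i in range(n):
    cls = tuple(j for j in range(n) if reach[i, j] and reach[j, i])
    if cls not in classes:
        classes.append(cls)

# a class is ergodic (closed) if no transition leaves it, else transient
ergodic, transient = [], []
for c in classes:
    outside = [j for j in range(n) if j not in c]
    (transient if P[list(c)][:, outside].any() else ergodic).append(c)

print("ergodic:", ergodic, "transient:", transient)
# -> ergodic: [(0, 2), (3,)] transient: [(1,), (4,)]
```

This reproduces the two ergodic and two transient classes reported above.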
We can also study the Markov chains that KDA found in each (outer) iteration via `kda.log['Markov chains']`.
As another KDA application example, let us apply KDA until we find two ergodic classes explicitly. We will also ensure that the Kemeny constant derivatives are recalculated after each cut (and normalize the cut transition matrix to ensure it is a stochastic matrix again). To that end, we use:
```python
kda2 = KDA(
    original_MC=MC, CO_A="CO_A_2(2)", CO_B="CO_B_1(1)", symmetric_cut=False
)
kda2.run()
kda2.plot(file_name="An artificial Markov chain after KDA_A2_2_B1_1")
```
which gives the following plot (edges (4, 0) and (1, 4) are cut in two iterations):
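Assuming those two cuts are the only changes, we can again verify with plain numpy that removing edges (4, 0) and (1, 4) leaves exactly two closed (ergodic) communicating classes and no transient states:

```python
import numpy as np

P = np.array([[0, 0.3, 0.7, 0, 0],
              [0.7, 0.2, 0.09, 0, 0.01],
              [0.5, 0.25, 0.25, 0, 0],
              [0, 0, 0, 0.5, 0.5],
              [0.01, 0, 0, 0.74, 0.25]])
P[4, 0] = P[1, 4] = 0.0          # the two edges cut by KDA
P = P / P.sum(axis=1, keepdims=True)

n = len(P)
reach = (P > 0) | np.eye(n, dtype=bool)
for k in range(n):               # Warshall-style transitive closure
    for i in range(n):
        if reach[i, k]:
            reach[i] |= reach[k]

# communicating classes, then keep only the closed (ergodic) ones
classes = {tuple(j for j in range(n) if reach[i, j] and reach[j, i]) for i in range(n)}
closed = [c for c in classes
          if all(not P[i, j] for i in c for j in range(n) if j not in c)]
print(sorted(closed))  # -> [(0, 1, 2), (3, 4)]
```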
To learn more about pykda, have a look at the documentation. There, you can also find links to interactive Google Colab notebooks with examples. If you have any questions, feel free to open an issue on GitHub.
If you use pykda in your research, please consider citing the following paper:
Joost Berkhout, Bernd F. Heidergott (2019). Analysis of Markov influence graphs. Operations Research, 67(3):892-904. https://doi.org/10.1287/opre.2018.1813
Or use the following BibTeX entry:
```bibtex
@article{Berkhout_Heidergott_2019,
  title   = {Analysis of {Markov} influence graphs},
  volume  = {67},
  number  = {3},
  journal = {Operations Research},
  author  = {Berkhout, J. and Heidergott, B. F.},
  year    = {2019},
  pages   = {892--904},
}
```