Probabilistic Models
Winter 2021
Instructors: Indranil Saha & Jyo Deshmukh
USC Viterbi
School of Engineering
Department of Computer Science
Probabilistic Models
The models for components that we have studied so far were either deterministic or
nondeterministic.
The goal of such models is to represent computation or the time-evolution of a
physical phenomenon.
These models do not do a great job of capturing uncertainty.
We can usually model uncertainty using probabilities, so probabilistic
models allow us to account for the likelihood of environment behaviors.
Machine learning/AI algorithms also require probabilistic modeling!
Stochastic Process
A collection of finitely or infinitely many random variables, indexed by time
Discrete time: $X(k)$, $k \in \mathbb{N} \cup \{0\}$
Continuous time: $X(t)$, $t \in \mathbb{R}$
Joint distribution of a (discrete-time) stochastic process:
$F_X(d_1, \ldots, d_n; t_1, \ldots, t_n) = P(X(t_1) < d_1, \ldots, X(t_n) < d_n)$
Markov chains
A Markov process is a special case of a stochastic process, satisfying the Markov property:
$P(X(t_{n+1}) = d_{n+1} \mid X(t_0) = d_0, X(t_1) = d_1, \ldots, X(t_n) = d_n) = P(X(t_{n+1}) = d_{n+1} \mid X(t_n) = d_n)$
Discrete-time Markov chain (DTMC)
A DTMC is a time-homogeneous Markov process:
Each step in the process takes the same amount of time
Time-steps are discrete
The state-space $Q$ (the set of values taken by the time-indexed random variables) is usually discrete
Formal definition: DTMC as a transition system
A DTMC is a tuple $(Q, P, I, AP, L)$, where $Q$ is a finite set of states; $P: Q \times Q \to [0,1]$ is the transition probability function, with $\sum_{q' \in Q} P(q, q') = 1$ for every $q$; $I: Q \to [0,1]$ is the initial distribution, with $\sum_{q \in Q} I(q) = 1$; $AP$ is a set of Boolean propositions; and $L: Q \to 2^{AP}$ is the labeling function.
Markov chain example: Driver modeling
[Diagram: a four-state Markov chain over driver modes Accelerate, Constant Speed, Brake, and Idling, with edges labeled by transition probabilities. States are labeled with the propositions $p$: Feeling sleepy and $q$: Checking cellphone.]
Markov chain: Transition probability matrix
Transition probability matrix for the driver model (rows: source state; columns: target state; A = Accelerate, C = Constant Speed, B = Brake, I = Idling):

        A     C     B     I
  A    0.3   0.2   0.4   0
  C    0.1   0.4   0.5   0
  B    0.4   0.05  0.05  0.5
  I    0.8   0     0     0.2
Probability of moving 𝑛 steps
Transition probability matrix $M$, where $M(q, q') = P(q, q')$
$P_n(q, q')$: probability of going from state $q$ to $q'$ in $n$ steps
Chapman-Kolmogorov equation:
$P_{m+n}(q, q') = \sum_{q''} P_m(q, q'')\, P_n(q'', q')$
Corollary: $P_k(q, q') = M^k(q, q')$
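The corollary above can be checked directly: repeated matrix multiplication gives the $n$-step probabilities. A minimal sketch in pure Python, using an illustrative two-state chain (not the driver model):

```python
# Sketch: n-step transition probabilities via Chapman-Kolmogorov.
# The matrix values below are illustrative, not the driver model from the slides.

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_step(M, n):
    """P_n = M^n: probability of going from q to q' in exactly n steps."""
    P = [[1.0 if i == j else 0.0 for j in range(len(M))] for i in range(len(M))]
    for _ in range(n):
        P = mat_mul(P, M)
    return P

# Two-state example chain
M = [[0.9, 0.1],
     [0.5, 0.5]]
P2 = n_step(M, 2)
# Chapman-Kolmogorov with m = n = 1: P_2(0,1) = sum_k P_1(0,k) * P_1(k,1)
assert abs(P2[0][1] - (0.9 * 0.1 + 0.1 * 0.5)) < 1e-12
```

Each row of $M^n$ still sums to 1, since it is again a probability distribution over successor states.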
Geometric distribution for discrete random variables
A discrete random variable $X$ is geometrically distributed with parameter $p \in (0, 1]$ if $P(X = n) = (1 - p)^{n-1}\, p$ for $n = 1, 2, \ldots$ (the number of Bernoulli trials up to and including the first success); then $E[X] = \frac{1}{p}$ and $Var(X) = \frac{1 - p}{p^2}$.
Residence times
Residence time $\tau$ in a state $s$ of a Markov chain is an r.v. with geometric distribution:
$P(\tau = 1) = 1 - P(s, s)$
$P(\tau = 2) = P(s, s)\,(1 - P(s, s))$
$\ldots$
$P(\tau = n) = P(s, s)^{n-1}\,(1 - P(s, s))$
What is the expected time you stay in a state? What is the variance?
Hint: Residence time is a r.v. with geometric distribution
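Following the hint, the geometric distribution gives $E[\tau] = \frac{1}{1 - P(s,s)}$ and $Var(\tau) = \frac{P(s,s)}{(1 - P(s,s))^2}$. A small sketch checking both closed forms against truncated sums, with an assumed illustrative self-loop probability:

```python
# Sketch: residence time in a state with self-loop probability p_stay is
# geometric. Check E[tau] = 1/(1 - p_stay) by truncated summation.
p_stay = 0.6  # illustrative self-loop probability P(s, s)

def p_tau(n, p):
    # P(tau = n) = p^(n-1) * (1 - p)
    return p ** (n - 1) * (1 - p)

# Truncated expectation: sum over n >= 1 of n * P(tau = n)
expected = sum(n * p_tau(n, p_stay) for n in range(1, 2000))
closed_form = 1 / (1 - p_stay)          # = 2.5 for p_stay = 0.6
assert abs(expected - closed_form) < 1e-9

# Variance: E[tau^2] - E[tau]^2 = p / (1 - p)^2
second = sum(n * n * p_tau(n, p_stay) for n in range(1, 2000))
assert abs(second - expected ** 2 - p_stay / (1 - p_stay) ** 2) < 1e-6
```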
Types of states in Markov chains
First passage probability
First passage: reaching state $q_j$ from state $q_i$ for the first time
$f_{ij}(n) = P(X_1 \neq q_j, X_2 \neq q_j, \ldots, X_{n-1} \neq q_j, X_n = q_j \mid X_0 = q_i)$
Probability of ever reaching $q_j$ from $q_i$: $\sum_{n=1}^{\infty} f_{ij}(n)$
The values $f_{ij}(n)$ form a (possibly defective) probability distribution, but we cannot say anything about the infinite sum in general
A state $q_i$ is persistent if $\sum_{n=1}^{\infty} f_{ii}(n) = 1$
A state $q_i$ is transient if $\sum_{n=1}^{\infty} f_{ii}(n) < 1$
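The first-passage probabilities satisfy a simple recursion, since a first visit at step $n$ must pass through a non-$q_j$ state at step $n-1$: $f_{ij}(1) = P(i,j)$ and $f_{ij}(n) = \sum_{k \neq j} P(i,k)\, f_{kj}(n-1)$. A sketch on an illustrative two-state chain:

```python
# Sketch: first-passage probabilities f_ij(n) by dynamic programming, using
# f_ij(1) = P(i, j) and f_ij(n) = sum over k != j of P(i, k) * f_kj(n - 1).

def first_passage(M, i, j, n_max):
    """Return [f_ij(1), ..., f_ij(n_max)] for the chain with matrix M."""
    n_states = len(M)
    # f[k] holds f_kj at the current step; start with f_kj(1) = P(k, j)
    f = [M[k][j] for k in range(n_states)]
    out = [f[i]]
    for _ in range(2, n_max + 1):
        f = [sum(M[k][m] * f[m] for m in range(n_states) if m != j)
             for k in range(n_states)]
        out.append(f[i])
    return out

M = [[0.9, 0.1],
     [0.5, 0.5]]
probs = first_passage(M, 0, 1, 200)
# Here f_01(n) = 0.9^(n-1) * 0.1, so state 1 is reached eventually:
assert abs(sum(probs) - 1.0) < 1e-6
```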
Stationary distributions
Probability distribution that remains unchanged in the Markov chain as time
progresses
Given as a row vector 𝜋 whose entries sum to 1
Satisfies 𝜋 = 𝜋𝑀
Gives information about the “limiting” behavior of the Markov chain or
stability of the random process
Stationary distributions
How to find a stationary distribution?
$\pi = \pi M$
Transposing: $M^T \pi^T = 1 \cdot \pi^T$ [same form as $Av = \lambda v$]
So $M^T$ has eigenvectors $v$ corresponding to eigenvalue 1
Such a $v$ is a column vector containing a stationary distribution (after normalizing its entries to sum to 1)
Stationary distribution: left eigenvector of $M$ with eigenvalue 1
Related to the limiting distribution of the Markov chain: over very long time horizons, how does the MC behave?
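One way to approximate the eigenvector above without a linear-algebra library is power iteration: start from any distribution and repeatedly apply $M$; for a well-behaved chain this converges to $\pi$. A sketch on an illustrative two-state chain:

```python
# Sketch: approximating a stationary distribution pi = pi * M by power
# iteration (repeatedly multiplying a row distribution by M).

def stationary(M, iters=10_000):
    n = len(M)
    pi = [1.0 / n] * n               # start from the uniform distribution
    for _ in range(iters):
        pi = [sum(pi[k] * M[k][j] for k in range(n)) for j in range(n)]
    return pi

M = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(M)
# Fixed point: pi M = pi, and the entries still sum to 1
assert all(abs(sum(pi[k] * M[k][j] for k in range(2)) - pi[j]) < 1e-9
           for j in range(2))
assert abs(sum(pi) - 1.0) < 1e-9
```

For this chain the exact answer is $\pi = (5/6, 1/6)$, which the iteration recovers.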
Robot trying to reach moving target example
[Diagram: a 3×7 grid with the robot R and the target G in different cells.]
What is the expected time before the robot reaches the target?
What's the probability that the robot reaches the target within the next 2 steps?
What's the probability that the robot hits a wall before getting to the target?
Rules of the game:
Each timestep the target and the robot move randomly to an adjacent cell or stay in the same cell (with some probability, possibly different for each cell)
When the robot and target occupy the same cell, the robot declares victory
Robot trying to reach moving target example
[Diagram: the same 3×7 grid with robot R and target G.]
If the robot knows the cell in which the target is (fully observable), then this is simply a Markov chain
Each state is a pair $(i, j)$, where $i$ is the cell occupied by R, and $j$ is the cell occupied by G
Movement of the robot and target is independent, so $P((i,j), (i',j')) = P_R(i, i')\, P_G(j, j')$
Compute the new transition probability matrix
For any initial configuration, you can find answers by using the Chapman-Kolmogorov equations, e.g.
$P_2((R{=}(1,1), G{=}(3,3)), (R{=}(2,2), G{=}(2,2))) = M^2(((1,1),(3,3)), ((2,2),(2,2)))$
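The product construction above can be sketched directly: pair up the states and multiply the two independent transition probabilities. The tiny 2-cell matrices below are illustrative stand-ins, not the real 3×7 grid:

```python
# Sketch: building the product chain for two independently moving agents.
# P((i,j),(i',j')) = P_R(i,i') * P_G(j,j').

def product_chain(PR, PG):
    states = [(i, j) for i in range(len(PR)) for j in range(len(PG))]
    P = {(s, t): PR[s[0]][t[0]] * PG[s[1]][t[1]]
         for s in states for t in states}
    return states, P

PR = [[0.7, 0.3], [0.4, 0.6]]    # robot's own transition matrix (assumed)
PG = [[0.5, 0.5], [0.2, 0.8]]    # target's transition matrix (assumed)
states, P = product_chain(PR, PG)
# Each row of the product chain is still a probability distribution:
for s in states:
    assert abs(sum(P[(s, t)] for t in states) - 1.0) < 1e-12
```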
What if the robot cannot see all of the state?
Robot with noisy proximity sensors:
[Diagram: two copies of the grid, the true state and the observed noisy state; W marks walls, R the robot, and G candidate target positions.]
The target state is hidden: if it is not proximal, the robot does not know where the target is, and if it is proximal, the robot only has noisy estimates
We can assume the robot knows how the target moves (left, right, up, down) and the uncertainty (as captured by the transition probability matrix): this is like the process model in a Kalman filter (KF)
The robot's sensors are noisy: this is like the measurement model in the KF
Question: given a series of (noisy) observations, can the robot estimate where the target is?
We can model this problem using Hidden Markov Models (HMMs); the algorithms are very similar to the Kalman filter
Hidden Markov Models
The full system state is never observable, so an HMM only makes observations (outputs) available
An HMM is a tuple $(Q, P, Z, O, I, AP, L)$:
$Q, P, I, AP, L$ are as before (as in a DTMC)
$Z$: set of observations
$O$: conditional probability of observing $z \in Z$ when in state $q \in Q$
Interesting Problems for HMMs
[Decoding] Given a sequence of observations, can you estimate the hidden state sequence? [Solved with the Viterbi algorithm]
[Likelihood] Given an HMM and an observation sequence, what is the likelihood of that observation sequence? [Dynamic-programming-based forward algorithm]
[Learning] Given an observation sequence (or sequences), learn the HMM that maximizes the likelihood of that sequence [Baum-Welch, a.k.a. forward-backward, algorithm]
Viterbi Algorithm
Goal: find the path $X = x_1, x_2, \ldots, x_n$, a sequence of (hidden) states, most likely to have generated the observations $Y = y_1, y_2, \ldots, y_n$
Two 2-dimensional tables of size $|Q| \times n$: $T_1$, $T_2$
$T_1[i, j]$: probability of the most likely path that reaches state $q_i$ at step $j$
$T_2[i, j]$: the state at step $j-1$ on that most likely path, for all $j$ with $2 \le j \le n$
Table entries are filled recursively (dynamic programming approach):
$T_1[i, j] = \max_k \; T_1[k, j-1] \cdot M(k, i) \cdot O(i, y_j)$
$T_2[i, j] = \arg\max_k \; T_1[k, j-1] \cdot M(k, i) \cdot O(i, y_j)$
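The two tables and the backtracking step can be sketched directly in code. The transition matrix, emission probabilities, and initial distribution below are illustrative, not from the slides:

```python
# Sketch of the Viterbi tables described above. M is the transition matrix,
# O[i][z] the probability of observing z in state q_i, and init the initial
# distribution; all numbers are illustrative.

def viterbi(obs, init, M, O):
    n_states = len(M)
    T1 = [[0.0] * len(obs) for _ in range(n_states)]
    T2 = [[0] * len(obs) for _ in range(n_states)]
    for i in range(n_states):                       # step 1: T1[i][0]
        T1[i][0] = init[i] * O[i][obs[0]]
    for j in range(1, len(obs)):                    # fill tables left to right
        for i in range(n_states):
            best_k = max(range(n_states), key=lambda k: T1[k][j - 1] * M[k][i])
            T1[i][j] = T1[best_k][j - 1] * M[best_k][i] * O[i][obs[j]]
            T2[i][j] = best_k
    # Backtrack from the most likely final state
    path = [max(range(n_states), key=lambda i: T1[i][-1])]
    for j in range(len(obs) - 1, 0, -1):
        path.append(T2[path[-1]][j])
    return list(reversed(path))

# Two hidden states, two observation symbols
init = [0.6, 0.4]
M = [[0.7, 0.3], [0.4, 0.6]]
O = [[0.9, 0.1], [0.2, 0.8]]   # state 0 mostly emits symbol 0, state 1 symbol 1
assert viterbi([0, 0, 1], init, M, O) == [0, 0, 1]
```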
Probabilistic CTL
LTL
Can be interpreted over individual executions
Can be interpreted over a state machine: do all paths satisfy property
CTL
Is interpreted over a computation tree
PCTL
Is interpreted over a discrete-time Markov chain
Encodes uncertainties in computation due to environment etc.
Probabilistic CTL
Syntax of PCTL:
State formulas: $\varphi ::= p \mid \neg\varphi \mid \varphi \land \varphi \mid P_{\sim\lambda}(\psi)$, where $p$ is a proposition in $AP$, $\sim \in \{<, \le, >, \ge\}$, and $\lambda \in [0,1]$ bounds the probability of $\psi$ being true
Path formulas: $\psi ::= \mathbf{X}\varphi \mid \varphi\,\mathbf{U}^{\le k}\,\varphi \mid \varphi\,\mathbf{U}\,\varphi$, i.e., next, bounded until (up to $k$ steps), and until
(Recall $\mathbf{F}\varphi = true\,\mathbf{U}\,\varphi$, and $\mathbf{G}\varphi = \neg\mathbf{F}\neg\varphi$)
PCTL formulas are state formulas; path formulas are only used to define how to build a PCTL formula
Semantics
The semantics of path formulas is straightforward (similar to LTL/CTL)
Semantics of a state formula with the probabilistic operator:
$Prob(q, \mathbf{X}\varphi) = \sum_{q' \models \varphi} P(q, q')$
[Diagram: state $q_0$ with outgoing transition probabilities 0.4, 0.2, and 0.1; the successors reached with probabilities 0.1 and 0.2 satisfy $p$.]
Does $P_{\ge 0.5}(\mathbf{X}\,p)$ hold in state $q_0$?
No, because $Prob(q_0, \mathbf{X}\,p) = 0.1 + 0.2 = 0.3$
Semantics of a state formula with bounded until, $Prob(q, \alpha\,\mathbf{U}^{\le k}\,\beta)$:
$= 1$ if $q \models \beta$; otherwise
$= 0$ if $q \not\models \alpha$ or $k = 0$; otherwise
$= \sum_{q'} P(q, q') \cdot Prob(q', \alpha\,\mathbf{U}^{\le k-1}\,\beta)$
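The bounded-until recursion above translates directly into code. A sketch on an illustrative three-state chain, with states as indices and $\alpha$, $\beta$ given as sets of satisfying states:

```python
# Sketch: computing Prob(q, alpha U<=k beta) by the recursion above.
# alpha and beta are the sets of states satisfying each subformula.

def prob_bounded_until(M, alpha, beta, q, k):
    if q in beta:
        return 1.0                       # beta already holds
    if q not in alpha or k == 0:
        return 0.0                       # alpha violated, or out of steps
    return sum(M[q][q2] * prob_bounded_until(M, alpha, beta, q2, k - 1)
               for q2 in range(len(M)))

# Three-state chain: state 2 satisfies beta, states 0 and 1 satisfy alpha
M = [[0.5, 0.3, 0.2],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
p = prob_bounded_until(M, alpha={0, 1}, beta={2}, q=0, k=1)
assert abs(p - 0.2) < 1e-12    # only the direct one-step move to state 2 counts
```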
PCTL example ($p$: Feeling sleepy, $r$: Checking cellphone)
[Diagram: the driver-model Markov chain from before, with states Accelerate (A), Constant Speed, Brake, and Idling, and labels over $p$ and $r$.]
Does the formula $P_{\ge 0.5}(\mathbf{X}\,p)$ hold in state Brake? Yes
What value of $\epsilon$ will make the formula $P_{\ge\epsilon}(\mathbf{F}^{\le 1}\,r)$ true in state A (Accelerate)?
Zero steps: 0
One step: $P(A, B) + P(A, C) = 0.5 + 0.2 = 0.7$
So $\epsilon = 0.7$
I.e., with probability $\ge 0.7$, the driver checks the cellphone within 1 step of accelerating
Quantitative in PCTL vs. Qualitative in CTL
Toss a coin repeatedly until "tails" is thrown
[Diagram: a Markov chain with states $q_0$, $q_1$ (labeled $heads$), and $q_2$ (labeled $tails$); from $q_0$, probability 0.5 to $q_1$ and 0.5 to $q_2$; $q_1$ returns to $q_0$ with probability 1; $q_2$ loops on itself with probability 1.]
Is "tails" eventually thrown along all paths?
CTL: $AF\,tails$. Result: false
Why? The path $q_0 q_1 q_0 q_1 \ldots$ never reaches $tails$
Is the probability of eventually throwing "tails" equal to 1?
PCTL: $P_{\ge 1}(\mathbf{F}\,tails)$. Result: true
The probability of the path $q_0 q_1 q_0 q_1 \ldots$ is zero!
Continuous Time Markov Chains
Exponential distribution
A continuous random variable has a probability density function $f(x) \ge 0$ for all $x$, with
$P(X \le d) = \int_{-\infty}^{d} f(x)\,dx$
A negative exponentially distributed random variable $X$ with rate $\lambda > 0$ has probability density function (pdf) $f_X(x)$ defined as follows:
$f_X(x) = \lambda e^{-\lambda x}$ if $x > 0$, and $f_X(x) = 0$ if $x \le 0$
$E[X] = \frac{1}{\lambda}$
$Var(X) = \frac{1}{\lambda^2}$
Exponential distribution properties
The cumulative distribution function (CDF) of $X$ is then:
$F_X(d) = P(X \le d) = \int_{-\infty}^{d} f_X(x)\,dx = \int_{0}^{d} \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_0^d = 1 - e^{-\lambda d}$
I.e., zero probability of taking a transition out of a state in duration $d = 0$, but the probability becomes 1 as $d \to \infty$
Fun exercise: show that the above CDF is memoryless, i.e. $P(X > t + d \mid X > t) = P(X > d)$
Fun exercise 2: if $X$ and $Y$ are r.v.s negative exponentially distributed with rates $\lambda$ and $\mu$, then $P(X \le Y) = \frac{\lambda}{\lambda + \mu}$
Fun exercise 3: if $X_1, \ldots, X_n$ are negative exponentially distributed with rates $\lambda_1, \ldots, \lambda_n$, then $P(X_i = \min(X_1, \ldots, X_n)) = \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}$
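Fun exercise 2 can be sanity-checked by Monte Carlo simulation; the rates below are illustrative:

```python
# Sketch: checking fun exercise 2, P(X <= Y) = lambda / (lambda + mu), by
# Monte Carlo simulation with exponential samples (rates are illustrative).
import random

random.seed(0)
lam, mu = 2.0, 3.0
n = 200_000
hits = sum(random.expovariate(lam) <= random.expovariate(mu) for _ in range(n))
estimate = hits / n
analytic = lam / (lam + mu)              # = 0.4
assert abs(estimate - analytic) < 0.01   # within Monte Carlo error
```

With 200,000 samples the standard error is about 0.001, so the 0.01 tolerance is comfortable.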
CTMC example
[Diagram: a three-state CTMC over lanes $lane_{i-1}$, $lane_i$, $lane_{i+1}$, with edges labeled by transition probabilities and states annotated with exit rates.]
A CTMC is a tuple $(Q, P, I, r, AP, L)$:
$Q$ is a finite set of states
$P: Q \times Q \to [0,1]$ is a transition probability function
$I: Q \to [0,1]$ is the initial distribution, with $\sum_{q \in Q} I(q) = 1$
$AP$ is a set of Boolean propositions, and $L: Q \to 2^{AP}$ is a function that assigns some subset of Boolean propositions to each state
$r: Q \to \mathbb{R}_{>0}$ is the exit-rate function
Residence time in state $q$ is negative exponentially distributed with rate $r(q)$
Average residence time in state $q$ is $\frac{1}{r(q)}$
The bigger the exit rate, the shorter the average residence time
CTMC transition probability
Transition rate: $R(q, q') = r(q) \cdot P(q, q')$, where $r(q)$ is the rate of exiting $q$ and $P(q, q')$ is the probability of going to $q'$
CTMC example (continued)
[Diagram: the three-lane CTMC from before.]
The probability to go from state $lane_{i+1}$ to $lane_{i-1}$ (via $lane_i$) is:
$P(X_{i+1,i} \le X_{i+1,i+1} \,\cap\, X_{i,i-1} \le \min(X_{i,i}, X_{i,i+1}))$
$= \frac{R(i+1,\,i)}{R(i+1,\,i+1) + R(i+1,\,i)} \cdot \frac{R(i,\,i-1)}{R(i,\,i+1) + R(i,\,i) + R(i,\,i-1)}$
What is the probability of changing to some lane from $lane_i$ in $[0, t]$ seconds?
$\int_0^t r(lane_i)\, e^{-r(lane_i)\, x}\, dx = 1 - e^{-r(lane_i)\, t}$
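The lane-change probability above is just the exponential CDF evaluated at $t$. A small sketch, where the exit rate $r(lane_i) = 5$ is an assumed illustrative value:

```python
# Sketch: probability of leaving lane_i within t seconds, 1 - e^(-r t).
# The exit rate r = 5 is an assumed illustrative value.
import math

def p_leave_within(r, t):
    """CDF of the exponential residence time: P(leave within [0, t])."""
    return 1.0 - math.exp(-r * t)

r_lane_i = 5.0
assert abs(p_leave_within(r_lane_i, 0.5) - (1 - math.exp(-2.5))) < 1e-12
assert p_leave_within(r_lane_i, 0.0) == 0.0      # can't leave in zero time
```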
CTMC + PCTL
[Diagram: a CTMC with states $s_0$ (with $r(s_0) = 2$), $s_1$ ($\neg p$), $s_2$ ($p$), and $s_3$ ($\neg p$); from $s_0$, transition probabilities 0.5 to $s_1$, 0.3 to $s_2$, and 0.2 to $s_3$.]
Does $P_{\ge 0.5}(\mathbf{X}\,p)$ hold in $s_0$?
Recall that $R(s, s') = P(s, s') \cdot r(s)$
Also recall that $P(Y < \min_i X_i) = \frac{r_Y}{r_Y + \sum_i r_{X_i}}$
The probability to go from $s_0$ to $s_2$ is:
$P(X_{s_0,s_2} \le \min(X_{s_0,s_1}, X_{s_0,s_3})) = \frac{R(s_0, s_2)}{r(s_0)}$
What about $P(\mathbf{F}^{[0,0.5]}\,p)$?
$P(s, s')$ is enabled in $[0, t]$ with probability $1 - e^{-R(s,s')\,t}$, so
$Prob(\mathbf{F}^{[0,0.5]}\,p) = \frac{R(s_0, s_2)}{r(s_0)} \cdot \left(1 - e^{-R(s_0,s_2) \times 0.5}\right)$
Bibliography
Baier, Christel, Joost-Pieter Katoen, and Kim Guldstrand Larsen. Principles of model checking. MIT press, 2008.
Continuous Time Markov Chains: https://resources.mpi-inf.mpg.de/departments/rg1/conferences/vtsa11/slides/katoen/lec01_handout.pdf