
Probabilistic Programming
An Introduction to Applications in Machine Learning

Amirabbas Asadi
amir.asadi78@sharif.edu

Sharif University of Technology
Department of Mathematical Sciences
2024

Presentation Outline

1 Bayesian Learning
2 Probabilistic Programs
3 Approximate Bayesian Inference
    Markov Chain Monte Carlo
    Variational Inference
4 Differentiable Programming

Bayesian Learning

Can you guess the next number in the following sequence?

2, 4, 6, 8, 10, ?

What function could have generated this sequence?

f₁(n) = 2n
f₂(n) = 0.0167n^5 − 0.25n^4 + 1.4167n^3 − 3.75n^2 + 6.5667n − 2
f₃(n) = 0.05n^5 − 0.75n^4 + 4.25n^3 − 11.25n^2 + 15.7n − 6

Bayesian Learning

H₁ : f₁(n) = 2n
H₂ : f₂(n) = 0.0167n^5 − 0.25n^4 + 1.4167n^3 − 3.75n^2 + 6.5667n − 2
H₃ : f₃(n) = 0.05n^5 − 0.75n^4 + 4.25n^3 − 11.25n^2 + 15.7n − 6

All three functions reproduce the data exactly, so they all have the same likelihood:

p(D|H₁) = p(D|H₂) = p(D|H₃)

Then why do people choose the first one for the same data?

It seems that, as a prior belief, people prefer H₁ over the other options.

Bayesian Learning

But how can we quantify and take into account a prior belief?
We can formulate our prior belief p(H) as a distribution over all hypotheses.
Multiplying it by the likelihood gives the unnormalized posterior p(D|H)p(H).
To be a valid probability distribution, a normalization constant is needed:

p(H|D) = p(D|H)p(H) / p(D)

Bayes' Theorem

Bayesian Learning

Bayesian Learning provides a natural framework for updating our beliefs:

p(H) →D₁ p(H|{D₁}) →D₂ p(H|{D₁, D₂}) →D₃ ⋯

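For a concrete instance of this chain of updates, consider Bernoulli observations under a Beta prior (the coin-toss data [1, 1, 1, 1, 0] is borrowed from the Beta-Binomial example later in the deck); each observation maps one Beta belief to the next:

```julia
# Sequential Bayesian updating for a Beta-Bernoulli model:
# observing d maps the current Beta(a, b) belief to Beta(a + d, b + 1 - d).
data = [1, 1, 1, 1, 0]
posterior = foldl(data; init = (1.0, 1.0)) do (a, b), d
    (a + d, b + 1 - d)
end
posterior   # (5.0, 2.0), i.e. p(H | D₁, …, D₅) = Beta(5, 2)
```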
Probabilistic Models

How to represent a probabilistic model?

A table of all possible events!
Probabilistic Graphical Models
Probabilistic Programs

Probabilistic Models

Can we represent a joint density with an algorithm?

Beta-Binomial Example

@model function coin_toss(N, X)
    θ ~ Beta(1.0, 1.0)
    for i in 1:N
        X[i] ~ Bernoulli(θ)
    end
end

This program represents the joint density p(θ, X).

Beta-Binomial Example

X = [1, 1, 1, 1, 0]

@model function coin_toss(N, X)
    θ ~ Beta(1.0, 1.0)
    for i in 1:N
        X[i] ~ Bernoulli(θ)
    end
end

Conditioning on the observed X, inference gives the posterior p(θ|X).

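For this conjugate pair the posterior is available in closed form, which is a useful check on any sampler's output; with the flat Beta(1, 1) prior and the data above (four successes, one failure):

```latex
p(\theta \mid X) \;\propto\; \underbrace{\theta^{4}(1-\theta)^{1}}_{\text{likelihood}} \cdot \underbrace{1}_{\mathrm{Beta}(1,1)\ \text{prior}}
\;\;\Longrightarrow\;\;
p(\theta \mid X) = \mathrm{Beta}(\theta;\, 5,\, 2),
\qquad
\mathbb{E}[\theta \mid X] = \tfrac{5}{7} \approx 0.714
```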
Probabilistic Programs

Probabilistic programs offer more representation power than PGMs!

@model function program()
    T ~ Geometric(0.1)
    S = 0
    X = Vector{Any}(undef, T)
    for t ∈ 1:T
        X[t] ~ Bernoulli(0.5)
        S = S + X[t]
    end
    return S
end

Here the number of random variables depends on the random value T, so no fixed graphical model can represent this program.

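A quick way to see what distribution this program defines is to run it forward; a minimal stdlib-only sketch, assuming the Distributions.jl convention that Geometric(0.1) counts failures before the first success (mean 9), so E[S] = 9 · 0.5 = 4.5:

```julia
using Random, Statistics

Random.seed!(0)

# Forward-simulate the probabilistic program once.
function run_program()
    T = 0
    while rand() >= 0.1          # T ~ Geometric(0.1): failures before the first success
        T += 1
    end
    S = 0
    for t in 1:T
        S += rand() < 0.5        # X[t] ~ Bernoulli(0.5)
    end
    return S
end

m = mean(run_program() for _ in 1:100_000)   # ≈ 4.5
```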
Probabilistic Programming Languages

Why is inference difficult?

p(z|x) = p(x|z)p(z) / p(x)

To obtain p(x) we have to marginalize over all possible hypotheses:

p(x) = ∫ p(x, z) dz

Now imagine what p(x) looks like if we have used something like neural networks inside the model!

So the posterior turns out to be intractable.



If we choose the likelihood and prior carefully, Exact Inference is possible.

Likelihood     Conjugate Prior
Bernoulli      Beta
Binomial       Beta
Poisson        Gamma
Categorical    Dirichlet
Uniform        Pareto

Approximate Inference Methods

Exact inference is not possible for most models.

Fortunately, there are methods for approximate inference.

Markov Chain Monte Carlo

The posterior p(z|x) is intractable.

If we can somehow generate enough samples from p(z|x), then we can estimate our desired quantities.

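The estimate in question is the ordinary Monte Carlo average: for any quantity of interest f,

```latex
\mathbb{E}_{p(z \mid x)}[f(z)] \;\approx\; \frac{1}{N} \sum_{i=1}^{N} f\big(z^{(i)}\big),
\qquad z^{(i)} \sim p(z \mid x)
```

By the law of large numbers the error of this estimator shrinks like O(1/√N), regardless of the dimension of z.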
Markov Chain Monte Carlo

Consider a particle in ℝⁿ with an initial position (state) X₀. When the particle is at position Xₜ, it moves to a position Xₜ₊₁ with probability p(Xₜ₊₁|Xₜ), so we have a sequence of random variables

X₀, X₁, X₂, ...

Such a stochastic process is called a Markov chain.

Under some conditions, after a time τ the Markov chain forgets its initial state and becomes stationary. In other words, the terms in the sequence

Xτ₊₁, Xτ₊₂, Xτ₊₃, ...

will be random samples from the stationary distribution of the Markov chain.

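A minimal sketch of this convergence, assuming a toy two-state chain (the transition matrix here is hypothetical): iterating the transition kernel from any starting distribution approaches the stationary distribution π = (5/6, 1/6).

```julia
using LinearAlgebra

# Row i of P is p(X_{t+1} | X_t = i) for a two-state chain.
P = [0.9 0.1;
     0.5 0.5]

# Push a distribution over states through the transition kernel t times.
function evolve(P, dist, t)
    for _ in 1:t
        dist = P' * dist
    end
    return dist
end

d = evolve(P, [1.0, 0.0], 1000)   # ≈ [5/6, 1/6], independent of the start state
```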
Markov Chain Monte Carlo

Is it possible to construct a Markov chain that converges to a specific distribution?

We need a way to guarantee the existence of a stationary distribution.

Markov Chain Monte Carlo

Definition
A Markov chain is called reversible if it satisfies the detailed balance equations: the probability of being in a state xᵢ and then transitioning to a state xⱼ is equal to the probability of being in xⱼ and then transitioning to xᵢ. Formally:

π(xᵢ) P(xⱼ|xᵢ) = π(xⱼ) P(xᵢ|xⱼ)

where π(x) is the stationary distribution.

The idea is to define the transition probability P(x′|x) such that it satisfies the detailed balance equations for the target distribution π(x).

Random Walk Metropolis-Hastings

So we use the detailed balance equation:

π(x) P(x′|x) = π(x′) P(x|x′)

P(x′|x) / P(x|x′) = π(x′) / π(x)

Now let's break the transition into two steps. First, we propose a new state with probability g(x′|x). Next, we decide whether to move to the proposed state x′ according to an acceptance distribution A(x′, x):

P(x′|x) = g(x′|x) A(x′, x)

Random Walk Metropolis-Hastings

Now we rewrite the equation:

A(x′, x) / A(x, x′) = (π(x′) g(x|x′)) / (π(x) g(x′|x))

The Metropolis-Hastings algorithm defines an acceptance ratio that satisfies the above condition:

A(x′, x) = min(1, (π(x′) g(x|x′)) / (π(x) g(x′|x)))

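To see that this choice works, plug it back into the condition above. Writing r = π(x′)g(x|x′) / (π(x)g(x′|x)), we have A(x′, x) = min(1, r) and A(x, x′) = min(1, 1/r), so one of the two is always 1 and their ratio is exactly r, as required:

```latex
\frac{A(x', x)}{A(x, x')}
\;=\; \frac{\min(1,\, r)}{\min(1,\, 1/r)}
\;=\; r
\;=\; \frac{\pi(x')\, g(x \mid x')}{\pi(x)\, g(x' \mid x)}
```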
Random Walk Metropolis-Hastings

The funny fact is that for computing (π(x′) g(x|x′)) / (π(x) g(x′|x)) we don't need to know the normalization constant of π(x)!

Random Walk Metropolis-Hastings

function random_walk_metropolis(logπ, z₀, σ, T)
    samples = [z₀]
    z = z₀
    for τ ∈ 1:T
        y = z .+ σ * randn(size(z))      # symmetric Gaussian proposal, so g cancels
        α = exp(logπ(y) - logπ(z))       # acceptance ratio
        if rand() < α
            z = y
        end
        push!(samples, z)                # on rejection, the current state is repeated
    end
    return samples
end

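As a sanity check, the sampler can target a standard normal, whose log-density is known up to a constant; this usage sketch repeats the function definition so the snippet runs on its own:

```julia
using Random, Statistics

# Definition repeated from the slide above so this snippet is self-contained.
function random_walk_metropolis(logπ, z₀, σ, T)
    samples = [z₀]
    z = z₀
    for τ in 1:T
        y = z .+ σ * randn(size(z))      # symmetric Gaussian proposal
        α = exp(logπ(y) - logπ(z))
        if rand() < α
            z = y
        end
        push!(samples, z)                # repeat the current state on rejection
    end
    return samples
end

Random.seed!(1)
logtarget(z) = -0.5 * sum(abs2, z)       # unnormalized log-density of N(0, 1)
samples = random_walk_metropolis(logtarget, [0.0], 2.4, 50_000)
xs = first.(samples[10_001:end])         # discard burn-in
(mean(xs), std(xs))                      # ≈ (0.0, 1.0)
```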
Random Walk Metropolis-Hastings

The acceptance probability becomes unsatisfyingly low in higher dimensions.
RWMH treats the target density as a black box.

Advanced MCMC methods exploit properties of the target density!

Langevin Monte Carlo

dXₜ = ∇log π(Xₜ) dt + √2 dWₜ

Overdamped Langevin Dynamics

using ForwardDiff

function langevin_dynamics(logπ, z, τ)
    ζ = sqrt(2τ) * randn(size(z))
    ∇logπ = ForwardDiff.gradient(logπ, z)
    z .+ τ * ∇logπ .+ ζ
end

To derive Langevin Monte Carlo we adjust this dynamics using a Metropolis-Hastings correction.

Langevin Monte Carlo

using ForwardDiff, LinearAlgebra

function langevin_monte_carlo(logπ, z₀, τ, T)
    z = z₀
    samples = [z₀]
    for t ∈ 1:T
        ζ = sqrt(2τ) * randn(size(z))
        ∇logπ = ForwardDiff.gradient(logπ, z)
        y = z .+ τ * ∇logπ .+ ζ
        ∇logπ_y = ForwardDiff.gradient(logπ, y)
        # log-densities of the Gaussian proposals g(y|z) and g(z|y)
        logg_y_z = -1/(4τ) * norm(y .- z .- τ * ∇logπ)^2
        logg_z_y = -1/(4τ) * norm(z .- y .- τ * ∇logπ_y)^2
        α = logπ(y) + logg_z_y - logπ(z) - logg_y_z
        if rand() < exp(α)
            z = y
        end
        push!(samples, z)   # on rejection, the current state is repeated
    end
    return samples
end

Research Trends

alternatives for the base dynamics (HMC)
adaptive samplers (NUTS)
exploiting other properties of the target density
compositional samplers
SMC, particle filters

Useful Tools

BlackJAX: Composable Bayesian Inference in JAX

efficient implementation of samplers
composable samplers
GPU acceleration
suitable for designing PPLs or using with existing ones

Example: Bayesian Linear Regression

Probabilistic Program:

@model function bayesian_regression(X, y)
    α ~ Normal(0.0, 1.0)
    β ~ Normal(0.0, 1.0)
    for i ∈ eachindex(y)
        y[i] ~ Normal(α * X[i] + β, 1.0)
    end
end

Inference:

ch = sample(bayesian_regression(x, y), NUTS(), 10000)

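Because this model is linear-Gaussian (standard-normal priors, unit observation noise), the posterior over (α, β) is also available in closed form, which gives a useful check on the NUTS chain; a stdlib-only sketch on synthetic data (the true values α = 2, β = 1 are assumptions for illustration):

```julia
using LinearAlgebra, Random

Random.seed!(2)
x = randn(200)
y = 2.0 .* x .+ 1.0 .+ randn(200)        # synthetic data with α = 2, β = 1

A = hcat(x, ones(length(x)))             # design matrix; columns correspond to (α, β)
μ = (A' * A + I) \ (A' * y)              # posterior mean under N(0, 1) priors, noise σ = 1

μ   # ≈ [2.0, 1.0]
```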
Example: Bayesian Neural Network

Let f(x; w) be a neural network with three layers and sigmoid activation:

w = [wL1, wL2, wL3]

f(x; w) = σ(σ(σ(x wL1) wL2) wL3)

We define a multivariate normal prior over w:

w ∼ 𝒩(0, I)

Example: Bayesian Neural Network

Some samples from p(w)

Example: Bayesian Neural Network

Now we update our belief about the network weights using the dataset below:

y ∼ Bernoulli(f(x; w))

Example: Bayesian Neural Network

function neural_network(X, L1, L2, L3, H_dim)
    H = X * reshape(L1, (2, H_dim))
    H = sigmoid.(H)
    H = H * reshape(L2, (H_dim, H_dim))
    H = sigmoid.(H)
    O = H * reshape(L3, (H_dim, 1))
    O = sigmoid.(O)
    return O
end

Example: Bayesian Neural Network

@model function bayesian_neural_network(X, y, σ, H_dim)
    n_L1 = 2 * H_dim
    n_L2 = H_dim * H_dim
    n_L3 = H_dim

    Σ(n) = Diagonal(abs2.(σ .* ones(n)))

    L1 ~ MvNormal(zeros(n_L1), Σ(n_L1))
    L2 ~ MvNormal(zeros(n_L2), Σ(n_L2))
    L3 ~ MvNormal(zeros(n_L3), Σ(n_L3))

    O = neural_network(X, L1, L2, L3, H_dim)

    for i in eachindex(y)
        y[i] ~ Bernoulli(O[:][i])
    end
end

Example: Bayesian Neural Network

Some samples from p(w|y)

Bayesian Neural Ordinary Differential Equations

Probabilistic
Programming
Imagine we have a differential equation with unknown
parameters.
d𝑢₁/d𝑡 = 𝛼𝑢₁ − 𝛽𝑢₁𝑢₂
d𝑢₂/d𝑡 = −𝛿𝑢₂ + 𝛾𝑢₁𝑢₂ 1

1 Bayesian Neural Ordinary Differential Equations, arXiv:2012.07244


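For any guess of (𝛼, 𝛽, 𝛾, 𝛿) the system can be simulated with a plain Euler loop (a Python sketch with made-up parameter values; in practice a proper ODE solver is used). Since every step is ordinary arithmetic, the trajectory is differentiable in the parameters, which is exactly what inference over them needs.

```python
def lotka_volterra(u1, u2, alpha, beta, delta, gamma, dt=0.001, steps=5000):
    # Explicit Euler integration of the predator-prey system
    for _ in range(steps):
        du1 = alpha * u1 - beta * u1 * u2    # prey
        du2 = -delta * u2 + gamma * u1 * u2  # predator
        u1 += dt * du1
        u2 += dt * du2
    return u1, u2

# Hypothetical parameters and initial populations
u1, u2 = lotka_volterra(1.0, 1.0, alpha=1.5, beta=1.0, delta=3.0, gamma=1.0)
print(u1, u2)
```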
Bayesian Neural Ordinary Differential Equations

How to differentiate through an ODE solver !?2
Lev Pontryagin (1908–1988)

2 Neural Ordinary Differential Equations, arXiv:1806.07366


Useful Tools

SciML: Open Source Software for Scientific Machine Learning

Advanced differential equation solvers


Differentiable ODE solvers compatible with Turing
Variational Inference

Another idea is to find a tractable density 𝑞(𝑧; 𝜆) that is as close as possible to the posterior.
Variational Inference

log 𝑝(𝑥) = log ∫ 𝑝(𝑥, 𝑧) d𝑧
         = log ∫ 𝑞(𝑧; 𝜆) [𝑝(𝑥, 𝑧) / 𝑞(𝑧; 𝜆)] d𝑧
         = log 𝔼_{𝑞(𝑧;𝜆)} [𝑝(𝑥, 𝑧) / 𝑞(𝑧; 𝜆)]
         ≥ 𝔼_{𝑞(𝑧;𝜆)} log [𝑝(𝑥, 𝑧) / 𝑞(𝑧; 𝜆)]   (by Jensen's inequality)

The last term above is called the Evidence Lower BOund (ELBO):

ℒ(𝜆) = 𝔼_{𝑞(𝑧;𝜆)} log [𝑝(𝑥, 𝑧) / 𝑞(𝑧; 𝜆)]
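ℒ(𝜆) can be estimated by Monte Carlo: draw 𝑧ᵢ ∼ 𝑞(𝑧; 𝜆) and average log 𝑝(𝑥, 𝑧ᵢ) − log 𝑞(𝑧ᵢ; 𝜆). A self-contained check on a toy conjugate Gaussian model (my choice, not from the slides), where log 𝑝(𝑥) has a closed form so we can watch the bound hold:

```python
import math
import random

random.seed(1)

def log_normal(v, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

# Model: z ~ N(0, 1), x | z ~ N(z, 1); then p(x) = N(x; 0, 2) in closed form
x = 1.0
log_evidence = log_normal(x, 0.0, 2.0)

# A deliberately imperfect q(z; λ) = N(0.5, 1) (the true posterior is N(0.5, 0.5))
mu, var = 0.5, 1.0

def elbo(n=20000):
    total = 0.0
    for _ in range(n):
        z = random.gauss(mu, math.sqrt(var))
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)  # log p(x, z)
        total += log_joint - log_normal(z, mu, var)                  # - log q(z; λ)
    return total / n

estimate = elbo()
print(estimate, log_evidence)  # the estimate stays below log p(x)
```

The gap log 𝑝(𝑥) − ℒ(𝜆) is exactly 𝐷_KL(𝑞 ‖ 𝑝), roughly 0.15 for this choice of 𝑞.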
Variational Inference

log 𝑝(𝑥) = ℒ(𝜆) + 𝐷_KL(𝑞||𝑝)

where 𝐷_KL is the Kullback–Leibler divergence.

Maximizing ℒ(𝜆) is equivalent to minimizing 𝐷_KL(𝑞||𝑝).
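The identity follows in a few lines from 𝑝(𝑥, 𝑧) = 𝑝(𝑧|𝑥) 𝑝(𝑥), multiplying and dividing by 𝑞(𝑧; 𝜆) inside the log:

log 𝑝(𝑥) = 𝔼_{𝑞(𝑧;𝜆)} log 𝑝(𝑥)
         = 𝔼_{𝑞(𝑧;𝜆)} log [𝑝(𝑥, 𝑧) / 𝑝(𝑧|𝑥)]
         = 𝔼_{𝑞(𝑧;𝜆)} log [𝑝(𝑥, 𝑧) / 𝑞(𝑧; 𝜆)] + 𝔼_{𝑞(𝑧;𝜆)} log [𝑞(𝑧; 𝜆) / 𝑝(𝑧|𝑥)]
         = ℒ(𝜆) + 𝐷_KL(𝑞||𝑝)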
Research Trends

How to pick a variational distribution?
Is minimizing 𝐷_KL a good idea?!
Can we somehow combine sampling and optimization?
Differentiable Programming

The moral of the story!
Advanced inference methods require our program to be differentiable
How to write a probabilistic program in a differentiable way?
Differentiable Programming

Softmax as a differentiable approximation of argmax
softmax(𝑥ᵢ) = 𝑒^{𝑥ᵢ} / Σⱼ 𝑒^{𝑥ⱼ}
Approximating the maximum is straightforward:


maximum element ≈ Σᵢ₌₁ⁿ softmax(𝑥ᵢ) 𝑥ᵢ
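A quick numeric check of the two formulas above (a minimal Python sketch; a temperature parameter is often added to control the sharpness of the approximation, omitted here):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def soft_maximum(xs):
    # Differentiable surrogate for max: a softmax-weighted average
    return sum(p * x for p, x in zip(softmax(xs), xs))

xs = [10.0, 5.0, 20.0, 8.0, 40.0, 0.0]
print(soft_maximum(xs))  # very close to 40, the true maximum
```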
Differentiable Programming

Now can we come up with a soft approximation of sort?3
[10, 5, 20, 8, 40, 0]
3 Image : Fast Differentiable Sorting and Ranking, arXiv:2002.08871


Differentiable Programming

How about other algorithms?


MCMC

For example, a soft shortest path algorithm?


Example: Inverse Optimization

Suppose we have a graph with unknown weights.

If we observe some agents traveling through this graph, and assuming they tend to choose shortest paths, can we infer the weights?
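A toy sketch of why this is plausible (my own illustration, with a made-up three-node graph): replacing the hard min in the Bellman–Ford update with a log-sum-exp soft-min makes every path length a differentiable function of the edge weights, so gradients with respect to the weights exist.

```python
import math

def soft_min(xs, tau):
    # Smooth minimum via log-sum-exp; recovers the hard min as tau -> 0
    m = min(xs)
    return m - tau * math.log(sum(math.exp(-(x - m) / tau) for x in xs))

def soft_shortest_paths(edges, nodes, source, tau=0.01):
    # Bellman-Ford relaxation sweeps with min replaced by soft_min
    BIG = 1e6
    d = {v: (0.0 if v == source else BIG) for v in nodes}
    for _ in range(len(nodes) - 1):
        for v in nodes:
            candidates = [d[v]] + [d[u] + w for (u, t), w in edges.items() if t == v]
            d[v] = soft_min(candidates, tau)
    return d

# Hypothetical graph: the path A -> B -> C (cost 3) beats the direct edge A -> C (cost 4)
edges = {("A", "B"): 1.0, ("B", "C"): 2.0, ("A", "C"): 4.0}
d = soft_shortest_paths(edges, ["A", "B", "C"], source="A")
print(d)  # d["C"] comes out near 3, the true shortest A -> C distance
```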
Useful Tools

CVXpyLayers : Differentiable Convex Optimization Layers4

4 Differentiable Convex Optimization Layers, arXiv:1910.12430


Useful Tools

AlgoVision : Differentiable Algorithms and Algorithmic Supervision5

5 Learning with Algorithmic Supervision via Continuous Relaxations, arXiv:2110.05651


Summary


Probabilistic Programming + Differentiable Programming


Learning Resources


course  A graduate course on Probabilistic Programming (25 lectures) by Dr. Frank Wood at UBC.
book    An Introduction to Probabilistic Programming
Learning Resources


Foundations of Probabilistic Programming


Learning Resources

Monte Carlo Methods
Adrian Barbu, Song-Chun Zhu
Learning Resources


The Elements of Differentiable Programming
Mathieu Blondel, Vincent Roulet
Learning Resources


Neural Algorithmic Reasoning
Homepage : algo-reasoning.github.io
Petar Velickovic, Andreea Deac, and Andrew Dudzik
