Cheatsheet Variables Models

This document summarizes techniques for solving constraint satisfaction problems (CSPs) using variable-based models and factor graphs. It discusses backtracking search with dynamic variable and value ordering heuristics like most constrained variable and least constrained value. It also covers lookahead techniques like forward checking and arc consistency (AC-3). Approximate methods discussed include beam search. The document emphasizes that independence properties allow breaking large problems into smaller subproblems that can be solved in parallel.

Uploaded by

Tấn Lộc Phạm Huỳnh


CS 221 – Artificial Intelligence https://stanford.edu/~shervine

VIP Cheatsheet: Variables-based models

Afshine Amidi and Shervine Amidi

May 26, 2019

Constraint satisfaction problems

In this section, our objective is to find maximum weight assignments of variable-based models. One advantage compared to states-based models is that these algorithms make it more convenient to encode problem-specific constraints.

Factor graphs

❒ Definition – A factor graph, also referred to as a Markov random field, is a set of variables X = (X_1, ..., X_n) where X_i ∈ Domain_i, together with m factors f_1, ..., f_m where each f_j(X) ≥ 0.

❒ Scope and arity – The scope of a factor f_j is the set of variables it depends on. The size of this set is called the arity.
Remark: factors of arity 1 and 2 are called unary and binary respectively.

❒ Assignment weight – Each assignment x = (x_1, ..., x_n) yields a weight Weight(x), defined as the product of all factors f_j applied to that assignment:

$$\text{Weight}(x) = \prod_{j=1}^{m} f_j(x)$$

❒ Constraint satisfaction problem – A constraint satisfaction problem (CSP) is a factor graph where all factors take values in {0,1}; we call them constraints:

$$\forall j \in [\![1,m]\!], \quad f_j(x) \in \{0,1\}$$

Here, the constraint j with assignment x is said to be satisfied if and only if f_j(x) = 1.

❒ Consistent assignment – An assignment x of a CSP is said to be consistent if and only if Weight(x) = 1, i.e. all constraints are satisfied.

Dynamic ordering

❒ Dependent factors – The set of dependent factors of variable X_i with partial assignment x is called D(x, X_i), and denotes the set of factors that link X_i to already assigned variables.

❒ Backtracking search – Backtracking search is an algorithm used to find maximum weight assignments of a factor graph. At each step, it chooses an unassigned variable and explores its values by recursion. Dynamic ordering (i.e. choice of variables and values) and lookahead (i.e. early elimination of inconsistent options) can be used to explore the graph more efficiently, although the worst-case runtime stays exponential: O(|Domain|^n).

❒ Forward checking – It is a one-step lookahead heuristic that preemptively removes inconsistent values from the domains of neighboring variables. It has the following characteristics:

• After assigning a variable X_i, it eliminates inconsistent values from the domains of all its neighbors.
• If any of these domains becomes empty, we stop the local backtracking search.
• If we un-assign a variable X_i, we have to restore the domain of its neighbors.

❒ Most constrained variable – It is a variable-level ordering heuristic that selects the next unassigned variable that has the fewest consistent values. This has the effect of making inconsistent assignments fail earlier in the search, which enables more efficient pruning.

❒ Least constrained value – It is a value-level ordering heuristic that assigns the next value that yields the highest number of consistent values of neighboring variables. Intuitively, this procedure chooses first the values that are most likely to work.
Remark: in practice, this heuristic is useful when all factors are constraints.

[Figure: the 3-color problem solved by backtracking search coupled with most constrained variable exploration and the least constrained value heuristic, as well as forward checking at each step.]
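The combination described above (backtracking search, most constrained variable, least constrained value, forward checking) can be sketched in a few lines. This is only an illustration under stated assumptions: the map of Australian regions, the `neighbors` adjacency dictionary and all function names are hypothetical and do not come from the cheatsheet, and every constraint is assumed to be "adjacent regions take different colors".

```python
def forward_check(domains, neighbors, var, value):
    """Return a pruned copy of the domains after var = value, or None if a neighbor's domain empties."""
    pruned = {v: list(vals) for v, vals in domains.items()}
    pruned[var] = [value]
    for n in neighbors[var]:
        pruned[n] = [c for c in pruned[n] if c != value]
        if not pruned[n]:
            return None  # empty domain: stop exploring this branch
    return pruned


def backtrack(assignment, domains, neighbors):
    if len(assignment) == len(domains):
        return assignment
    # Most constrained variable: unassigned variable with the fewest consistent values left.
    var = min((v for v in domains if v not in assignment), key=lambda v: len(domains[v]))

    # Least constrained value: prefer the value leaving the most options to unassigned neighbors.
    def options_left(value):
        return sum(
            sum(c != value for c in domains[n])
            for n in neighbors[var] if n not in assignment
        )

    for value in sorted(domains[var], key=options_left, reverse=True):
        pruned = forward_check(domains, neighbors, var, value)
        if pruned is None:
            continue
        # Working on a copy means the neighbors' domains are restored when we un-assign.
        result = backtrack({**assignment, var: value}, pruned, neighbors)
        if result is not None:
            return result
    return None


def solve(neighbors, colors):
    domains = {v: list(colors) for v in neighbors}
    return backtrack({}, domains, neighbors)


# Hypothetical 3-color instance (adjacent regions must differ).
australia = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
solution = solve(australia, ["red", "green", "blue"])
```

Copying the domains at each level is the simplest way to honor the "restore the domain of its neighbors" rule; a version that mutates in place and undoes its changes would use less memory.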
❒ Arc consistency – We say that arc consistency of variable X_l with respect to X_k is enforced when for each x_l ∈ Domain_l:

• unary factors of X_l are non-zero,
• there exists at least one x_k ∈ Domain_k such that any factor between X_l and X_k is non-zero.
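Enforcing arc consistency amounts to filtering Domain_l with the two conditions above. The following is a minimal sketch, assuming the unary factors of X_l are folded into one hypothetical function `unary_l` and the factors between X_l and X_k into one hypothetical function `factor_lk`; these names and the toy constraint X_l < X_k are illustrative only.

```python
def enforce_arc_consistency(domains, l, k, unary_l, factor_lk):
    """Prune Domain_l in place; return True if at least one value was removed."""
    kept = [
        xl for xl in domains[l]
        if unary_l(xl) != 0 and any(factor_lk(xl, xk) != 0 for xk in domains[k])
    ]
    changed = len(kept) < len(domains[l])
    domains[l] = kept
    return changed


# Toy example: constraint X_l < X_k with no unary restriction.
domains = {"X_l": [1, 2, 3], "X_k": [1, 2]}
changed = enforce_arc_consistency(
    domains, "X_l", "X_k",
    unary_l=lambda xl: 1,
    factor_lk=lambda xl, xk: int(xl < xk),
)
```

Returning whether the domain changed is a design choice that makes it easy to decide which neighbors need to be re-examined afterwards.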

Stanford University 1 Spring 2019


❒ AC-3 – The AC-3 algorithm is a multi-step lookahead heuristic that applies forward checking to all relevant variables. After a given assignment, it performs forward checking and then successively enforces arc consistency with respect to the neighbors of variables for which the domain changed during the process.
Remark: AC-3 can be implemented both iteratively and recursively.

Approximate methods

❒ Beam search – Beam search is an approximate algorithm that extends partial assignments of n variables of branching factor b = |Domain| by exploring the K top paths at each step. The beam size K ∈ {1, ..., b^n} controls the tradeoff between efficiency and accuracy. This algorithm has a time complexity of O(n · Kb log(Kb)).

[Figure: a possible beam search with parameters K = 2, b = 3 and n = 5.]

Remark: K = 1 corresponds to greedy search whereas K → +∞ is equivalent to BFS tree search.

❒ Iterated conditional modes – Iterated conditional modes (ICM) is an iterative approximate algorithm that modifies the assignment of a factor graph one variable at a time until convergence. At step i, we assign to X_i the value v that maximizes the product of all factors connected to that variable.
Remark: ICM may get stuck in local optima.

❒ Gibbs sampling – Gibbs sampling is an iterative approximate method that modifies the assignment of a factor graph one variable at a time until convergence. At step i:

• we assign to each element u ∈ Domain_i a weight w(u) that is the product of all factors connected to that variable,
• we sample v from the probability distribution induced by w and assign it to X_i.

Remark: Gibbs sampling can be seen as the probabilistic counterpart of ICM. It has the advantage of being able to escape local optima in most cases.

Factor graph transformations

❒ Independence – Let A, B be a partitioning of the variables X. We say that A and B are independent if there are no edges between A and B and we write:

$$A, B \text{ independent} \iff A \perp\!\!\!\perp B$$

Remark: independence is the key property that allows us to solve subproblems in parallel.

❒ Conditional independence – We say that A and B are conditionally independent given C if conditioning on C produces a graph in which A and B are independent. In this case, it is written:

$$A \text{ and } B \text{ cond. indep. given } C \iff A \perp\!\!\!\perp B \mid C$$

❒ Conditioning – Conditioning is a transformation that aims at making variables independent: it breaks up a factor graph into smaller pieces that can be solved in parallel and can use backtracking. In order to condition on a variable X_i = v, we do as follows:

• Consider all factors f_1, ..., f_k that depend on X_i
• Remove X_i and f_1, ..., f_k
• Add g_j(x) for j ∈ {1, ..., k} defined as:

$$g_j(x) = f_j(x \cup \{X_i : v\})$$

❒ Markov blanket – Let A ⊆ X be a subset of variables. We define MarkovBlanket(A) to be the neighbors of A that are not in A.

❒ Proposition – Let C = MarkovBlanket(A) and B = X \ (A ∪ C). Then we have:

$$A \perp\!\!\!\perp B \mid C$$

❒ Elimination – Elimination is a factor graph transformation that removes X_i from the graph and solves a small subproblem conditioned on its Markov blanket as follows:

• Consider all factors f_{i,1}, ..., f_{i,k} that depend on X_i
• Remove X_i and f_{i,1}, ..., f_{i,k}
• Add f_{new,i}(x) defined as:

$$f_{\text{new},i}(x) = \max_{x_i} \prod_{l=1}^{k} f_{i,l}(x)$$
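The three elimination steps can be sketched on a tiny chain. This is a sketch under assumptions rather than a reference implementation: factors are represented as hypothetical `(scope, function)` pairs whose function reads values from a dict, and the three example factors over X1, X2, X3 are made up for illustration.

```python
from itertools import product


def eliminate(factors, domains, var):
    """Remove `var`: replace the factors touching it by one new factor maximized over var."""
    touching = [(scope, f) for scope, f in factors if var in scope]
    rest = [(scope, f) for scope, f in factors if var not in scope]
    new_scope = tuple(sorted({v for scope, _ in touching for v in scope} - {var}))

    # Tabulate f_new over every assignment of the remaining scope.
    table = {}
    for values in product(*(domains[v] for v in new_scope)):
        x = dict(zip(new_scope, values))
        best = 0.0  # factors are non-negative, so 0 is a valid lower bound
        for xi in domains[var]:
            x[var] = xi
            weight = 1.0
            for _, f in touching:
                weight *= f(x)
            best = max(best, weight)
        table[values] = best

    def f_new(x):
        return table[tuple(x[v] for v in new_scope)]

    return rest + [(new_scope, f_new)]


# Hypothetical chain X1 - X2 - X3 with one unary factor on X1.
domains = {"X1": [0, 1], "X2": [0, 1], "X3": [0, 1]}
factors = [
    (("X1",), lambda x: x["X1"] + 1),
    (("X1", "X2"), lambda x: 2 if x["X1"] == x["X2"] else 1),
    (("X2", "X3"), lambda x: 3 if x["X2"] != x["X3"] else 1),
]

remaining = factors
for var in ["X3", "X2", "X1"]:
    remaining = eliminate(remaining, domains, var)

# Once every variable is eliminated, only empty-scope factors remain;
# their product is the maximum assignment weight.
max_weight = 1.0
for _, f in remaining:
    max_weight *= f({})
```

Eliminating from the end of the chain keeps every created factor unary, which is why this ordering is cheap for this particular graph.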


❒ Treewidth – The treewidth of a factor graph is the maximum arity of any factor created by variable elimination with the best variable ordering. In other words,

$$\text{Treewidth} = \min_{\text{orderings}} \; \max_{i \in \{1, ..., n\}} \text{arity}(f_{\text{new},i})$$

[Figure: example of a factor graph of treewidth 3.]
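Separately from the figure, the definition can be checked by brute force on very small graphs. The sketch below assumes a factor graph is given only by its factor scopes (hypothetical tuples of variable names) and tries every elimination ordering, which is only feasible for a handful of variables; the chain and 4-cycle are illustrative inputs, not examples from the cheatsheet.

```python
from itertools import permutations


def max_created_arity(scopes, order):
    """Largest arity among the factors created when eliminating variables in `order`."""
    factors = [frozenset(s) for s in scopes]
    worst = 0
    for var in order:
        touching = [f for f in factors if var in f]
        new_scope = frozenset().union(*touching) - {var}
        factors = [f for f in factors if var not in f]
        if new_scope:
            factors.append(new_scope)
        worst = max(worst, len(new_scope))
    return worst


def treewidth(scopes):
    """Minimum over all orderings of the maximum created arity (exhaustive search)."""
    variables = sorted(set().union(*scopes))
    return min(max_created_arity(scopes, order) for order in permutations(variables))


chain = [("A", "B"), ("B", "C"), ("C", "D")]
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
chain_width = treewidth(chain)
cycle_width = treewidth(cycle)
```

On the chain every elimination from an end creates a unary factor, while on the cycle the first elimination necessarily links two neighbors, so the two inputs give different widths.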

Remark: finding the best variable ordering is an NP-hard problem.

Bayesian networks

In this section, our goal will be to compute conditional probabilities. What is the probability of a query given evidence?

Introduction

❒ Explaining away – Suppose causes C_1 and C_2 influence an effect E. Conditioning on the effect E and on one of the causes (say C_1) changes the probability of the other cause (say C_2). In this case, we say that C_1 has explained away C_2.

❒ Directed acyclic graph – A directed acyclic graph (DAG) is a finite directed graph with no directed cycles.

❒ Bayesian network – A Bayesian network is a directed acyclic graph (DAG) that specifies a joint distribution over random variables X = (X_1, ..., X_n) as a product of local conditional distributions, one for each node:

$$P(X_1 = x_1, ..., X_n = x_n) \triangleq \prod_{i=1}^{n} p(x_i \mid x_{\text{Parents}(i)})$$

Remark: Bayesian networks are factor graphs imbued with the language of probability.

❒ Locally normalized – For each x_{Parents(i)}, all factors are local conditional distributions. Hence they have to satisfy:

$$\sum_{x_i} p(x_i \mid x_{\text{Parents}(i)}) = 1$$

As a result, sub-Bayesian networks and conditional distributions are consistent.

Remark: local conditional distributions are the true conditional distributions.

❒ Marginalization – The marginalization of a leaf node yields a Bayesian network without that node.

Probabilistic programs

❒ Concept – A probabilistic program randomizes variable assignment. That way, we can write down complex Bayesian networks that generate assignments without us having to explicitly specify associated probabilities.
Remark: examples of probabilistic programs include Hidden Markov model (HMM), factorial HMM, naive Bayes, latent Dirichlet allocation, diseases and symptoms and stochastic block models.

❒ Summary – The table below summarizes the common probabilistic programs as well as their applications:

Program                             Algorithm                               Example

Markov Model                        X_i ∼ p(X_i|X_{i−1})                    Language modeling

Hidden Markov Model (HMM)           H_t ∼ p(H_t|H_{t−1})                    Object tracking
                                    E_t ∼ p(E_t|H_t)

Factorial HMM                       H_t^o ∼ p(H_t^o|H_{t−1}^o), o ∈ {a,b}   Multiple object tracking
                                    E_t ∼ p(E_t|H_t^a, H_t^b)

Naive Bayes                         Y ∼ p(Y)                                Document classification
                                    W_i ∼ p(W_i|Y)

Latent Dirichlet Allocation (LDA)   α ∈ R^K distribution                    Topic modeling
                                    Z_i ∼ p(Z_i|α)
                                    W_i ∼ p(W_i|Z_i)

Inference

❒ General probabilistic inference strategy – The strategy to compute the probability P(Q | E = e) of query Q given evidence E = e is as follows:

• Step 1: Remove variables that are not ancestors of the query Q or the evidence E by marginalization
• Step 2: Convert Bayesian network to factor graph
• Step 3: Condition on the evidence E = e
• Step 4: Remove nodes disconnected from the query Q by marginalization
• Step 5: Run probabilistic inference algorithm (manual, variable elimination, Gibbs sampling, particle filtering)

❒ Forward-backward algorithm – This algorithm computes the exact value of P(H = h_k | E = e) (smoothing query) for any k ∈ {1, ..., L} in the case of an HMM of size L. To do so, we proceed in 3 steps:

• Step 1: for i ∈ {1, ..., L}, compute

$$F_i(h_i) = \sum_{h_{i-1}} F_{i-1}(h_{i-1}) \, p(h_i \mid h_{i-1}) \, p(e_i \mid h_i)$$

• Step 2: for i ∈ {L, ..., 1}, compute

$$B_i(h_i) = \sum_{h_{i+1}} B_{i+1}(h_{i+1}) \, p(h_{i+1} \mid h_i) \, p(e_{i+1} \mid h_{i+1})$$

• Step 3: for i ∈ {1, ..., L}, compute

$$S_i(h_i) = \frac{F_i(h_i) \, B_i(h_i)}{\sum_{h_i} F_i(h_i) \, B_i(h_i)}$$

with the convention F_0 = B_{L+1} = 1. From this procedure and these notations, we get that

$$P(H = h_k \mid E = e) = S_k(h_k)$$

Remark: this algorithm interprets each assignment to be a path where each edge h_{i−1} → h_i is of weight p(h_i | h_{i−1}) p(e_i | h_i).

❒ Gibbs sampling – This algorithm is an iterative approximate method that uses a small set of assignments (particles) to represent a large probability distribution. From a random assignment x, Gibbs sampling performs the following steps for i ∈ {1, ..., n} until convergence:

• For all u ∈ Domain_i, compute the weight w(u) of assignment x where X_i = u
• Sample v from the probability distribution induced by w: v ∼ P(X_i = v | X_{−i} = x_{−i})
• Set X_i = v

Remark: X_{−i} denotes X \ {X_i} and x_{−i} represents the corresponding assignment.

❒ Particle filtering – This algorithm approximates the posterior density of state variables given the evidence of observation variables by keeping track of K particles at a time. Starting from a set of particles C of size K, we run the following 3 steps iteratively:

• Step 1: proposal - For each old particle x_{t−1} ∈ C, sample x from the transition probability distribution p(x | x_{t−1}) and add x to a set C'.
• Step 2: weighting - Weigh each x of the set C' by w(x) = p(e_t | x), where e_t is the evidence observed at time t.
• Step 3: resampling - Sample K elements from the set C' using the probability distribution induced by w and store them in C: these are the current particles x_t.

Remark: a more expensive version of this algorithm also keeps track of past particles in the proposal step.

❒ Maximum likelihood – If we don't know the local conditional distributions, we can learn them using maximum likelihood.

$$\max_{\theta} \prod_{x \in D_{\text{train}}} p(X = x; \theta)$$

❒ Laplace smoothing – For each distribution d and partial assignment (x_{Parents(i)}, x_i), add λ to count_d(x_{Parents(i)}, x_i), then normalize to get probability estimates.

❒ Algorithm – The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:

• E-step: Evaluate the posterior probability q(h) that each data point e came from a particular cluster h as follows:

$$q(h) = P(H = h \mid E = e; \theta)$$

• M-step: Use the posterior probabilities q(h) as cluster specific weights on data points e to determine θ through maximum likelihood.
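The E-step and M-step can be made concrete on a toy model. The sketch below is an illustration under strong assumptions that are not in the cheatsheet: the hidden variable H picks one of two clusters, each data point e is drawn from a 1-D Gaussian with unit variance around that cluster's mean, and θ consists of the two means plus one mixing weight. The function name `em_two_gaussians` and the data are hypothetical.

```python
import math


def em_two_gaussians(data, means, mix=0.5, iterations=50):
    """EM for a mixture of two unit-variance 1-D Gaussians; `mix` is P(H = 1)."""
    means = list(means)
    for _ in range(iterations):
        # E-step: q[i] = P(H = 1 | E = e_i; theta). The Gaussian normalizer cancels out.
        q = []
        for e in data:
            p0 = (1 - mix) * math.exp(-0.5 * (e - means[0]) ** 2)
            p1 = mix * math.exp(-0.5 * (e - means[1]) ** 2)
            q.append(p1 / (p0 + p1))
        # M-step: maximum likelihood with q as cluster-specific weights on the data points.
        n1 = sum(q)
        n0 = len(data) - n1
        means[0] = sum((1 - qi) * e for qi, e in zip(q, data)) / n0
        means[1] = sum(qi * e for qi, e in zip(q, data)) / n1
        mix = n1 / len(data)
    return means, mix


# Hypothetical data: two well-separated groups around -2 and +2.
data = [-2.2, -2.0, -1.8, 1.8, 2.0, 2.2]
means, mix = em_two_gaussians(data, means=(-1.0, 1.0))
```

Fixing the variance keeps the M-step to weighted averages; a fuller version would also re-estimate the variances and guard against a cluster receiving almost no weight.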
