Module 4 - v1
CSE 3005:
Applied Artificial Intelligence
Contents
• Uncertainty in AI
2
Need for Reasoning with Uncertainty
• World is full of uncertainty
• Partial information
• Can’t encode rules for every condition
• Computers need to be able to handle uncertainty
• Probability – New foundation for AI
• Massive amounts of data are available today
• Statistics helps us learn a great deal from data
3
Probability Basics
• Begin with a set S, also called the sample space
• Eg. For a die, what is the sample space?
• x ∈ S is a sample point / atomic event
• A probability space / probability model is a sample space with an assignment P(x) for all
values of x such that:
• 0 ≤ P(x) ≤ 1
• Σₓ P(x) = 1
• An event A is a subset of S.
• Eg. A = “die roll ≥ 4”
• A random variable is a function from sample points to some range.
4
Types of Probability Spaces
• Propositional / Boolean
• Eg. Cavity (Do I have a cavity?)
• Discrete random variables (finite or infinite)
• Eg. Weather = {Sunny, Rainy, Cloudy, Snowy, …}
• Values must be exhaustive and mutually exclusive.
• Continuous random variables (bounded or unbounded)
• Temperature = 21.6
• Temperature is between 20 and 35.
• Combination of different propositions
5
Axioms of Probability Theory
• All probabilities are in the range [0,1]
• P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
6
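• A minimal Python sketch (not from the slides) checking the two axioms and the inclusion-exclusion rule above on a fair die; the events A and B are made-up examples.

```python
# A minimal sketch: the axioms and P(A v B) = P(A) + P(B) - P(A ^ B) on a fair die.
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
P = {x: Fraction(1, 6) for x in sample_space}    # 0 <= P(x) <= 1 for every x
assert sum(P.values()) == 1                      # probabilities sum to 1

def prob(event):
    """Probability of an event (a subset of the sample space)."""
    return sum(P[x] for x in event)

A = {x for x in sample_space if x >= 4}          # "die roll >= 4"
B = {x for x in sample_space if x % 2 == 0}      # "die roll is even"

# Inclusion-exclusion: P(A v B) = P(A) + P(B) - P(A ^ B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
print(prob(A | B))   # 2/3
```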
Prior Probability
• Prior probability or unconditional probability corresponds to belief
prior to the arrival of new evidence.
• Probability distribution gives values of all possible assignments:
• P(Weather) = [0.72 0.1 0.08 0.001 …] (elements sum to 1)
• Joint probability distribution for a set of random variables gives the
probability of every atomic event on those random variables
7
Conditional Probability
• Conditional Probability or Posterior Probability links 2 or more events that
depend on each other.
• Eg. P(cavity | toothache) = 0.8
• If we know more, the values will change.
• Calculate the values of the following:
• P(cavity | toothache, cavity) = ?
• P(cavity | toothache, sunny) = ?
• Answer:
•1
• 0.8
8
Conditional Probability
• P(A|B) = P(A ∧ B) / P(B)
• Chain Rule: P(X1, X2, …, Xn) = P(Xn|X1, …, Xn-1) · P(Xn-1|X1, …, Xn-2) ⋯ P(X1)
9
• P(X1,X2,X3)=P(X3∣X1,X2)×P(X2∣X1)×P(X1)
10
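• A small sketch verifying the three-variable chain rule above on a made-up joint distribution over three binary variables (the numbers are arbitrary, purely for illustration).

```python
# A small sketch verifying P(X1, X2, X3) = P(X3|X1, X2) * P(X2|X1) * P(X1)
# on a toy joint distribution.
from itertools import product

# Arbitrary joint distribution over three binary variables (made-up numbers, sums to 1.0).
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.20,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}

def marginal(fixed):
    """Sum the joint over all assignments consistent with `fixed` ({index: value})."""
    return sum(p for assign, p in joint.items()
               if all(assign[i] == v for i, v in fixed.items()))

for x1, x2, x3 in product([0, 1], repeat=3):
    p_x1 = marginal({0: x1})
    p_x2_given_x1 = marginal({0: x1, 1: x2}) / p_x1
    p_x3_given_x1_x2 = joint[(x1, x2, x3)] / marginal({0: x1, 1: x2})
    assert abs(joint[(x1, x2, x3)] - p_x3_given_x1_x2 * p_x2_given_x1 * p_x1) < 1e-12
print("Chain rule verified for every assignment.")
```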
Independent Events
• Two events are Independent if they do not depend on each other to occur.
• Eg. A = “A heads on the first toss” and B = “A tails on the second toss”
• For independent events,
• P(A|B) = P(A ∩ B) / P(B) = P(A)
• P(A ∩ B) = P(A) · P(B)
• A and B are independent iff:
• P(A|B) = P(A) OR
• P(B|A) = P(B) OR
• P(A,B) = P(A)*P(B)
11
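• A minimal sketch (not from the slides) checking the independence conditions above for the two coin-toss events A and B.

```python
# A minimal sketch: checking P(A ^ B) == P(A) * P(B) and P(A|B) == P(A)
# for two fair, independent coin tosses.
from fractions import Fraction
from itertools import product

sample_space = list(product("HT", repeat=2))          # outcomes of two tosses
P = {outcome: Fraction(1, 4) for outcome in sample_space}

def prob(event):
    return sum(P[o] for o in event)

A = {o for o in sample_space if o[0] == "H"}          # heads on the first toss
B = {o for o in sample_space if o[1] == "T"}          # tails on the second toss

print(prob(A & B) == prob(A) * prob(B))               # True -> independent
print(prob(A & B) / prob(B) == prob(A))               # P(A|B) == P(A) as well
```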
1. Independence of Events:
• Two events, A and B, are considered independent if the occurrence of one event doesn't affect the occurrence of the other. In simpler words, they are unrelated.
2. Probability of A given B:
• The probability of A given B (P(A|B)) is a measure of how likely A is to happen, considering that B has already occurred.
• For independent events, P(A|B) = P(A). This means knowing that B happened doesn't change the probability of A.
3. Multiplication Rule for Independent Events:
• The probability of both A and B happening (denoted P(A∩B)) for independent events is the product of their individual probabilities: P(A∩B) = P(A) × P(B).
• This is because, if A and B are independent, the occurrence of one doesn't influence the occurrence of the other.
12
1. Equivalent Conditions for Independence:
• Events A and B are independent if any of the following is true:
• P(A|B) = P(A): The probability of A given B is the same as the probability of A. Knowing B doesn't change the likelihood of A.
• P(B|A) = P(B): The probability of B given A is the same as the probability of B. Knowing A doesn't change the likelihood of B.
• P(A∩B) = P(A) × P(B): The probability of both A and B happening is the product of their individual probabilities.
13
Bayes Rule
• P(x, y) = P(x|y) · P(y) = P(y|x) · P(x)
• P(x|y) = P(y|x) · P(x) / P(y)
• Posterior = (Likelihood × Prior) / Evidence
• P(Cause|Effect) = P(Effect|Cause) · P(Cause) / P(Effect)
14
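• A minimal sketch of Bayes' rule as stated above; the numbers in the example call are hypothetical, purely for illustration (they are not from the slides).

```python
# A minimal sketch of Bayes' rule: posterior = likelihood * prior / evidence.
def bayes(likelihood, prior, evidence):
    """P(Cause | Effect) = P(Effect | Cause) * P(Cause) / P(Effect)."""
    return likelihood * prior / evidence

# Hypothetical numbers, purely for illustration:
# P(Effect | Cause) = 0.9, P(Cause) = 0.01, P(Effect) = 0.1
print(bayes(likelihood=0.9, prior=0.01, evidence=0.1))   # 0.09
```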
Sample Problem
• Any patient with a stiff neck (S) may or may not have meningitis (M).
The probability of a patient having a stiff neck, if they have meningitis,
is 0.8. However, meningitis is not widespread in the population –
having a probability of only 0.0001. On the other hand, stiff necks are
more common, with a probability of 0.1.
• What is the probability that the patient has meningitis, given that they have a
stiff neck?
15
Solution
• To be calculated in class!
16
Some Sample Practice Questions
• 1. What is the probability that you select the Queen of Hearts from a
deck of cards?
• 2. What is the probability that you select either a queen OR a heart
from a deck of cards?
• 3. What is the probability that you select first a queen AND THEN a
heart from a deck of cards?
17
1.Probability of selecting the Queen of Hearts:
•In a standard deck of 52 cards, there is only one Queen of Hearts.
•Probability = Number of favorable outcomes / Total number of possible outcomes
•Probability = 1 (Queen of Hearts) / 52 (total cards) = 1/52
2.Probability of selecting either a Queen OR a Heart:
•There are four Queens in a deck (Hearts, Diamonds, Clubs, Spades) and 13 Hearts in total (including the Queen of Hearts).
•However, we need to be careful not to double-count the Queen of Hearts, which is both a Queen and a Heart.
•Probability = (Number of Queens + Number of Hearts − Overlapping outcome) / Total number of possible outcomes
•Probability = (4 Queens + 13 Hearts − 1 Queen of Hearts) / 52 (total cards)
•Probability = (4 + 13 − 1) / 52 = 16/52 = 4/13
3.Probability of selecting first a Queen AND THEN a Heart (without replacement):
•If the first card is the Queen of Hearts (probability 1/52), only 12 Hearts remain among the 51 remaining cards.
•If the first card is one of the other three Queens (probability 3/52), all 13 Hearts remain among the 51 remaining cards.
•Probability = Probability of selecting a Queen * Probability of selecting a Heart given the Queen that was selected
•Probability = (1/52) × (12/51) + (3/52) × (13/51) = (12 + 39) / 2652 = 51/2652 = 1/52
So, to summarize:
1.Probability of selecting the Queen of Hearts: 1/52
2.Probability of selecting either a Queen OR a Heart: 16/52
3.Probability of selecting first a Queen AND THEN a Heart: 1/52
18
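• A small sketch (not from the slides) that checks the three answers above by exact enumeration over the deck.

```python
# A small sketch verifying the three card answers above by exact enumeration.
from fractions import Fraction
from itertools import permutations, product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = list(product(ranks, suits))                      # 52 (rank, suit) cards

# Q1: P(Queen of Hearts) on a single draw
q1 = Fraction(sum(c == ("Q", "Hearts") for c in deck), len(deck))

# Q2: P(Queen OR Heart) on a single draw
q2 = Fraction(sum(c[0] == "Q" or c[1] == "Hearts" for c in deck), len(deck))

# Q3: P(Queen first AND Heart second), drawing two cards without replacement
pairs = list(permutations(deck, 2))                     # 52 * 51 ordered pairs
q3 = Fraction(sum(a[0] == "Q" and b[1] == "Hearts" for a, b in pairs), len(pairs))

print(q1, q2, q3)   # 1/52, 4/13, 1/52
```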
Some More Sample Practice Questions
• 4. What is the probability of getting a LETTER card (A,K,Q,J), given
that you picked a NUMBER card from the same suit?
• 5. What is the probability of drawing an ACE of HEARTS, given that
you have drawn an ACE from the deck of cards?
• 6. What is the probability of drawing an ACE of HEARTS, given that
you have drawn a FACE card from the deck of cards?
19
4)Probability of getting a LETTER card (A,K,Q,J) given that you picked
a NUMBER card from the same suit:
• For each suit (Hearts, Diamonds, Clubs, Spades), there are nine
number cards (2, 3, 4, 5, 6, 7, 8, 9, 10).
• The probability of getting a LETTER card given that you picked a
NUMBER card from the same suit is zero because there are no LETTER
cards (A, K, Q, J) among the number cards.
• Therefore, the probability is 0.
• 5) 1/4
• 6) 0
20
Burglars OR Earthquakes
• You live in San Francisco, where there have been a lot of burglaries of
late, so you buy a burglar alarm. San Francisco is also a place where
earthquakes occasionally happen. The alarm is very sensitive, so both
burglars and earthquakes can set it off.
• One day, you are at a party and your neighbour John calls and tells
you that your alarm is ringing. On the other hand, your other
neighbour Mary does not call.
• Is your home being robbed?
21
• You have a new burglar alarm installed at home.
• It is fairly reliable at detecting burglary, but also sometimes responds to minor
earthquakes.
• You have two neighbors, John and Mary, who promised to call you at work
when they hear the alarm.
• John always calls when he hears the alarm, but sometimes confuses the telephone
ringing with the alarm and calls then, too.
• Mary likes loud music and sometimes misses the alarm.
• Given the evidence of who has or has not called, we would like to estimate the
probability of a burglary.
22
Burglars OR Earthquakes
• Burglary –> Alarm
• Earthquake –> Alarm
• Alarm –> JohnCalls
• Alarm –> MaryCalls
23
24
• What is the probability that the alarm has sounded but neither a
burglary nor an earthquake has occurred, and both John and
Mary call?
25
• P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) · P(m|a) · P(a|¬b, ¬e) · P(¬b) · P(¬e)
26
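• A sketch of the factored computation above. The conditional probability tables are in the slide's figure and are not reproduced in this text, so the numbers below are the usual textbook values for this alarm network and should be treated as assumptions.

```python
# A sketch of the factored computation above. The CPT values are the usual
# textbook numbers for this alarm network (assumed here, since the slide's
# tables are only in a figure).
P_b = 0.001                        # P(Burglary)
P_e = 0.002                        # P(Earthquake)
P_a_given_not_b_not_e = 0.001      # P(Alarm | ~Burglary, ~Earthquake)
P_j_given_a = 0.90                 # P(JohnCalls | Alarm)
P_m_given_a = 0.70                 # P(MaryCalls | Alarm)

# P(j ^ m ^ a ^ ~b ^ ~e) = P(j|a) * P(m|a) * P(a|~b,~e) * P(~b) * P(~e)
p = P_j_given_a * P_m_given_a * P_a_given_not_b_not_e * (1 - P_b) * (1 - P_e)
print(p)   # ~0.000628
```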
Bayesian Networks
• A Bayesian network is a probabilistic graphical model which
represents a set of variables and their conditional dependencies using
a directed acyclic graph.
• Directed Acyclic Graph
• Table of conditional probabilities.
27
Bayesian Networks
• In general, joint distribution P over the set of variables requires
exponential space for representation and inference
• Bayesian Networks provide a graphical representation of conditional
independence relations in P
• Advantages of Bayesian Networks:
• Usually quite compact
• Requires assessment of fewer parameters
• Efficient inference
28
The Cavity, Toothache, Weather Problem
• One sunny day, you go to the dentist because you have a toothache.
The dentist checks if you have a cavity, using his instruments. If his
instruments get caught in your gums, then it is likely that you have a
cavity.
• Find the probability that you have a cavity, given you have a
toothache, the instruments get caught in your gums and it is a sunny
day.
29
Representing the Variables in a Bayesian
Network
• Weather = {Sunny}
• Cavity = {Yes, No}
• Toothache = {Yes, No}
• Catch = {Yes, No}
• Weather is independent of other
variables.
• Toothache and Catch are conditionally
independent of each other given
Cavity.
30
Revisiting the Burglaries / Earthquakes
Problem
31
Inferences
• If we know Alarm, no other evidence influences our belief that
JohnCalls:
• P(JC|MC, A, E, B) = P(JC|A)
• P(MC|JC, A, E, B) = P(MC|A)
• P(E|B) = P(E), and so on for the other independence relations
• By the chain rule, we have:
• P(JC, MC, A, E, B) = P(JC|MC, A, E, B) * P(MC|A, E, B) * P(A|E, B) * P(E|B) *
P(B)
• P(JC, MC, A, E, B) = P(JC|A) * P(MC|A) * P(A|E, B) * P(E) * P(B)
• Full joint probability now requires only 10 parameters
32
Bayesian Networks – Qualitative Structure
• Graphical structure of Bayesian Network reflects conditional independence
among variables
• Each variable X is a node on the Directed Acyclic Graph
• Edges denote direct probabilistic influence
• Usually interpreted causally
• Parents of X are denoted by Par(X)
• Local semantics: X is conditionally independent of all its non-descendants
given its parents
• In general, the full joint probability of a Bayesian Network is defined as follows:
• P(X1, X2, …, Xn) = ∏ᵢ P(Xi | Par(Xi))
33
Sequence Labeling
• Consider a problem where we have a sequence of observations,
which are triggered by a sequence of states.
• The observations are known and seen, but the states are what we are
interested in.
34
Coloured Ball Choosing
• Consider a situation where we have 3 urns (Urn1, Urn2 and Urn3), each of
which has 100 balls in it, such that some are red, some are blue and
some are green.
• We select a ball from an urn, record the observation, and put it back in the
urn.
• Then, we pick a ball from another urn and continue for a sequence.
• Our task is to now find a suitable sequence of labels of states which give us
the best probability for our observation.
• Solution: Hidden Markov Model.
35
Hidden Markov Model
• Coloured Ball Choosing
• There are 3 urns, each with 100 balls of different colours.
36
Diagrammatic Representation
37
Problem Statement
• For an observation sequence, what is the sequence of states?
• Eg. Observation = RRGGBRGR
• Si = U1/U2/U3 (a particular state)
• O: Observation sequence
• S* = “best” state sequence
• Goal: choose the “best” sequence S*, i.e. the S that maximizes P(S|O)
• Assume that each urn can be chosen with equal initial probability.
38
State Transitions Probability
• P(S) = P(S1, …, S8)
• P(S) = P(S1) · P(S2|S1) · P(S3|S1,S2) · … · P(S8|S1,…,S7)
• By the Markov / Bigram Assumption (k = 1):
• P(S) = P(S1) · P(S2|S1) · P(S3|S2) · … · P(S8|S7)
39
Observation Sequence Probability
• The ball depends only on the urn chosen.
• P(O|S) = P(O1|S1)*P(O2|S2)…*P(O8|S8)
40
Grouping Terms
• P(S|O) ∝ P(S) · P(O|S)
• = [P(S0) · P(S1|S0) ⋯ P(S8|S7)] · [P(O1|S1) · P(O2|S2) ⋯ P(O8|S8)]
• = P(S0) · ∏ (i=1..8) P(Si|Si-1) · ∏ (i=1..8) P(Oi|Si)
• Where
• P(S0) is the initial probability
• P(Si|Si-1) is the transition probability
• P(Oi|Si) is the emission probability
41
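• A sketch of scoring one candidate state sequence with the factorization above. The Red emission values match the LP(Ui|R) figures quoted a couple of slides later; the transition table and the Green/Blue emission values are illustrative assumptions, not the slide's actual tables.

```python
# A sketch of the factorization above: score a candidate state sequence S for an
# observation sequence O as P(S0) * prod(transition) * prod(emission).
# Transition and Green/Blue emission values are illustrative assumptions.
initial = {"U1": 1/3, "U2": 1/3, "U3": 1/3}
transition = {                      # transition[s_prev][s_next]
    "U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
    "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
    "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3},
}
emission = {                        # emission[state][colour]
    "U1": {"R": 0.3, "G": 0.5, "B": 0.2},
    "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
    "U3": {"R": 0.6, "G": 0.1, "B": 0.3},
}

def score(states, observations):
    """P(S) * P(O|S) for one candidate labelling of the observation sequence."""
    p = initial[states[0]] * emission[states[0]][observations[0]]
    for prev, cur, obs in zip(states, states[1:], observations[1:]):
        p *= transition[prev][cur] * emission[cur][obs]
    return p

print(score(["U3", "U3", "U2"], ["R", "R", "G"]))   # probability of one candidate labelling
```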
Algorithm (Best Forward Probability)
• Calculate the intermediate probability of the sequence from the
starting node to the previous node for all previous states.
• P(current) = P(previous) * LP (lexical / emission probability) * TP (transition probability)
• Select the node with the best probability
42
In our example…
• IP = 0.33 for all values.
• Observation Sequence = RRGGBRGR
• Add 2 more observations – $ (start symbol) and ^ (end symbol)
• Observation Sequence = $RRGGBRGR^
• LP(U1|R) = 0.3, LP(U2|R) = 0.1, LP(U3|R) = 0.6
45
Final State Sequence: U3U3U2U1U2U1U2U1
• Final Probability = 1.79159 × 10⁻⁶
46
Viterbi Algorithm
• The Viterbi Algorithm was first proposed by Andrew J. Viterbi in
1967.
• The Viterbi Algorithm is a dynamic programming algorithm.
• Used for finding the most likely sequence of hidden states (called the
Viterbi Path) that results in a series of observed events.
• In PoS tagging, the hidden states correspond to the tags, and the
observed events correspond to the words / tokens.
• It is used in speech recognition, speech synthesis, sequence labelling,
etc.
47
Viterbi Algorithm – Input & Output
• Input:
• State space (S) = {S1, S2, …, S|T|}
• Observation space (O) = {O1, O2, …, O|V|}
• Transition matrix (T) of size |T| × |T|
• Emission matrix (E) of size |V| × |T|
• Initial probabilities matrix (I) of size 1 × |T|
• Sequence of observations (Y) of length N = Y1 Y2 … YN
• Output:
• Most likely hidden state sequence (X) = X1 X2 … XN
48
Viterbi(O, S, I, T, E, Y): X
• For each state s from 1 to |T| do:
• Viterbi[s, 1] = I(s) * E[s, Y1] //Probability of starting in s and emitting Y1
• BP[s, 1] = 0 //Back-pointer to keep track of the path
• For each step t from 2 to N do:
• For each state s from 1 to |T| do:
• Viterbi[s, t] = max[k in 1 to |T|](Viterbi[k, t-1] * T[k, s] * E[s, Yt])
• BP[s, t] = argmax[k in 1 to |T|](Viterbi[k, t-1] * T[k, s] * E[s, Yt])
• ZN = argmax[s in 1 to |T|](Viterbi[s, N])
• XN = S[ZN]
• For i in N … 2 do:
• Zi-1 = BP[Zi, i]
• Xi-1 = S[Zi-1]
• Return X
49
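• A runnable Python version of the pseudocode above, written as a sketch with dictionaries instead of matrices; it is reused in the Healthy/Fever example later in this module.

```python
# A runnable sketch of the Viterbi algorithm above, using dictionaries instead
# of matrices. `states`, `init`, `trans`, `emit` mirror S, I, T, E on the slide.
def viterbi(states, init, trans, emit, observations):
    """Return (best_path, best_probability) for the observation sequence."""
    V = [{}]        # V[t][s] = probability of the best path ending in state s at step t
    BP = [{}]       # BP[t][s] = predecessor of s on that best path

    for s in states:                                   # initialisation (t = 1)
        V[0][s] = init[s] * emit[s][observations[0]]
        BP[0][s] = None

    for t in range(1, len(observations)):              # recursion (t = 2 .. N)
        V.append({})
        BP.append({})
        for s in states:
            best_prev = max(states, key=lambda k: V[t - 1][k] * trans[k][s])
            V[t][s] = V[t - 1][best_prev] * trans[best_prev][s] * emit[s][observations[t]]
            BP[t][s] = best_prev

    last = max(states, key=lambda s: V[-1][s])         # termination
    path = [last]
    for t in range(len(observations) - 1, 0, -1):      # follow back-pointers
        path.append(BP[t][path[-1]])
    path.reverse()
    return path, V[-1][last]
```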
Complexity Analysis
• Time complexity = O(N * |T|2)
• Space complexity = O(N * |T|)
50
Implementation Example
• Consider a doctor who diagnoses fever by asking patients how they feel.
The patients may only answer that they feel normal (n), dizzy (d), or
cold (c).
• There are 2 states, “Healthy” (H) and “Fever” (F) but the doctor
cannot observe them directly. They are hidden states.
• Every day, the patient can report to the doctor whether they are
normal, or cold, or dizzy. Based on this (and the previous day’s
statement), the patient should be diagnosed as either H or F.
• Find out the diagnosis of a patient who reports “Normal Cold Dizzy”
across 3 days.
51
Implementation Example – Inputs
• States = {H, F}
• Observations = {c, d, n}
• Initial Probabilities: H = 0.6, F = 0.4
52
Steps – Creation of the Trellis
53
Steps – Calculation of the Probabilities on Day 1
• Initial and transition probabilities:
Transition   Healthy   Fever
START        0.6       0.4
Healthy      0.7       0.3
Fever        0.4       0.6
• [Trellis figure: on Day 1 each state's probability is its initial probability times the emission probability (the figure shows 0.6 × 0.5 for Healthy and 0.4 × 0.1 for Fever); the following animation slides repeat this for Days 2 and 3, multiplying the previous day's best value by the transition and emission probabilities, with surviving values including 0.04, 0.027 and 0.0151.]
65
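• Applying the viterbi() sketch from the pseudocode slide to this example. The initial and transition probabilities come from the table above; the emission probabilities are the usual values for this textbook example and are an assumption here, since the slide's emission table is only in a figure.

```python
# Applying the viterbi() sketch from earlier to the Healthy/Fever example.
# The emission table below is an assumption (the slide's table is in a figure);
# the initial and transition values come from the table above.
states = ["H", "F"]
init   = {"H": 0.6, "F": 0.4}
trans  = {"H": {"H": 0.7, "F": 0.3},
          "F": {"H": 0.4, "F": 0.6}}
emit   = {"H": {"n": 0.5, "c": 0.4, "d": 0.1},     # assumed emission probabilities
          "F": {"n": 0.1, "c": 0.3, "d": 0.6}}

path, prob = viterbi(states, init, trans, emit, ["n", "c", "d"])
print(path, prob)   # ['H', 'H', 'F'] with probability ~0.01512
```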
Applications of HMM in NLP
• Used in sequence labeling tasks in NLP such as part-of-speech tagging
and named entity recognition.
66
Part-of-Speech Tagging
• Involves tagging each token with a part-of-speech (Eg. noun).
• Let's say that we have only 6 tags – noun (NN), verb (VB), adjective (JJ),
adverb (RB), function word (FW) to represent all other words, and
punctuation (.) to represent all punctuation marks.
• Consider the following sentence.
• The quick brown fox jumped over the lazy dog.
• The tagged sentence is
• The_FW quick_JJ brown_JJ fox_NN jumped_VB over_FW the_FW lazy_JJ
dog_NN ._.
• Similarly, P(W|T) = ∏ (i=1..N) P(wi|ti)
• P(wi|ti) is the Lexical Probability
73
Examples of Named Entities
Class Examples
Person Sandeep Mathias
Location Bengaluru
Organization Presidency University
Geo-political Entity Prime Minister of India
74
Named Entity Tagging
• The task of Named Entity Recognition (NER):
• Find spans of text that constitute a named entity.
• Tag the entity with the proper NER class.
75
NER Input
• Citing high fuel prices, United Airlines said Friday it has increased
fares by $6 per round trip on flights to some cities also served by
lower-cost carriers.
• American Airlines, a unit of AMR Corp., immediately matched the
move, spokesman Tim Wagner said.
• United, a unit of UAL Corp., said the increase took effect Thursday
and applies to most routes where it competes against discount
carriers, such as Chicago to Dallas and Denver to San Francisco.
76
NER – Finding NER Spans
• Citing high fuel prices, [United Airlines] said [Friday] it has increased
fares by [$6] per round trip on flights to some cities also served by
lower-cost carriers.
• [American Airlines], a unit of [AMR Corp.], immediately matched the
move, spokesman [Tim Wagner] said.
• [United], a unit of [UAL Corp.], said the increase took effect
[Thursday] and applies to most routes where it competes against
discount carriers, such as [Chicago] to [Dallas] and [Denver] to [San
Francisco].
77
NER Output
• Citing high fuel prices, [ORG United Airlines] said [TIME Friday] it has
increased fares by [MONEY $6] per round trip on flights to some cities
also served by lower-cost carriers.
• [ORG American Airlines], a unit of [ORG AMR Corp.], immediately
matched the move, spokesman [PER Tim Wagner] said.
• [ORG United], a unit of [ORG UAL Corp.], said the increase took effect
[TIME Thursday] and applies to most routes where it competes against
discount carriers, such as [LOC Chicago] to [LOC Dallas] and [LOC Denver]
to [LOC San Francisco].
78
Why NER is not so easy
• Segmentation
• In PoS tagging, no segmentation, since each word gets 1 tag.
• In NER, we have to find the span before adding the tags!
• Type Ambiguity
• Multiple types can map to same span.
• [Washington] was born into slavery on the farm of James Burroughs.
• [Washington] went up 2 games to 1 in the four-game series.
• Blair arrived in [Washington] for what may well be his last state visit.
• In June, [Washington] legislators passed a primary seatbelt law.
79
Why NER is not so easy
• Segmentation
• In PoS tagging, no segmentation, since each word gets 1 tag.
• In NER, we have to find the span before adding the tags!
• Type Ambiguity
• Multiple types can map to same span.
• [PER Washington] was born into slavery on the farm of James Burroughs.
• [ORG Washington] went up 2 games to 1 in the four-game series.
• Blair arrived in [LOC Washington] for what may well be his last state visit.
• In June, [GPE Washington] legislators passed a primary seatbelt law.
80
BIO-Tagging
• BIO tagging converts NER, where one label can cover multiple words, into a
sequence labeling problem like PoS tagging, with 1 tag per word.
• Consider the sentence: “[PER Jane Villanueva] of [ORG United] , a
unit of [ORG United Airlines Holding] , said the fare applies to the
[LOC Chicago] route.”
• Instead of just marking the spans, we also mark out whether it is the
beginning (B), or inside (I) part of the span. Words outside the span
are tagged as other (O).
81
BIO Tagging
• The sentence: “[PER Jane Villanueva] of [ORG United], a unit of [ORG
United Airlines Holding] , said the fare applies to the [LOC Chicago]
route.”
• Becomes:
• “Jane_B-PER Villanueva_I-PER of_O United_B-ORG ,_O a_O unit_O
of_O United_B-ORG Airlines_I-ORG Holding_I-ORG ,_O said_O the_O
fare_O applies_O to_O the_O Chicago_B-LOC route_O ._O”
• Total Number of Tags = 2n + 1, where n is the number of entity classes.
82
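• A small illustrative helper (not from the slides) that converts labelled spans into the BIO tags shown above; the function name and input format are assumptions made for this example.

```python
# A small sketch of converting labelled entity spans into BIO tags, reproducing
# the tagging above. The helper and its input format are illustrative assumptions.
def to_bio(tokens, spans):
    """tokens: list of words; spans: list of (start, end, label) with end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label                 # first token of the span
        for i in range(start + 1, end):
            tags[i] = "I-" + label                 # remaining tokens of the span
    return list(zip(tokens, tags))

tokens = ["Jane", "Villanueva", "of", "United", ",", "a", "unit", "of",
          "United", "Airlines", "Holding", ",", "said", "the", "fare",
          "applies", "to", "the", "Chicago", "route", "."]
spans = [(0, 2, "PER"), (3, 4, "ORG"), (8, 11, "ORG"), (18, 19, "LOC")]

for word, tag in to_bio(tokens, spans):
    print(f"{word}_{tag}", end=" ")
```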
Other BIO Tagging variants
• IO Label – I is inside the span, O is outside the span.
• BIO Label – B is beginning of the span, I is inside the span, O is
outside the span.
• BIOES Label – B is beginning of the span, I is inside the span, O is
outside the span, E is end of the span, and S is to represent a single
element tag.
83
Standard Algorithms for NER
• Many supervised sequence labeling models can be used.
• Hidden Markov Models (HMM)
• Conditional Random Fields (CRF)
• Maximum Entropy Markov Models (MEMM)
• Neural Sequence Models
• Recurrent Neural Network (RNN), Long Short Term Memory (LSTM), etc.
• Pre-trained Language Models – Eg. BERT
84