AI Learning
Open Elective
• Rote learning
• Discovery
• Analogy
• Discover new things or structures which were previously unknown (data mining,
scientific discovery)
• Build software agents that can adapt to their users or to other software agents
• Machine learning systems also discover patterns without prior expected results
• Open box: changes are clearly visible in the knowledge base and clearly
interpretable by the human users.
• Black box: changes done to the system are not readily visible or understandable.
Learner Architecture
• Machine learning systems have four main components:
➢ Knowledge Base (KB):
✓ what is being learnt
✓ representation of the domain
✓ description and representation of the problem space
➢ Performer: interacts with the environment, applying the knowledge in the KB.
➢ Critic: evaluates the performer's behaviour and provides feedback.
➢ Learner: takes output from the critic and modifies something in the KB or the
performer.
Learning Agent Architecture
Learning Examples
Problem: Animal guessing game
  Representation: binary decision tree
  Performer (interacts with human): walk the tree and ask the associated questions
  Critic (human player): human feedback
  Learner (elicits new questions to modify KB): elicit a question from the user and add it to the binary tree

Problem: Playing chess
  Representation: the board layout, game rules, moves
  Performer: chain through the rules to identify a move, use conflict resolution to choose one, output the move
  Critic: who won (credit assignment problem)
  Learner: increase the weights for some rules and decrease for others

Problem: Categorizing documents
  Representation: vector of word frequencies, corpus of documents
  Performer: apply appropriate functions to identify which category the file belongs to
  Critic: a set of human-categorized documents
  Learner: modify the weights on the functions and improve categorization

Problem: Fixing computers
  Representation: frequency matrix of causes and symptoms
  Performer: use known symptoms to identify potential causes
  Critic: human input about the symptoms and cause observed for a specific case
  Learner: update the frequency matrix with actual symptoms and outcomes

Problem: Identifying digits in Optical Character Recognition
  Representation: probability of digits, matrix of pixels, percentage of light, number of straight lines
  Performer: input the features for a digit, output the probability that it is one of the digits 0 to 9
  Critic: human-categorized training set
  Learner: modify the weights on the network of associations
Learning Paradigms
Rote Learning
• It is also called memorization because the knowledge is simply copied into the
knowledge base without any modification: direct entry of rules and facts.
• As computed values are stored, this technique can save a significant amount of time.
• The rote learning technique can also be used in complex learning systems, provided
sophisticated techniques are employed to retrieve the stored values quickly and there is
generalization to keep the amount of stored information down to a manageable level.
• The checkers-playing program, for example, uses this technique to learn the board
positions it evaluates in its look-ahead search.
Rote Learning
• Depends on two important capabilities of complex learning systems:
➢ Organized storage of information, so that a stored value can be retrieved faster than
it could be recomputed.
➢ Generalization, so that the amount of stored information stays at a manageable level.
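The idea of storing computed values so they need not be recomputed can be sketched as a simple cache, in the spirit of the checkers program's stored board evaluations. The evaluation function below is a stand-in, not Samuel's actual heuristic.

```python
# Rote learning as caching: store each computed value in the knowledge base
# so later lookups avoid recomputation. evaluate() is a toy stand-in for an
# expensive look-ahead board evaluation.

store = {}  # the "knowledge base" of remembered results

def evaluate(position):
    # stand-in for an expensive evaluation of a board position
    return sum(ord(c) for c in position) % 100

def rote_evaluate(position):
    if position not in store:                 # not yet memorized
        store[position] = evaluate(position)  # compute once, store the value
    return store[position]                    # later lookups are a table read

rote_evaluate("kq3/8/8")  # computed and stored
rote_evaluate("kq3/8/8")  # retrieved from the store
```

In Python, `functools.lru_cache` provides the same behaviour, including the bounded storage that keeps memorized values at a manageable level.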
Learning by taking advice
• This is the easiest and simplest way of learning.
• There can be several sources of advice, such as humans (experts), the internet, etc.
• However, this type of learning requires more inference than rote learning.
• The program must operationalize the advice by turning it into one or more
expressions containing concepts and actions that the program can use during
execution.
• This ability to operationalize knowledge is critical for learning. It is also an
important aspect of Explanation Based Learning (EBL).
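Operationalizing advice can be illustrated with a hypothetical example: the high-level advice "don't move a piece to a square the opponent attacks" is turned into a test the program can execute over its move list. The helpers and board encoding here are illustrative assumptions, not a real chess API.

```python
# Hypothetical sketch: turning advice into an executable filter over moves.
# The board is a plain dict; legal_moves() and the attacked-square list are
# assumed helpers for illustration only.

def legal_moves(board):
    # assumption: candidate (piece, destination) moves
    return board.get("moves", [])

def is_attacked(board, square):
    # assumption: True if an opposing piece attacks the square
    return square in board.get("attacked", [])

def operationalized_advice(board):
    """Advice made operational: keep only moves to unattacked squares."""
    return [(piece, dest) for piece, dest in legal_moves(board)
            if not is_attacked(board, dest)]

board = {"moves": [("N", "e5"), ("B", "c4")], "attacked": ["e5"]}
operationalized_advice(board)  # the knight move to the attacked square is dropped
```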
Learning in Problem Solving
• When the program does not learn from advice, it can learn by generalizing from its
own experiences.
➢ Learning by parameter adjustment
➢ Learning with macro-operators
➢ Learning by chunking
➢ The utility problem
Learning in Problem Solving
Learning by parameter adjustment
• Here the learning system relies on an evaluation procedure that combines information from
several sources into a single summary statistic.
• For example, factors such as demand and production capacity may be combined into a
single score to indicate the chance of increasing production.
• But it is difficult to know a priori how much weight should be attached to each factor.
• The correct weights can be found by starting with some estimate of the correct settings and
then allowing the program to modify the settings based on its experience.
• Features that appear to be good predictors of overall success will have their weights
increased, while those that do not will have their weights decreased.
• In game programs, for example, factors such as piece advantage and mobility are
combined into a single score to decide whether a particular board position is desirable. This
single score is knowledge that the program gathers by calculation.
Learning in Problem Solving
Learning by parameter adjustment
• The evaluation is a polynomial of the form

  score = c1t1 + c2t2 + … + cntn

• The t terms are the values of the features that contribute to the evaluation. The
c terms are the coefficients or weights that are attached to each of these
values. As learning progresses, the c values will change.
• This method is very useful in situations where very little additional knowledge is
available, or in programs in which it is combined with more knowledge-intensive
methods.
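A minimal sketch of parameter adjustment, assuming illustrative feature names and a fixed learning rate: positions are scored as the weighted sum c1t1 + … + cntn, and after each game the weights of features that were active are nudged up or down depending on the outcome.

```python
# Learning by parameter adjustment: score = sum of c_i * t_i, with the
# c values adjusted from experience. Feature names and the learning rate
# are illustrative assumptions.

weights = {"piece_advantage": 1.0, "mobility": 1.0}  # the c terms

def score(features):
    # weighted sum of feature values (the t terms)
    return sum(weights[name] * value for name, value in features.items())

def adjust(features, won, rate=0.1):
    # crude credit assignment: features active in a won game gain weight,
    # features active in a lost game lose weight
    for name, value in features.items():
        if value > 0:
            weights[name] += rate if won else -rate

features = {"piece_advantage": 2, "mobility": 0}
adjust(features, won=True)  # piece_advantage weight rises, mobility unchanged
```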
Learning in Problem Solving
Learning with Macro-Operators
• A production system consists of a set of rules in if-then form: given a particular
situation, they specify what actions are to be performed. For example, if it is raining, then
take an umbrella.
• A production system also contains a knowledge base, a control strategy and a rule applier. To
solve a problem, the system compares the present situation with the left-hand sides of the
rules. If there is a match, the system performs the actions described in the right-hand
side of the corresponding rule.
• Problem solvers solve problems by applying the rules. Some of these rules may be more
useful than others, and the results are stored as a chunk.
• Several chunks may encode a single macro-operator, and one chunk may
participate in a number of macro sequences.
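The match-then-fire cycle described above can be sketched as a toy production system: rules are (condition, action) pairs, the current situation is a set of facts, and the first matching rule fires (a deliberately trivial conflict-resolution strategy).

```python
# A toy production system: each rule is an if-then pair whose left-hand side
# is a predicate over the current facts and whose right-hand side is an action.

rules = [
    (lambda facts: "raining" in facts, "take umbrella"),
    (lambda facts: "sunny" in facts, "wear sunglasses"),
]

def run(facts):
    for condition, action in rules:  # compare situation with each rule's LHS
        if condition(facts):         # a match: conflict resolution = first match wins
            return action            # perform the RHS action
    return None                      # no rule applies

run({"raining"})  # -> "take umbrella"
```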
Learning in Problem Solving
Learning by chunking
• Chunks learned at the beginning of problem solving may be used at a later stage. The
system keeps each chunk to use in solving other problems.
• Soar is a general cognitive architecture for developing intelligent systems. Soar requires
knowledge to solve various problems, and it acquires that knowledge using the chunking
mechanism. The system learns reflexively when impasses have been resolved. An impasse
arises when the system does not have sufficient knowledge. Consequently, Soar chooses a new
problem space (a set of states and the operators that manipulate the states) in a bid to
resolve the impasse. While resolving the impasse, the individual steps of the task plan
are grouped into larger steps known as chunks. The chunks decrease the problem-space
search and so increase the efficiency of performing the task.
• In Soar, knowledge is stored in long-term memory. Soar uses the chunking
mechanism to create productions that are stored in long-term memory. A chunk is simply
a large production that does the work of an entire sequence of smaller ones. The
productions have a set of conditions or patterns to be matched against working memory (which
consists of the current goals, problem spaces, states and operators) and a set of actions to
perform when the production fires. Chunks are generalized before storing. When the
same impasse occurs again, the chunks so collected can be used to resolve it.
Learning in Problem Solving
The Utility Problem
• The utility problem in learning systems occurs when knowledge learned in an attempt to
improve a system's performance degrades it instead.
• The problem appears in many AI systems, but it is most familiar in speedup learning.
Speedup learning systems are designed to improve their performance by learning control
rules which guide their problem-solving performance. These systems often exhibit the
undesirable property of actually slowing down if they are allowed to learn in an
unrestricted fashion.
• Each individual control rule is guaranteed to have a positive utility (improve performance)
but, in concert, they have a negative utility (degrade performance).
• One of the causes of the utility problem is the serial nature of current hardware: the more
control rules a speedup learning system acquires, the longer it takes to
test them on each cycle.
• One solution to the utility problem is to design a parallel memory system to eliminate the
increase in match cost. This approach moves the matching problem away from the central
processor and into the memory of the system. These so-called active memories allow
memory search to occur in "nearly constant time" in the number of data items, relying on
the memory for fast, simple inference and reminding.
Learning in Problem Solving
The Utility Problem
• The PRODIGY program maintains a utility measure for each control rule. This measure takes
into account the average savings provided by the rule, the frequency of its application and
the cost of matching it.
• If a proposed rule has a negative utility estimate, it is discarded; if not, it is placed in
long-term memory with the other rules and monitored during subsequent problem solving.
• If its utility falls, the rule is discarded.
• Empirical experiments have demonstrated the effectiveness of keeping only those control
rules with high utility.
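A PRODIGY-style utility measure can be sketched from the three factors named above. The exact combining formula is an illustrative assumption; the point is that a rule is kept only while its expected savings outweigh its match cost.

```python
# Sketch of a utility measure for a control rule, assuming the factors named
# in the text: average savings, application frequency, and match cost.
# The formula (savings * frequency - cost) is an illustrative assumption.

def utility(avg_savings, application_freq, match_cost):
    # expected benefit per problem minus the cost of testing the rule each cycle
    return avg_savings * application_freq - match_cost

def keep_rule(avg_savings, application_freq, match_cost):
    return utility(avg_savings, application_freq, match_cost) > 0

keep_rule(avg_savings=50, application_freq=0.2, match_cost=2)   # positive utility: kept
keep_rule(avg_savings=50, application_freq=0.01, match_cost=2)  # negative utility: discarded
```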
Learning by Analogy
[Figure: a hydraulic network with flows Qa = 3 and Qb = 9 combining into an unknown
flow Qc, alongside an electrical circuit in which I3 = I1 + I2.]
One may infer, by analogy, that hydraulic laws are similar to Kirchhoff's
laws and Ohm's law.
Learning by Analogy
Examples of analogies:
Transformational Analogy
Look for a similar solution and copy it to the
new situation, making suitable substitutions
where appropriate.
E.g. geometry: if you know about the lengths of line segments and
a proof that certain lines are equal, then we can
make similar assertions about angles.
Transformational analogy does not look at how the problem was solved; it
only looks at the final solution. But the history of the problem solution, the steps
involved, is often relevant.
Derivational Analogy
[Figure: collinear points A, B, C, D for the line-segment problem; rays AB, AC,
AD, AE from A for the angle problem.]

GIVEN: AB = CD                      GIVEN: ∠BAC = ∠DAE
PROVE: AC = BD                      PROVE: ∠BAD = ∠CAE

Mapping: ( AB ← ∠BAC, CD ← ∠DAE, AC ← ∠BAD, BD ← ∠CAE )

AB = CD                             ∠BAC = ∠DAE
BC = BC                             ∠CAD = ∠CAD
AB + BC = BC + CD                   ∠BAC + ∠CAD = ∠CAD + ∠DAE
AC = BD                             ∠BAD = ∠CAE
Explanation based Learning
Learning by Discovery
An entity acquires knowledge without the help of a teacher.
• Integers -- it is possible to count the elements of a set, and the image of this
counting function -- the integers -- is an interesting set in its own right.
• Addition -- the union of two disjoint sets and their counting function.
• Prime numbers -- factorisation of numbers was explored, and numbers with only one
factor were discovered.
Many discoveries are made from observing data obtained from the world and making
sense of it -- e.g. astrophysics (discovery of planets), quantum mechanics (discovery of
sub-atomic particles).
• BACON holds some variables constant and attempts to notice trends in the data.
• Inferences are then made.
BACON has also been applied to Kepler's third law, Ohm's law, conservation of
momentum and Joule's law.
Learning by Discovery
Clustering
• Clustering is a common descriptive task in which one seeks to identify a finite set of
categories or clusters to describe the data. For example, we may want to cluster houses to find
distribution patterns.
• A cluster is a collection of data objects that are similar to one another within the same
cluster and dissimilar to the objects in other clusters. Clustering analysis helps
construct meaningful partitionings of a large set of objects.
Learning by Discovery
Clustering
The task of clustering is to maximize the intra-class similarity and minimize the inter-class
similarity.
• Given N k-dimensional feature vectors, find a "meaningful" partition of the N
examples into c subsets or groups
• Discover the "labels" automatically
• c may be given, or "discovered"
• Clustering is much more difficult than classification, since in classification the groups
are given and we only seek a compact description of them
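The task above — partitioning N feature vectors into c groups by similarity — can be sketched with k-means, one standard clustering algorithm (not the only one). This is a plain-Python, one-dimensional illustration.

```python
# Minimal k-means sketch: alternate between assigning each point to its
# nearest center (maximizing intra-cluster similarity) and moving each
# center to the mean of its cluster.

def kmeans(points, centers, iterations=10):
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # update step: move each center to its cluster's mean
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 10.0])
# the two discovered groups gather around 1.0 and 9.0
```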
Learning by Discovery
AutoClass
• AutoClass is a clustering algorithm based upon the Bayesian approach for
determining optimal classes in large datasets.
• Given a set X = {X1, …, Xn} of data instances Xi with unknown classes, the goal of
Bayesian classification is to search for the best class description that predicts the
data in a model space.
• Class membership is expressed probabilistically.
• AutoClass calculates the likelihood of each instance belonging to each class C and
then calculates a set of weights wij = Ci / Σj Cj for each instance.
• Weighted statistics relevant to each term of the class likelihood are calculated for
estimating the class model.
• The classification step is the most computationally intensive: it computes the weights
of every instance for each class and computes the parameters of the classification.
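The probabilistic class membership described above can be sketched as follows: compute a likelihood for the instance under each class model and normalize into weights wij = Ci / Σj Cj. Gaussian class models are an assumption made here for illustration; AutoClass itself searches over richer model spaces.

```python
# Soft class membership: likelihoods under each class model, normalized
# into weights that sum to 1. The Gaussian class model is an assumption.

import math

def likelihood(x, mean, var):
    # Gaussian density as a stand-in class model
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def membership_weights(x, classes):
    ls = [likelihood(x, m, v) for m, v in classes]
    total = sum(ls)
    return [l / total for l in ls]  # w_ij = C_i / sum_j C_j

weights = membership_weights(1.1, classes=[(1.0, 0.5), (9.0, 0.5)])
# the instance near the first class's mean gets nearly all the weight
```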
Formal Learning
Formal learning theory
• Theory of the learnable (Valiant): classifies problems by how difficult they are to learn.
• Formally, a device can learn a concept if, given positive and negative examples, it can
produce an algorithm that will classify future examples correctly with probability 1 − 1/h.
• If the number of training examples required is a polynomial in h, t and f, then the system
is said to be trainable.
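One concrete instance of the "polynomial number of training examples" idea is the standard PAC bound for a finite hypothesis space H: a consistent learner that sees m ≥ (1/ε)(ln|H| + ln(1/δ)) examples has, with probability at least 1 − δ, error at most ε. Setting ε = δ = 1/h makes the bound polynomial in h; the specific numbers below are only an illustration.

```python
# PAC sample-complexity bound for a finite hypothesis space:
# m >= (1/eps) * (ln|H| + ln(1/delta)). This is the standard bound for
# consistent learners, used here to illustrate Valiant-style trainability.

import math

def sample_bound(h_size, eps, delta):
    return math.ceil((1 / eps) * (math.log(h_size) + math.log(1 / delta)))

sample_bound(h_size=2 ** 20, eps=0.1, delta=0.05)
# a modest number of examples suffices even for a million-hypothesis space
```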
Other Learning Models
Neural net learning and genetic learning
• Neural networks
What is learning