Part 3
In the next three chapters we examine the organization and manipulation of knowledge.
This chapter is concerned with search, an operation required in almost all AI programs.
Chapter 10 covers the comparison or matching of data structures and in particular
pattern matching, while Chapter 11 is concerned with the organization of knowledge
in memory.
Search is one of the operational tasks that characterize AI programs best.
Almost every AI program depends on a search procedure to perform its prescribed
functions. Problems are typically defined in terms of states, and solutions correspond
to goal states. Solving a problem then amounts to searching through the different
states until one or more of the goal states are found. In this chapter we investigate
search techniques that will be referred to often in subsequent chapters.
9.1 INTRODUCTION
Consider the process of playing a game such as chess. Each board configuration
can be thought of as representing a different state of the game. A change of state
occurs when one of the players moves a piece. A goal state is any of the possible
board configurations corresponding to a checkmate.
168 Search and Control Strategies Chap. 9
It has been estimated that the game of chess has more than 10^100 possible
states. (To see this, just note that there are about 20 alternative moves for each
board configuration and more than 100 moves in a typical game. Thus, there are
more than 20^100 = 10^100 * 2^100 > 10^130.) This is another example of the combinatorial
explosion problem. The number of states grows exponentially with the number of
basic elements. Winning a game amounts to finding a sequence of states through
this maze of possible states that leads to one of the goal states.
An "intelligent" chess-playing program certainly would not play the game
by exploring all possible moves (it would never finish in our lifetime nor in our
distant descendants' lifetimes). Like a human, the program must eliminate many
questionable states when playing. But, even with the elimination of numerous states,
there is still much searching to be done since finding good moves at each state of
the game often requires looking ahead a few moves and evaluating the consequences.
This type of problem is not limited to games. Search is ubiquitous in AI. For
every interesting problem there are numerous alternatives to consider. When attempt-
ing to understand a natural language, a program must search to find matching words
that are known (a dictionary), sentence constructions, and matching contexts. In
vision perception, searches must be performed to find model patterns that
match input scenes. In theorem proving, clauses must be found by searching axioms
and assertions which resolve together to give the empty clause. This requires a
search of literals which unify and then a search to find resolvable clauses. In planning
problems, a number of potential alternatives must be examined before a good workable
plan can be formulated. And in learning, many potential hypotheses must be considered
before a good one is chosen.
9.2 PRELIMINARY CONCEPTS

Time and space complexities of algorithms may be defined in terms of their best,
their average, or their worst-case performance in completing some task. In evaluating
different search strategies, we follow the usual convention of considering worst-case
performances and look for ways to improve on them. For this, we need the O
(for order) notation.
Let f and g be functions of n, where algorithm A has size n. The size can be
the number of problem states, the number of input characters which specify the
problem, or some similar number. Let f(n) denote the time (or space) required to
solve a given problem using algorithm A. We say "f is big O of g," written f =
O(g), if and only if there exists a constant c > 0 and an integer n0, such that f(n)
<= c*g(n) for all n >= n0. Stated more simply, algorithm A solves a problem in at
most c*g(n) units or steps for all but a finite number of cases. Based on this definition,
we say an algorithm is of linear time if it is O(n), of quadratic time if it is
O(n^2), and of exponential time if it is O(2^(kn)) for some constant k (or if it is O(b^n)
for any real number b > 1).
For example, if a knowledge base has ten assertions (clauses), with an average
of five literals per clause, and a resolution proof is being performed with no particular
strategy, a worst-case proof may require as many as 1125 comparisons (5^2 * (10 * 9) /
2) for a single resolution and several times this number for a complete proof.
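The 1125 figure comes from comparing every pair of literals across every pair of clauses. A short sketch makes the arithmetic explicit (the function name is ours, chosen for illustration):

```python
# Worst-case literal comparisons for one resolution step over a small
# knowledge base: every unordered pair of clauses, every pair of literals.
def worst_case_comparisons(num_clauses, literals_per_clause):
    clause_pairs = num_clauses * (num_clauses - 1) // 2   # 10 * 9 / 2 = 45 pairs
    literal_pairs = literals_per_clause ** 2              # 5 * 5 = 25 per clause pair
    return clause_pairs * literal_pairs

print(worst_case_comparisons(10, 5))  # 1125
```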
(Figure 9.1 And-Or tree representation of a task.)
representation unless noted otherwise. And-Or graph searches are covered in Section
9.6.
9.3 EXAMPLES OF SEARCH PROBLEMS

In this section we describe some typical problems which illustrate the concepts defined
above and which are used in subsequent sections to portray different search techniques.
The problems considered include the often-used examples, the eight puzzle and the
traveling salesman problem.
The eight puzzle consists of a 3-by-3 square frame which holds eight movable
square tiles which are numbered from 1 to 8. One square is empty, permitting tiles
(Figure 9.2 The eight puzzle game: an initial configuration and a goal configuration.)
to be shifted (Figure 9.2). The objective of the puzzle is to find a sequence of tile
movements that leads from a starting configuration to a goal configuration such as
that shown in Figure 9.2.
The states of the eight puzzle are the different permutations of the tiles within
the frame. The operations are the permissible moves (one may consider the empty
space as being movable rather than the tiles): up, down, left, and right. An optimal
or good solution is one that maps an initial arrangement of tiles to the goal configuration
with the smallest number of moves.
The search space for the eight puzzle problem may be depicted as the tree
shown in Figure 9.3.
(Figure 9.3 Search space tree for the eight puzzle.)
In the figure, the nodes are depicted as puzzle configurations. The root node
represents a randomly chosen starting configuration, and its successor nodes corre-
spond to the three single tile movements that are possible from the root. A path is
a sequence of nodes starting from the root and progressing downward to the goal
node.
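The state and operator view of the puzzle can be sketched directly. In this minimal sketch (the representation and starting configuration are ours, chosen for illustration), a state is a 3-by-3 grid of tuples with 0 marking the empty square, and successors are generated by sliding the blank up, down, left, or right:

```python
# Generate the successor states of an eight puzzle configuration.
# A state is a 3x3 tuple of tuples; 0 marks the empty square.
def successors(state):
    r, c = next((i, j) for i in range(3) for j in range(3) if state[i][j] == 0)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            grid = [list(row) for row in state]
            grid[r][c], grid[nr][nc] = grid[nr][nc], grid[r][c]
            result.append(tuple(tuple(row) for row in grid))
    return result

# An illustrative starting configuration with the blank on the bottom edge,
# which permits exactly three single-tile movements.
start = ((2, 8, 3), (1, 6, 4), (7, 0, 5))
print(len(successors(start)))  # 3
```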
The traveling salesman problem involves n cities with paths connecting the cities.
A tour is any path which begins with some starting city, visits each of the other
cities exactly once, and returns to the starting city. A typical tour is depicted in
Figure 9.4.
The objective of a traveling salesman problem is to find a minimal distance
tour. To explore all such tours requires an exponential amount of time. For example,
a minimal solution with only 10 cities is tractable (3,628,800 tours). One with 20
or more cities is not, since a worst-case search requires on the order of 20! (about
2.4 x 10^18) tours. The state space for the problem can also be represented as a
graph as depicted in Figure 9.5.
Without knowing in advance the length of a minimum tour, it would be necessary
to traverse each of the distinct paths shown in Figure 9.5 and compare their lengths.
This requires some O(n!) traverses through the graph, an exponential number.
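The O(n!) blow-up can be seen in a brute-force sketch: fix the starting city and try every ordering of the rest. The distance table and city names below are hypothetical, chosen only to keep the example small:

```python
from itertools import permutations

# Exhaustive tour search: (n-1)! orderings of the non-start cities,
# illustrating the factorial growth discussed above.
def shortest_tour(cities, dist):
    start, rest = cities[0], cities[1:]
    best_len, best_tour = float("inf"), None
    for order in permutations(rest):              # (n-1)! candidate tours
        tour = (start,) + order + (start,)        # return to the start city
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour

# Hypothetical 4-city symmetric distance table.
d = {"A": {"B": 1, "C": 4, "D": 3},
     "B": {"A": 1, "C": 2, "D": 5},
     "C": {"A": 4, "B": 2, "D": 1},
     "D": {"A": 3, "B": 5, "C": 1}}
print(shortest_tour(("A", "B", "C", "D"), d))  # (7, ('A', 'B', 'C', 'D', 'A'))
```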
The General Problem Solver was developed by Newell, Simon, and Shaw (Ernst
and Newell, 1969) in the late 1950s. It was important as a research tool for several
reasons and notable as the first AI system which cleanly separated the task knowledge
from the problem solving part.
General Problem Solver was designed to solve a variety of problems that
could be formulated as a set of objects and operators, where the operators were
applied to the objects to transform them into a goal object through a sequence of
applications.
Given an initial object (state) and a goal object (state), the system attempted
to transform the initial object to the goal object through a series of operator application
transformations. It used a set of methods similar to those discussed in Chapter 8
for each goal type, to achieve that goal by recursively creating and solving subgoals.
The basic method is known as means-end analysis, which we now describe.
Figure 9.5 State space representation for the TSP.
1. Comparing the current state Si to a goal state Sg and computing the difference Dig.
2. Selecting an operator Ok judged relevant to reducing the difference Dig.
3. Applying Ok to transform the current state into a new state closer to the goal.
In carrying out these methods, the General Problem Solver may transform
some Si into an intermediate state Sj to reduce the difference Djg between states Sj
and Sg, then apply another operator Ok to the Sj, and so on until the state Sg is
obtained. Differences that may occur between objects will, of course, depend on
the task domain.
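The means-end loop just described can be sketched in miniature. Everything here is illustrative, not GPS's own machinery: the "state" is a single number, the difference function is the numeric gap to the goal, and the operator table is indexed by the sign of the difference it reduces:

```python
# Minimal means-end analysis loop: compute the difference to the goal,
# select an operator keyed to that kind of difference, apply it, repeat.
def means_end(state, goal, operators, difference, limit=20):
    for _ in range(limit):
        d = difference(state, goal)
        if d == 0:
            return state                  # no difference: goal reached
        op = operators.get(1 if d > 0 else -1)
        if op is None:
            return None                   # no operator reduces this difference
        state = op(state)
    return None                           # give up after too many steps

ops = {1: lambda s: s + 1,                # goal is larger: increment
       -1: lambda s: s - 1}               # goal is smaller: decrement
print(means_end(3, 7, ops, lambda s, g: g - s))  # 7
```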
As an example, in proving theorems in propositional logic, some common
differences that occur are: a variable may appear in one object and not in the other,
(Figure: a sequence of transformations, e.g. (R & (P --> Q)) rewritten as ((P --> Q) & R) and then as ((~P V Q) & R).)
a variable may occur a different number of times between two objects, objects will
have different signs or different connectives, associative groupings will differ, and
so on.
To illustrate the search process, we assume the General Problem Solver operators
are rewrite rules of the following form:
R1: (A V B) --> (B V A)
R2: (A & B) --> (B & A)
R3: (A --> B) --> (~B --> ~A)
R4: (A --> B) --> (~A V B)
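Rules of this form can be applied mechanically to formulas. The sketch below is ours, not GPS's representation: formulas are nested tuples such as ('or', A, B) and ('->', A, B), with ('not', A) standing for ~A, and the function returns the formulas reachable in one rewrite step:

```python
# Apply the four rewrite rules R1-R4 above to a tuple-encoded formula,
# returning every formula reachable in a single rewrite step.
def apply_rules(f):
    out = []
    tag = f[0]
    if tag == 'or':                                   # R1: (A V B) --> (B V A)
        out.append(('or', f[2], f[1]))
    if tag == 'and':                                  # R2: (A & B) --> (B & A)
        out.append(('and', f[2], f[1]))
    if tag == '->':
        out.append(('->', ('not', f[2]), ('not', f[1])))  # R3: contrapositive
        out.append(('or', ('not', f[1]), f[2]))           # R4: implication to disjunction
    return out

print(apply_rules(('->', 'P', 'Q')))
```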
9.4 UNINFORMED OR BLIND SEARCH

In a worst-case situation the only information available will be the ability to
distinguish goal from nongoal nodes. When no further information is known a priori,
a search program must perform a blind or uninformed search. A blind or uninformed
search algorithm is one that uses no information other than the initial state, the
search operators, and a test for a solution. A blind search should proceed in a
systematic way by exploring nodes in some predetermined order or simply by selecting
nodes at random. We consider only systematic search procedures in this section.
Search programs may be required to return only a solution value when a goal
is found or to record and return the solution path as well. To simplify the descriptions
that follow, we assume that only the goal value is returned. To also return the
path requires making a list of nodes on the path or setting back-pointers to ancestor
nodes along the path.
Breadth-First Search
Breadth-first searches are performed by exploring all nodes at a given depth before
proceeding to the next level. This means that all immediate children of nodes are
explored before any of the children's children are considered. Breadth-first tree
search is illustrated in Figure 9.7. It has the obvious advantage of always finding a
minimal path length solution when one exists. However, a great many nodes may
need to be explored before a solution is found, especially if the tree is very full.
An algorithm for the breadth-first search is quite simple. It uses a queue structure
to hold all generated but still unexplored nodes. The order in which nodes are
placed on the queue for removal and exploration determines the type of search.
The breadth-first algorithm proceeds as follows.
BREADTH-FIRST SEARCH
1. Place the starting node s on the queue.
2. If the queue is empty, return failure and stop.
3. If the first element on the queue is a goal node g, return success and stop.
Otherwise,
4. Remove and expand the first element from the queue and place all the children
at the end of the queue in any order.
5. Return to step 2.
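The steps above can be sketched directly. This is a minimal version with one addition the listing omits: a visited set, so the sketch is also safe on graphs with repeated states. The tree below is a hypothetical adjacency table:

```python
from collections import deque

# Breadth-first search following the queue discipline above: children are
# appended at the end of the queue, so shallower nodes are expanded first.
def breadth_first(start, goal, children):
    queue = deque([start])
    seen = {start}                        # added for safety on graphs
    while queue:
        node = queue.popleft()            # step 4: remove the first element
        if node == goal:                  # step 3: goal test
            return node
        for child in children(node):
            if child not in seen:
                seen.add(child)
                queue.append(child)       # step 4: children go to the end
    return None                           # step 2: queue empty, failure

# Hypothetical tree given as an adjacency table.
tree = {'a': ['b', 'c'], 'b': ['d', 'e'], 'c': ['f'], 'd': [], 'e': [], 'f': []}
print(breadth_first('a', 'e', lambda n: tree[n]))  # e
```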
The time complexity of the breadth-first search is O(b^d). This can be seen by
noting that all nodes up to the goal depth d are generated. Therefore, the number
generated is b + b^2 + ... + b^d, which is O(b^d). The space complexity is also
O(b^d) since all nodes at a given depth must be stored in order to generate the
nodes at the next depth; that is, b^(d-1) nodes must be stored at depth d - 1 to
generate nodes at depth d, which gives space complexity of O(b^d). The use of
both exponential time and space is one of the main drawbacks of the breadth-first
search.
Depth-First Search
(Figure 9.8 Depth-first search of a tree.)
DEPTH-FIRST SEARCH
1. Place the starting node s on the queue.
2. If the queue is empty, return failure and stop.
3. If the first element on the queue is a goal node g, return success and stop.
Otherwise,
4. Remove and expand the first element, and place the children at the front of
the queue (in any order).
5. Return to step 2.
The depth-first search is preferred over the breadth-first when the search tree
is known to have a plentiful number of goals. Otherwise, depth-first may never
find a solution. The depth cutoff also introduces some problems. If it is set too
shallow, goals may be missed; if set too deep, extia computation may be performed.
The time complexity of the depth-first tree search is the same as that for
breadth-first, O(b^d). It is less demanding in space requirements, however, since
only the path from the starting node to the current node needs to be stored. Therefore,
if the depth cutoff is d, the space complexity is just O(d).
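A minimal sketch of depth-first search with a depth cutoff follows the listing above, using a stack in place of the front of the queue; the adjacency table is hypothetical:

```python
# Depth-first search with a depth cutoff: children are pushed on the front
# (top of the stack), so the search descends before it broadens.
def depth_first(start, goal, children, cutoff):
    stack = [(start, 0)]                       # (node, depth) pairs
    while stack:
        node, depth = stack.pop()
        if node == goal:
            return node
        if depth < cutoff:                     # cutoff keeps space at O(d)
            for child in reversed(children(node)):   # preserve left-to-right order
                stack.append((child, depth + 1))
    return None

tree = {'a': ['b', 'c'], 'b': ['d'], 'c': [], 'd': []}
print(depth_first('a', 'd', lambda n: tree.get(n, []), cutoff=3))  # d
```

Note how the cutoff trade-off in the text shows up here: with `cutoff=1`, node `d` at depth 2 is missed entirely.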
Bidirectional Search
When a problem has a single goal state that is given explicitly, and all node generation
operators have inverses, bidirectional search can be used. (This is the case with
the eight puzzle described above, for example). Bidirectional search is performed
by searching forward from the initial node and backward from the goal node simulta-
neously. To do so, the program must store the nodes generated on both search
frontiers until a common node is found. With some modifications, all three of the
blind search methods described above may be used to perform bidirectional search.
For example, to perform bidirectional depth-first iterative deepening search
to a depth of k, the search is made from one direction and the nodes at depth k are
stored. At the same time, a search to a depth of k and k + 1 is made from the
other direction and all nodes generated are matched against the nodes stored from
the other side. These nodes need not be stored, but a search of the two depths is
needed to account for odd-length paths. This process is repeated for lengths k = 0
to d/2 from both directions.
The time and space complexities for bidirectional depth-first iterative deepening
search are both O(b^(d/2)) when the node matching is done in constant time per node.
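The frontier-meeting idea can be sketched with breadth-first frontiers, which are simpler than the iterative-deepening variant described above but illustrate the same principle: expand from both ends and stop when the two sides touch. The graph is hypothetical:

```python
# Bidirectional search sketch: grow a frontier from each end one level at a
# time and stop as soon as the frontiers share a node.
def bidirectional(start, goal, neighbors):
    if start == goal:
        return start
    front, back = {start}, {goal}
    seen_f, seen_b = {start}, {goal}
    while front and back:
        front = {m for n in front for m in neighbors(n)} - seen_f
        seen_f |= front
        if front & seen_b:                 # frontiers met going forward
            return (front & seen_b).pop()
        back = {m for n in back for m in neighbors(n)} - seen_b
        seen_b |= back
        if back & seen_f:                  # frontiers met going backward
            return (back & seen_f).pop()
    return None

# Hypothetical undirected path graph a - b - c - d - e.
graph = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b', 'd'], 'd': ['c', 'e'], 'e': ['d']}
print(bidirectional('a', 'e', lambda n: graph[n]))  # c
```

The returned node is the meeting point; each side stores only about half the depth of the full search, which is where the O(b^(d/2)) saving comes from.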
Since the number of nodes to be searched using the blind search methods
described above increases as b^d with depth d, such problems become intractable for
large depths. It therefore behooves us to consider alternative methods. Such methods
depend on some knowledge to limit the number of problem states visited. We turn
to these methods in the next section.
9.5 INFORMED SEARCH
When more information than the initial state, the operators, and the goal test is
available, the size of the search space can usually be constrained. When this is the
case, the better the information available, the more efficient the search process
will be. Such methods are known as informed search methods. They often depend
on the use of heuristic information. In this section, we examine search strategies
based on the use of some problem domain information, and in particular, on the
use of heuristic search functions.
Heuristic Information
Information about the problem (the nature of the states, the cost of transforming
from one state to another, the promise of taking a certain path, and the characteristics
of the goals) can sometimes be used to help guide the search more efficiently.
This information can often be expressed in the form of a heuristic evaluation function,
a function of the nodes n and/or the goals g.
Recall that a heuristic is a rule of thumb or judgmental technique that leads
to a solution some of the time but provides no guarantee of success. It may in fact
end in failure. Heuristics play an important role in search strategies because of the
exponential nature of most problems. They help to reduce the number of alternatives
from an exponential number to a polynomial number and, thereby, obtain a solution
in a tolerable amount of time.
Hill Climbing

Search methods based on hill climbing get their names from the way the nodes are
selected for expansion. At each point in the search path, a successor node that
appears to lead most quickly to the top of the hill (the goal) is selected for exploration.
This method requires that some information be available with which to evaluate
and order the most promising choices.
Hill climbing is like depth-first searching where the most promising child is
selected for expansion. When the children have been generated, alternative choices
are evaluated using some type of heuristic function. The path that appears most
promising is then chosen, and no further reference to the parent or other children is
retained. This process continues from node to node with previously expanded nodes
being discarded. A typical path is illustrated in Figure 9.9, where the numbers by a
node correspond to the computed estimates of the goal distance for alternative paths.
Hill climbing can produce substantial savings over blind searches when an
informative, reliable function is available to guide the search to a global goal. It
suffers from some serious drawbacks when this is not the case. Potential problem
types named after certain terrestrial anomalies are the foothill, ridge, and plateau
traps.
The foothill trap results when local maxima or peaks are found. In this case
the children all have less promising goal distances than the parent node. The search
is essentially trapped at the local node with no indication of goal direction. The
only way to remedy this problem is to try moving in some arbitrary direction a
few generations in the hope that the real goal direction will become evident, backtracking
to an ancestor node and trying a secondary path choice, or altering the computation
procedure to expand ahead a few generations each time before choosing a path.
A second potential problem occurs when several adjoining nodes have higher
values than surrounding nodes. This is the equivalent of a ridge. It too is a form
of local trap and the only remedy is to try to escape as in the foothill case above.
Finally, the search may encounter a plateau type of structure, that is, an area
in which all neighboring nodes have the same values. Once again, one of the methods
noted above must be tried to escape the trap.
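The greedy discard-everything behavior of hill climbing, including how it halts at a local optimum, can be sketched as follows. The one-dimensional state space and distance function are ours, chosen only to keep the example small:

```python
# Hill climbing: always move to the child with the smallest estimated goal
# distance h, discarding all other alternatives; stop when no child improves
# on the current node (a possible foothill, ridge, or plateau).
def hill_climb(start, children, h, limit=100):
    node = start
    for _ in range(limit):
        succ = children(node)
        if not succ:
            return node
        best = min(succ, key=h)
        if h(best) >= h(node):
            return node            # trapped: no child looks more promising
        node = best
    return node

# Hypothetical state space: integers, with h = distance from the goal 7.
h = lambda x: abs(7 - x)
print(hill_climb(0, lambda x: [x - 1, x + 1], h))  # 7
```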
The problems encountered with hill climbing can be avoided using a best-
first search approach.
Best-First Search
Best-first search also depends on the use of a heuristic to select the most promising
paths to the goal node. Unlike hill climbing, however, this algorithm retains all
estimates computed for previously generated nodes and makes its selection based
on the best among them all. Thus, at any point in the search process, best-first
moves forward from the most promising of all the nodes generated so far. In so
doing, it avoids the potential traps encountered in hill climbing. The best-first process
is illustrated in Figure 9.10, where numbers by the nodes may be regarded as estimates
of the distance or cost to reach the goal node.
The algorithm we give for best-first search differs from the previous blind
search algorithms only in the way the nodes are saved and ordered on the queue.
The algorithm reads as follows.
BEST-FIRST SEARCH
1. Place the starting node s on the queue.
2. If the queue is empty, return failure and stop.
3. If the first element on the queue is a goal node g, return success and stop.
Otherwise,
4. Remove the first element from the queue, expand it, and compute the estimated
goal distances for each child. Place the children on the queue (at either end)
and arrange all queue elements in ascending order of goal distance
from the front of the queue.
5. Return to step 2.
Best-first searches will always find good paths to a goal, even when local
anomalies are encountered. All that is required is that a good measure of goal
distance be used.
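The ordered-queue discipline above maps naturally onto a priority queue. The sketch below uses a heap for the ordering; the tree and the distance estimates are hypothetical:

```python
import heapq

# Best-first search: the queue is kept ordered by estimated goal distance,
# so expansion always resumes from the most promising node generated so far.
def best_first(start, goal, children, h):
    queue = [(h(start), start)]
    seen = {start}
    while queue:
        _, node = heapq.heappop(queue)    # step 4: take the best estimate
        if node == goal:                  # step 3: goal test
            return node
        for child in children(node):
            if child not in seen:
                seen.add(child)
                heapq.heappush(queue, (h(child), child))
    return None

# Hypothetical tree with estimated goal distances at each node.
tree = {'a': ['b', 'c'], 'b': ['d'], 'c': ['e'], 'd': [], 'e': []}
estimates = {'a': 5, 'b': 4, 'c': 2, 'd': 6, 'e': 0}
h = lambda n: estimates[n]
print(best_first('a', 'e', lambda n: tree[n], h))  # e
```

Unlike the hill-climbing sketch, node `b` stays on the queue here; if `c`'s subtree had proved a dead end, the search would have resumed from `b` rather than getting trapped.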
Branch-and-Bound Search
BRANCH-AND-BOUND SEARCH
1. Place the start node of zero path length on the queue.
2. Until the queue is empty or a goal node has been found: (a) determine if the
first path in the queue contains a goal node, (b) if the first path contains a
goal node exit with success, (c) if the first path does not contain a goal node,
remove the path from the queue and form new paths by extending the removed
path by one step, (d) compute the cost of the new paths and add them to the
queue, (e) sort the paths on the queue with lowest-cost paths in front.
3. Otherwise, exit with failure.
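The path-based discipline in steps 1 through 3 can be sketched with a heap doing the sorting of step (e); the weighted graph is hypothetical:

```python
import heapq

# Path-based branch-and-bound: the queue holds whole paths ordered by
# accumulated cost, and the cheapest path is always extended first.
def branch_and_bound(start, goal, edges):
    queue = [(0, [start])]                 # (cost so far, path) pairs
    while queue:
        cost, path = heapq.heappop(queue)  # step (e): cheapest path first
        node = path[-1]
        if node == goal:                   # steps (a)-(b): goal in first path
            return cost, path
        for nxt, step in edges.get(node, []):   # steps (c)-(d): extend by one
            if nxt not in path:            # avoid cycling within a path
                heapq.heappush(queue, (cost + step, path + [nxt]))
    return None                            # step 3: queue exhausted, failure

# Hypothetical weighted graph.
g = {'a': [('b', 1), ('c', 4)], 'b': [('c', 1), ('d', 5)], 'c': [('d', 1)]}
print(branch_and_bound('a', 'd', g))  # (3, ['a', 'b', 'c', 'd'])
```

Because the cheapest partial path is always extended first, the first path popped that ends at the goal is a minimum-cost path.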
The A* Algorithm

The previous heuristic methods offer good strategies but fail to describe how the
shortest distance to a goal should be estimated. The A* algorithm is a specialization
of best-first search. It provides general guidelines with which to estimate goal distances
for general search graphs.
At each node along a path to the goal, the A* algorithm generates all successor
nodes and computes an estimate of the distance (cost) from the start node to a goal
node through each of the successors. It then chooses the successor with the shortest
estimated distance for expansion. The successors for this node are then generated.
their distances estimated, and the process continues until a goal is found or the
search ends in failure.
The form of the heuristic estimation function for A* is

f*(n) = g*(n) + h*(n)

where the two components g*(n) and h*(n) are estimates of the cost (or distance)
from the start node to node n and the cost from node n to a goal node, respectively.
The asterisks are used to designate estimates of the corresponding true values f(n)
= g(n) + h(n). For state space tree problems g*(n) = g(n), since there is only one
path and the distance g*(n) will be known to be the true minimum from the start
to the current node n. This is not true in general for graphs, since alternate paths
from the start node to n may exist.
For this type of problem, it is convenient to maintain two lists of node types
designated as open and closed. Nodes on the open list are nodes that have been
generated but not yet expanded while nodes on the closed list are nodes that have
been expanded and whose children are, therefore, available to the search program.
The A* algorithm proceeds as follows.
A* SEARCH
1. Place the starting node s on open.
2. If open is empty, stop and return failure.
3. Remove from open the node n that has the smallest value of f*(n). If the
node is a goal node, return success and stop. Otherwise,
4. Expand n, generating all of its successors n', and place n on closed. For every
successor n', if n' is not already on open or closed, attach a back-pointer to
n, compute f*(n'), and place it on open.
5. Each n' that is already on open or closed should be attached to a back-pointer
which reflects the lowest g*(n') path. If n' was on closed and its pointer was
changed, remove it and place it on open.
6. Return to step 2.
6. Return to step 2.
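A compact sketch of the algorithm follows, with two simplifications relative to the listing: stale open entries are skipped when popped instead of being removed, and back-pointers are recorded in a dictionary. The graph and heuristic table are hypothetical:

```python
import heapq
from itertools import count

# A* sketch: f*(n) = g*(n) + h*(n); the open list is a heap ordered by f*,
# and a node is re-queued whenever a cheaper path to it is found.
def a_star(start, goal, edges, h):
    tick = count()                         # tie-breaker so the heap never compares nodes
    open_heap = [(h(start), next(tick), 0, start, None)]
    best_g = {start: 0}                    # cheapest known g*(n) per node
    parents = {}                           # back-pointers along the best path
    while open_heap:
        f, _, g, node, parent = heapq.heappop(open_heap)
        if g > best_g.get(node, float('inf')):
            continue                       # stale entry: a cheaper path was found later
        parents[node] = parent
        if node == goal:                   # reconstruct the path via back-pointers
            path = [node]
            while parents[path[-1]] is not None:
                path.append(parents[path[-1]])
            return g, path[::-1]
        for nxt, step in edges.get(node, []):
            ng = g + step
            if ng < best_g.get(nxt, float('inf')):
                best_g[nxt] = ng           # step 5: keep the lowest g*(n') path
                heapq.heappush(open_heap, (ng + h(nxt), next(tick), ng, nxt, node))
    return None

# Hypothetical weighted graph with an admissible heuristic table.
g = {'s': [('a', 1), ('b', 4)], 'a': [('b', 2), ('t', 5)], 'b': [('t', 1)]}
h = {'s': 3, 'a': 2, 'b': 1, 't': 0}.__getitem__
print(a_star('s', 't', g, h))  # (4, ['s', 'a', 'b', 't'])
```

Note that the direct edge a-t (cost 5) is pushed first but never wins: the cheaper route through b overtakes it on the heap, which is exactly the re-pointing behavior step 5 describes.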
Iterative Deepening A*
The depth-first and breadth-first strategies given earlier for Or trees and graphs can
easily be adapted for And-Or trees. The main difference lies in the way termination
conditions are determined, since all goals following an And node must be realized,
whereas a single goal node following an Or node will do. Consequently, we describe
a more general optimal strategy that subsumes these types, the AO* (0 for ordered)
algorithm.
As in the case of the A* algorithm, we use the open list to hold nodes that
have been generated but not expanded and the closed list to hold nodes that have
been expanded (successor nodes that are available). The algorithm is a variation of
the original given by Nilsson (1971). It requires that nodes traversed in the tree be
labeled as solved or unsolved in the solution process to account for And node
solutions which require solutions to all successor nodes. A solution is found when
the start node is labeled as solved.
THE AO* ALGORITHM
1. Place the start node s on open.
2. Using the search tree constructed thus far, compute the most promising solution
tree T0.
3. Select a node n that is both on open and a part of T0 . Remove n from open
and place it on closed.
4. If n is a terminal goal node, label n as solved. If the solution of n results in
any of n's ancestors being solved, label all the ancestors as solved. If the
start node s is solved, exit with success, where T0 is the solution tree. Remove
from open all nodes with a solved ancestor.
5. If n is not a solvable node (operators cannot be applied), label n as unsolvable.
If the start node is labeled as unsolvable, exit with failure. If any of n's
ancestors become unsolvable because n is, label them unsolvable as well.
Remove from open all nodes with unsolvable ancestors.
6. Otherwise, expand node n, generating all of its successors. For each such
successor node that contains more than one subproblem, generate their successors
to give individual subproblems. Attach to each newly generated node a back-pointer
to its predecessor. Compute the cost estimate h* for each newly generated
node and place all such nodes that do not yet have descendants on open.
Next, recompute the values of h* at n and each ancestor of n.
7. Return to step 2.
It can be shown that AO* will always find a minimum-cost solution tree if
one exists, provided only that h*(n) <= h(n), and all arc costs are positive. Like
A*, the efficiency depends on how closely h* approximates h.
9.7 SUMMARY
search. Heuristic evaluation functions are used in best-first search strategies to find
good solution paths. A solution is not always guaranteed with this type of search,
but in most practical cases, good or acceptable solutions are often found.
We saw several examples of informed searches, including general best-first,
hill climbing, branch-and-bound, A*, and finally, the optimal And-Or heuristic search
known as the AO* algorithm. Desirable properties of heuristic search methods were
also defined.
EXERCISES
9.1. Games and puzzles are often used to describe search problems because they are easy
to describe. One such puzzle is the farmer-fox-goose-grain puzzle. In this puzzle, a
farmer wishes to cross a river taking his fox, goose, and grain with him. He can use
a boat which will accommodate only the farmer and one possession. If the fox is left
alone with the goose, the goose will be eaten. If the goose is left alone with the
grain it will be eaten. Draw a state space search tree for this puzzle using leftbank
and rightbank to denote the left and right river banks, respectively.
9.2. For the search tree given below, use breadth-first searching and list the elements of
the queue just before selecting and expanding each next state until a goal node is
reached. (Goal states designated with .)
(Search tree diagram with interior and leaf nodes including E, F, H, I, L, and M.)
9.9. Give three different heuristics for an h(n) to be used in solving the eight puzzle.
9.10. Using the search tree given below. list the elements of the queue just before the next
node is expanded. Use best-first search where the numbers correspond to estimated
cost-to-goal for each corresponding node.
(Search tree with estimated costs: A 30, B 19, C 25, D 22, E 19, F 16, G 10, H 12, J 7, K 6, L 3, M 0, N 4.)
9.11. Repeat Problem 9.10 when the cost of node B is changed to 18.
9.12. Give the time and space complexities for the search methods of Problems 9.2 and
9.3.
9.13. Discuss some of the potential problems when using hill climbing search. Give examples
of the problems cited.
9.14. Discuss and compare hill climbing and best-first search techniques.
9.15. Give an example of an admissible heuristic for the eight puzzle.
9.16. Give two examples of problems in which solutions requiring the minimum search are
more appropriate than optimal solutions. Give reasons for your choices.
9.17. Write a LISP program to perform a breadth-first search on a solution space tree con-
structed using property lists. For example, children nodes e, f, and g of node b of
the tree would be constructed with the LISP function
9.18. Write a LISP program to perform a depth-first search on the tree constructed in Problem
9.17.
Matching Techniques
10.1 INTRODUCTION
Matching is the process of comparing two or more structures to discover their like-
nesses or differences. The structures may represent a wide range of objects, including
physical entities, words or phrases in some language, complete classes of things,
general concepts, relations between complex entities, and the like. The representations
will be given in one or more of the formalisms like FOPL, networks, or some
other scheme, and matching will involve comparing the component parts of such
structures.
Matching is used in a variety of programs for different reasons. It may serve
to control the sequence of operations, to identify or classify objects, to determine
would not match since ?x could not be bound to two different constants.
In some extreme cases, a complete change of representational form may be
required in either one or both structures before a match can be attempted. This
will be the case, for example, when one visual object is represented as a vector of
pixel gray levels and objects to be matched are represented as descriptions in predicate
logic or some other high level statements. A direct comparison is impossible unless
one form has been transformed into the other.
In subsequent chapters we will see examples of many problems where exact
matches are inappropriate, and some form of partial matching is more meaningful.
Typically in such cases, one is interested in finding a best match between pairs of
structures. This will be the case in object classification problems, for example,
when object descriptions are subject to corruption by noise or distortion. In such
cases, a measure of the degree of match may also be required.
Other types of partial matching may require finding a match between certain
key elements while ignoring all other elements in the pattern. For example, a human
language input unit should be flexible enough to recognize any of the following
three statements as expressing a choice of preference for the low-calorie food item.
Finally, some problems may call for a form of fuzzy matching
where an entity's degree of membership in one or more classes is appropriate.
Some classification problems will apply here if the boundaries between the classes
are not distinct, and an object may belong to more than one class.
Figure 10.1 illustrates the general match process where an input description
is being compared with other descriptions. As stressed earlier, the term object is
used here in a general sense. It does not necessarily imply physical objects. All
objects will be represented in some formalism such as a vector of attribute values,
propositional logic or FOPL statements, rules, frame-like structures, or other schemes.
Transformations, if required, may involve simple instantiations or unifications among
clauses or more complex operations such as transforming a two-dimensional scene
to a description in some formal language. Once the descriptions have been transformed
into the same schema, the matching process is performed element by element using
a relational or other test (like equality or ranking). The test results may then be
combined in some way to provide an overall measure of similarity. The choice of
measure will depend on the match criteria and representation scheme employed.
The output of the matcher is a description of the match. It may be a simple
yes or no response or a list of variable bindings, or as complicated as a detailed
annotation of the similarities and differences between the matched objects.
To summarize then, matching may be exact, used with or without pattern
variables, partial, or fuzzy, and any matching algorithm will be based on such
factors as
(Figure 10.1 The general match process: object representations, transformed as needed, are passed to a match comparator which produces a result description.)
We are already familiar with many of the representation structures used in matching
programs. Typically, they will be some type of list structures that represent clauses
in propositional or predicate logic, such as
or rules, such as
- wife
,on br,dqe.pa"ne's
-,
name: data-structures
alto: university-course
department: computer-science
credits: 3-hours
prerequisites:(if-needed check catalog)
(a)
Variables
All of the structures we shall consider here are constructed from basic atomic elements,
numbers, and characters. Character string elements may represent either constants
or variables. If variables, they may be classified by either the type of match permitted
or by their value domains.
We can classify match variables by the number of items that can replace
them (one or more than one). An open variable can be replaced by a single item,
while a segment variable can be replaced by zero or more items. Open variables
are labeled with a preceding question mark (?x, ?y, ?class). They may match or
assume the value of any single string element or word, but they are sometimes
subject to consistency constraints. For example, to be consistent, the variable ?x
can be bound only to the same top level element in any single structure. Thus (a
?x d ?x e) may match (a b d b e), but not (a b d a e). Segment variable types will
be preceded with an asterisk (*x, *y, *words). This type of variable can match an
arbitrary number or segment of contiguous atomic elements (any sublist including
the empty list). For example, (*x d (e f) *y) will match the patterns

(a (b c) d (e f) g h), (d (e f))

or other similar patterns. Segment variables may also be subject to consistency
constraints similar to open variables.
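The behavior of open and segment variables can be sketched with a small matcher. The representation below (strings prefixed with ? and *, Python lists for structures) is an illustrative stand-in, not the text's LISP program:

```python
def match(pattern, data, bindings=None):
    """Match a pattern list against a data list.

    Open variables ('?x') bind exactly one element; segment
    variables ('*x') bind zero or more contiguous elements.
    Returns a dict of bindings on success, None on failure.
    """
    if bindings is None:
        bindings = {}
    if not pattern:
        return bindings if not data else None
    first, rest = pattern[0], pattern[1:]
    if isinstance(first, str) and first.startswith('?'):
        if not data:
            return None
        if first in bindings:                 # consistency constraint
            if bindings[first] != data[0]:
                return None
            return match(rest, data[1:], bindings)
        new = dict(bindings)
        new[first] = data[0]
        return match(rest, data[1:], new)
    if isinstance(first, str) and first.startswith('*'):
        for i in range(len(data) + 1):        # try each segment length
            seg = data[:i]
            if first in bindings and bindings[first] != seg:
                continue
            new = dict(bindings)
            new[first] = seg
            result = match(rest, data[i:], new)
            if result is not None:
                return result
        return None
    if data and first == data[0]:             # literal element
        return match(rest, data[1:], bindings)
    return None
```

Note how a segment variable forces backtracking over segment lengths, which is what makes segment matching more expensive than open-variable matching.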
Variables may also be classified by their value domains. This distinction will
be useful when we consider similarity measures below. The variables may be either
quantitative, having a meaningful origin or zero point and a meaningful interval
difference between two values, or they may be qualitative, in which there is no
origin nor meaningful interval value difference. These two types may be further
subdivided as follows.
Nominal variable. Qualitative discrete variables whose states have no natural
order or rank. Of course, each state can be given a numerical code. For example,
"marital status" has states of married, single, divorced, or widowed. These states
have no numerical significance, and no particular order nor rank. The states could
be assigned numerical codes, however, such as married = 1, single = 2, divorced
= 3, and widowed = 4.
Binary variable. Qualitative discrete variables which may assume only one
of two values, such as 0 or 1, good or bad, yes or no, high or low.
Graphs and Trees

Two other structures we shall consider in this section are graphs and trees. One
type of graph we are already familiar with is the associative network (Chapter 6).
Such structures provide a rich variety of representation schemes. More generally, a
graph G = (V, E) is an ordered pair of sets V and E. The elements of V are nodes
or vertices and the elements of E are a subset of V x V called edges (or arcs or
links). An edge joins two distinct vertices in V.
Directed graphs, or digraphs, have directed edges or arcs with arrows. If an
arc is directed from node n1 to n2, node n1 is said to be a parent or predecessor of n2,
and n2 is the child or successor of n1. Undirected graphs have simple edges without
arrows connecting the nodes. A path is a sequence of edges connecting two nodes
where the endpoint of one edge is the start of its successor. A cycle is a path in
which the two end points coincide. A connected graph is a graph for which every
pair of vertices is joined by a path. A graph is complete if every element of V x
V is an edge.
A tree is a connected graph in which there are no cycles, and each node has,
at most, one parent. A node with no parent is called the root node, and nodes with
no children are called leaf nodes. The depth of the root node is defined as zero.
The depth of any other node is defined to be the depth of its parent plus 1. Pictorial
representations of some graphs and a tree are given in Figure 10.4.
Recall that graph representations typically use labeled nodes and arcs where
194 Matching Techniques Chap. 10
Figure 10.4 Examples of (a) a general connected graph, (b) a digraph, (c) a disconnected graph, and (d) a tree of depth 3.
the nodes correspond to entities and the arcs to relations. Labels for the nodes and
arcs are attribute values.
Next, we turn to the problem of comparing structures without the use of pattern
matching variables. This requires consideration of measures used to determine the
likeness or similarity between two or more structures. The similarity between two
structures is a measure of the degree of association or likeness between the objects'
attributes and other characteristic parts. If the describing variables are quantitative,
a distance metric is often used to measure the proximity.
Distance Metrics
For all elements x, y, z of the set E, the function d is a metric if and only if

a. d(x,x) = 0
b. d(x,y) >= 0
c. d(x,y) = d(y,x)
d. d(x,y) <= d(x,z) + d(z,y)
A commonly used family of such functions is the Minkowski metric

d_p(x,y) = [ SUM_i |x_i - y_i|^p ]^(1/p),  p >= 1

For the case p = 2, this metric is the familiar Euclidean distance. When p = 1, d_1
is the so-called absolute or city block distance.
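A minimal sketch of the Minkowski family, assuming equal-length numeric vectors:

```python
def minkowski(x, y, p=2):
    """Minkowski distance between two equal-length numeric vectors.
    p=2 gives the Euclidean distance; p=1 the city block distance."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
```

For the classic 3-4-5 triangle, the Euclidean distance between (0,0) and (3,4) is 5, while the city block distance is 7.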
Probabilistic Measures

One commonly used probabilistic measure is the Mahalanobis distance

d = (X - Y)' C^-1 (X - Y)

where the prime (') denotes transpose (row vector) and C^-1 is the inverse of the covariance matrix C.
The X and Y vectors may be adjusted for zero means by first subtracting the vector
means u_x and u_y.
Another popular probability measure is the product moment correlation r,
given by

r = Cov(X,Y) / [Var(X) * Var(Y)]^(1/2)

where Cov and Var denote covariance and variance, respectively. The correlation
r, which ranges between -1 and +1, is a measure of similarity frequently used in
vision applications.
Other probabilistic measures often used in AI applications are based on the
scatter of attribute values. These measures are related to the degree of clustering
among the objects. In addition, conditional probabilities are sometimes used. For
example, they may be used to measure the likelihood that a given X is a member
of class C, P(C|X), the conditional probability of C given an observed X. These
measures can establish the proximity of two or more objects. These and related
measures are discussed further in Chapter 12.
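The product moment correlation can be sketched directly from its definition (population variance and covariance are assumed here):

```python
def correlation(xs, ys):
    """Product moment correlation r = Cov(X,Y) / sqrt(Var(X)*Var(Y)).
    Returns a value in [-1, +1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5
```

Two attribute vectors that rise together give r near +1; vectors that move oppositely give r near -1.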
Qualitative Measures
Qualitative variables may be compared by means of a contingency table. For
two binary variables X and Y, the table entries are counts:

               Y
             1     0
    X   1    a     b
        0    c     d

For example, X might be horned and Y might be long tailed. In this case, the entry a is the
number of animals having both horns and long tails. Note that a + b + c +
d = n, the total number of objects.
Various measures of association for such binary variables have been defined.
For example,

a / (a + b + c + d),    (a + d) / (a + b + c + d),    and    a / (a + b + c)
Contingency tables are also useful for describing other qualitative variables,
both ordinal and nominal. Since the methods are similar to those for binary variables,
we omit the details here.
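Two standard coefficients of this general kind, the Jaccard and simple matching coefficients, can be computed from the contingency counts; the sketch below names its variables after the table entries:

```python
def binary_association(xs, ys):
    """Contingency counts for two binary attribute vectors, and two
    common association coefficients built from them: the Jaccard
    coefficient a/(a+b+c) and the simple matching coefficient
    (a+d)/(a+b+c+d)."""
    a = sum(1 for x, y in zip(xs, ys) if x and y)
    b = sum(1 for x, y in zip(xs, ys) if x and not y)
    c = sum(1 for x, y in zip(xs, ys) if not x and y)
    d = sum(1 for x, y in zip(xs, ys) if not x and not y)
    jaccard = a / (a + b + c) if a + b + c else 1.0
    simple_matching = (a + d) / (a + b + c + d)
    return jaccard, simple_matching
```

The Jaccard coefficient ignores joint absences (d), which matters when most attributes are absent in both objects.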
Whatever the variable types used in a measure, they should all be properly
scaled or normalized to prevent variables having large values from negating the
effects of smaller valued variables. This could happen when one variable is scaled
in millimeters and another variable in meters.
Similarity Measures
For many problems, distance metrics are not appropriate. Instead, a measure of
similarity satisfying conditions different from those of Table 10.1 may be more
appropriate. Of course, measures of dissimilarity (or similarity), like distance, should
decrease (or increase) as objects become more alike. There is strong evidence,
however, to suggest that similarities are not in general symmetric (Tversky, 1977)
and hence, any similarity measure between a subject description A and its referent
B, denoted by s(A,B), is not necessarily symmetric: that is, in general, s(A,B) != s(B,A)
or "A is like B" may not be the same as "B is like A."
Tests on subjects have shown that in similarity comparisons, the focus of
attention is on the subject and, therefore, subject features are given higher weights
than the referent. For example, in tests comparing countries, statements like "North
Korea is similar to Red China" and "Red China is similar to North Korea" or "the
USA is like Mexico" and "Mexico is like the USA" were not rated as symmetrical
or equal. The likenesses and differences in these cases are directional. Moreover,
like many interpretations in AI, similarities may depend strongly on the context in
which the comparisons are made. They may also depend on the purpose of the
comparison.
An interesting family of similarity measures which takes into account such
factors as asymmetry and has some intuitive appeal has recently been proposed
(Tversky, 1977). Such measures may be adapted to give more realistic results for
similarity measures in AI applications where context and purpose should influence
the similarity comparisons.
Let O = {o_1, o_2, . . . , o_n} be the universe of objects of interest, and let A_i be the
set of attributes or features used to represent o_i. A similarity measure which is a
function of three disjoint sets of attributes common to any two objects A_i and A_j is
given as

s(A_i,A_j) = F(A_i & A_j, A_i - A_j, A_j - A_i)     (10.2)

where A_i & A_j is the set of features common to both o_i and o_j, A_i - A_j is the set of
features belonging to o_i and not o_j, and A_j - A_i is the set of features belonging to
o_j and not o_i. The function F is a real valued nonnegative function. Under fairly
general assumptions, equation 10.2 can be written as

s(A_i,A_j) = a f(A_i & A_j) - b f(A_i - A_j) - c f(A_j - A_i)     (10.3)

for some a, b, c >= 0 and where f is an additive interval metric function. The function
f(A) may be chosen as any nonnegative function of the set A, like the number of
attributes in A or the average distance between points in A. Equation 10.3 may be
normalized to give values of similarity ranging between 0 and 1 by writing

s(A_i,A_j) = f(A_i & A_j) / [ f(A_i & A_j) + a f(A_i - A_j) + b f(A_j - A_i) ]     (10.4)
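Equation 10.4 can be sketched for feature sets as follows; the parameters alpha and beta play the roles of a and b, and f defaults to set cardinality (one of the choices the text suggests):

```python
def tversky_similarity(a1, a2, alpha=1.0, beta=1.0, f=len):
    """Normalized Tversky ratio in the style of equation 10.4:
    f is applied to the common features and to each direction's
    distinctive features. Unequal alpha and beta make the measure
    asymmetric, so s(A1,A2) need not equal s(A2,A1)."""
    common = f(a1 & a2)
    only_1 = f(a1 - a2)
    only_2 = f(a2 - a1)
    denom = common + alpha * only_1 + beta * only_2
    return common / denom if denom else 1.0
```

With alpha < beta, the subject's distinctive features are discounted relative to the referent's, reproducing the directional "A is like B" effect discussed above.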
Fuzzy Measures
Finally, we can define a distance between the two fuzzy sets A and B over n traits as

d(A,B) = (1/n) SUM_i | u_A(x_i) - u_B(x_i) |     (10.6)

which gives the mean trait membership difference between two objects o_i and o_j.
Of course, d(o_i,o_j) = 0 corresponds to equal likeness or maximal similarity, and
d(o_i,o_j) = 1 for i != j corresponds to maximum dissimilarity.
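A sketch of this mean membership difference, assuming the two objects are described by aligned membership vectors with values in [0, 1]:

```python
def fuzzy_distance(u_a, u_b):
    """Mean absolute difference between two membership vectors:
    0 means maximal similarity, 1 means maximal dissimilarity
    (in the spirit of equation 10.6)."""
    if len(u_a) != len(u_b):
        raise ValueError("membership vectors must align trait-by-trait")
    return sum(abs(a - b) for a, b in zip(u_a, u_b)) / len(u_a)
```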
Matching Substrings
Since many of the representation structures are just character strings, a basic function
required in many match algorithms is to determine if a substring S2 consisting of
m characters occurs somewhere in a string S1 of n characters, m <= n. A direct
approach to this problem is to compare the two strings character-by-character, starting
with the first characters of both S1 and S2. If any two characters disagree, the
process is repeated, starting with the second character of S1 and matching again
against S2 character-by-character until a match is found or disagreement occurs
again. This process continues until a match occurs or S1 has no more characters.
Let i and j be position indices for string S1 and k a position index for S2. We
can perform the substring match with the following algorithm.
i := 0;
while i < (n - m + 1) do
  begin
    i := i + 1; j := i; k := 1;
    while (j <= n) and (S1[j] = S2[k]) do
      begin
        if k = m then
          begin
            writeln('success'); halt
          end
        else
          begin
            j := j + 1; k := k + 1
          end
      end
  end;
writeln('fail')
This algorithm requires m(n - m) comparisons in the worst case. A more
efficient algorithm will not repeat the same comparisons over and over again. One
such algorithm uses two indices, i and j, where i indexes (counts) the character
positions in S1 and j is set to a "match state" value ranging from 0 to m (like the
states in a finite automaton). The state 0 corresponds to no matched characters
between the strings, while the state 1 corresponds to the first letter in S2 matching
character i in S1. State 2 corresponds to the first two consecutive letters in S2
matching letters i and i + 1 in S1 respectively, and so on, with state m corresponding
to a successful match. Whenever consecutive letters fail to match, the state index
is reduced accordingly. We leave the actual details as an exercise.
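One way to realize the state-based matcher just described (and left as an exercise in the text) is the Knuth-Morris-Pratt approach, sketched here; the failure table records which state to fall back to on a mismatch:

```python
def find_substring(s1, s2):
    """KMP-style scan: i indexes s1, j is the 'match state' (number
    of characters of s2 matched so far). Returns the start index of
    s2 in s1, or -1 if there is no match."""
    m = len(s2)
    if m == 0:
        return 0
    # failure[j] = length of the longest proper prefix of s2[:j+1]
    # that is also a suffix of it; the state to fall back to.
    failure = [0] * m
    j = 0
    for i in range(1, m):
        while j > 0 and s2[i] != s2[j]:
            j = failure[j - 1]
        if s2[i] == s2[j]:
            j += 1
        failure[i] = j
    j = 0
    for i, ch in enumerate(s1):
        while j > 0 and ch != s2[j]:
            j = failure[j - 1]      # reduce the state index on mismatch
        if ch == s2[j]:
            j += 1
        if j == m:                  # state m: successful match
            return i - m + 1
    return -1
```

Unlike the direct algorithm, this never re-examines a character of S1, giving a worst case proportional to n + m rather than m(n - m).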
Matching Graphs
Two graphs G1 and G2 match if they have the same labeled nodes and same labeled
arcs and all node-to-node arcs are the same. More generally, we wish to determine
if G2 with m nodes is a subgraph of G1 with n nodes, where n >= m. In a worst
case match, this will require n!/(n - m)! node comparisons and O(m) arc comparisons.
Consequently, we will see that most graph matching applications deal with small,
manageable graphs only or use some form of heuristics to limit the number of
comparisons.
Finding subgraph isomorphisms is also an important matching problem. An
isomorphism between the graphs G1 and G2 with vertices (nodes) V1, V2 and edges
E1, E2, that is, (V1,E1) and (V2,E2), respectively, is a one-to-one mapping f
between V1 and V2, such that for all v1 in V1, f(v1) = v2, and for each arc e1 in
E1 connecting v1 and v1', there is a corresponding arc e2 in E2 connecting f(v1)
and f(v1'). An example of an application in which graph isomorphisms are used to
determine the similarity between two graphs is given in the next section.
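For small graphs, isomorphism can be tested by brute force, which also makes the factorial cost noted above concrete. This sketch tests isomorphism of undirected graphs given as (vertex list, edge set) pairs:

```python
from itertools import permutations

def isomorphic(g1, g2):
    """Brute-force isomorphism test for small undirected graphs,
    each given as (vertices, edge_set); tries every one-to-one
    mapping between the vertex sets."""
    v1, e1 = g1
    v2, e2 = g2
    if len(v1) != len(v2) or len(e1) != len(e2):
        return False
    for perm in permutations(v2):
        f = dict(zip(v1, perm))              # candidate mapping
        if all((f[a], f[b]) in e2 or (f[b], f[a]) in e2 for a, b in e1):
            return True
    return False
```

Trying every permutation costs n! mappings, which is exactly why practical graph matchers restrict themselves to small graphs or heuristics.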
Matching Sets and Bags

An exact match of two sets having the same number of elements requires that their
intersection also have that number of elements. Partial matches of two sets can
also be determined by taking their intersection. If the two sets have the same number
of elements and all elements are of equal importance, the degree of match can be
the proportion of the total members which match. If the number of elements differ
between the sets, the proportion of matched elements to the minimum of the total
number of members can be used as a measure of likeness. When the elements are
not of equal importance, weighting factors can be used to score the matched elements;
for example, the ratio of the sum of the weights of the matched elements to the total
weight of all elements can serve as the measure.
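The proportional and weighted set measures just described might be sketched as follows (the weighting scheme is illustrative):

```python
def set_similarity(s1, s2, weights=None):
    """Degree of match between two sets: the proportion of matched
    elements relative to the smaller set or, when a weights dict is
    supplied, the weighted proportion of matched elements relative
    to all elements (unlisted elements get weight 1)."""
    if not s1 or not s2:
        return 0.0
    common = s1 & s2
    if weights is None:
        return len(common) / min(len(s1), len(s2))
    total = sum(weights.get(e, 1) for e in s1 | s2)
    return sum(weights.get(e, 1) for e in common) / total
```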
One of the best examples of nontrivial pattern matching is in the unification of two
FOPL literals. Recall the procedure for unifying two literals, both of which may
contain variables (see Chapter 4). For example, to unify P(f(a,x),y,y) and P(x,b,z),
we first rename variables so that the two predicates have no variables in common.
This can be done by replacing the x in the second predicate with u to give P(u,b,z).
Next, we compare the two symbol-by-symbol from left to right until a disagreement
is found. Disagreements can be between two different variables, a nonvariable term
and a variable, or two nonvariable terms. If no disagreement is found, the two are
identical and we have succeeded.
If a disagreement is found and both are nonvariable terms, unification is impossi-
ble; so we have failed. If both are variables, one is replaced throughout by the
other. (After any substitution is made, it should be recorded in a substitution worklist
for later use.) Finally, if the disagreement is a variable and a nonvariable term, the
variable is replaced by the entire term. Of course, in this last step, replacement is
possible only if the term does not contain the variable that is being replaced. This
matching process is repeated until the two are unified or until a failure occurs.
For the two predicates P above, a disagreement is first found between the
term f(a,x) and variable u. Since f(a,x) does not contain the variable u, we replace
u with f(a,x) everywhere it occurs in the literal. This gives a substitution set of
{f(a,x)/u} and the partially matched predicates P(f(a,x),y,y) and P(f(a,x),b,z).
Proceeding with the match, we find the next disagreement pair, y and b, a
variable and term, respectively. Again, we replace the variable y with the term b
and update the substitution list to get {f(a,x)/u, b/y}. The final disagreement pair is
two variables. Replacing the variable in the second literal with the first, we get the
substitution set {f(a,x)/u, b/y, y/z} or, equivalently, {f(a,x)/u, b/y, b/z}. Note that this
procedure can always give the most general unifier.
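The unification procedure can be sketched as follows; here variables are strings beginning with ?, compound terms are tuples, and the occurs check guards the variable-replacement step:

```python
def is_var(t):
    """Variables are strings beginning with '?' (e.g. '?x')."""
    return isinstance(t, str) and t.startswith('?')

def substitute(t, subst):
    """Apply a substitution (dict of variable -> term) to a term."""
    if is_var(t):
        return substitute(subst[t], subst) if t in subst else t
    if isinstance(t, tuple):
        return tuple(substitute(a, subst) for a in t)
    return t

def occurs(var, t, subst):
    """True if var occurs anywhere inside term t under subst."""
    t = substitute(t, subst)
    if t == var:
        return True
    return isinstance(t, tuple) and any(occurs(var, a, subst) for a in t)

def bind(var, term, subst):
    """Extend the substitution; fail if the occurs check fails."""
    if occurs(var, term, subst):
        return None
    new = dict(subst)
    new[var] = term
    return new

def unify(x, y, subst=None):
    """Return a most general unifier of x and y, or None on failure."""
    if subst is None:
        subst = {}
    x, y = substitute(x, subst), substitute(y, subst)
    if x == y:
        return subst
    if is_var(x):
        return bind(x, y, subst)
    if is_var(y):
        return bind(y, x, subst)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            subst = unify(a, b, subst)
            if subst is None:
                return None          # disagreement below: fail
        return subst
    return None                      # two different nonvariable terms
```

Running it on the book's example P(f(a,x),y,y) and P(u,b,z) yields the substitution set {f(a,x)/u, b/y, b/z}.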
We conclude this section with an example of a LISP program which uses
both the open and the segment pattern matching variables to find a match between
a pattern and a clause.
Notice that when a segment variable is encountered (the *y), match is recursively
executed on the cdrs of both pattern and clause, or on the cdr of clause and pattern,
as *y matches one or more than one item, respectively.
Figure 10.5 Discrete version of a stretchable overlay image.
displacements and infinite cost for displacements of more than two increments.
Other pieces would be assigned higher costs for unit and larger position displacements
when stronger constraints were applicable.
The matching problem here is to find a least cost location and distortion pattern
for the reference sheet with regard to the sensed picture. Attempting to compare
each component of some reference to each primitive part of a sensed picture is a
combinatorially explosive problem. However, in using the template-spring reference
image and heuristic methods (based on dynamic programming techniques) to compare
against different segments of the sensed picture, the search and match process can
be made tractable.
Any matching metric used in the least cost comparison would need to take
into account the sum of the distortion costs C_d, the sum of the costs for reference
and sensed component dissimilarities C_s, and the sum of penalty costs for missing
components C_m. Thus, the total cost is given by

C_total = C_d + C_s + C_m     (10.8)
Distortions occurring in representations are not the only reasons for partial matches.
For example, in problem solving or analogical inference, differences are expected.
In such cases the two structures are matched to isolate the differences in order that
they may be reduced or transformed. Once again, partial matching techniques are
appropriate. The problem is best illustrated with another example.
In a vision application (Eshera and Fu, 1984), an industrial part may be described
using a graph structure where the set of nodes correspond to rectangular or cylindrical
block subparts. The arcs in the graph correspond to positional relations between
the subparts. Labels for rectangular block nodes contain length, width, and height.
while labels for cylindrical block nodes give radius and height. The arc labels give
location and distances between block nodes, where location can be above, to the
right of, behind, inside, and so on.
Figure 10.6 illustrates a segment of such a graph.
Graphs such as this are called attributed relational graphs (ARGs). Such a
graph G is defined formally as a sextuple
G = (N,B,A,G,.(;5)
as a fuzzy set, and a metric similar to equation 10.6 may then be used to
compare the two objects based on their attribute memberships.
If the attributes represent linguistic variables such as height, weight, facial-
appearance, color-of-eyes, and type-of-hair, each variable may be assigned a limited
number of values. For example, a reasonable assignment for height would be the
integers 10 to 96, corresponding to height in inches. Eye colors could be assigned
brown, black, blue, hazel, and so on. An object description of tall, slim, pretty,
blue-eyed, blonde will have characteristic function values for the five attributes of
u_A(o_1) and u_A(o_2) for objects o_1 and o_2, respectively. A measure of fuzzy similarity
between the two objects can then be defined as

s(o_1,o_2) = 1 - d(o_1,o_2)

where

d(o_1,o_2) = [ (1/n) SUM_i ( u_Ai(o_1) - u_Ai(o_2) )^2 ]^(1/2)     (10.9)
Production (or rule-based) systems are described in Chapter 15. They are popular
architectures for expert systems. A typical system will contain a knowledge base
which contains structures representing the domain expert's knowledge in the form
of rules or productions, a working memory which holds parameters for the current
problem, and an inference engine with a rule interpreter which determines which
rules are applicable for the current problem (Figure 10.7).
The basic inference cycle of a production system is match, select, and execute
as indicated in Figure 10.7. These operations are performed as follows.
Match. During the match portion of the cycle, the conditions in the left
hand side (LHS) of the rules in the knowledge base are matched against the contents
Figure 10.7 Production system components and basic cycle.
of working memory to determine which rules have their LHS conditions satisfied
with consistent bindings to working memory terms. Rules which are found to be
applicable (that match) are put in a conflict set.
Select. From the conflict set, one of the rules is selected to execute. The
selection strategy may depend on recency of usage, specificity of the rule, or
other criteria.
Execute. The rule selected from the conflict set is executed by carrying
out the action or conclusion part of the rule, the right hand side (RHS) of the rule.
This may involve an I/O operation, adding, removing, or changing clauses in working
memory, or simply causing a halt.
The above cycle is repeated until no rules are put in the conflict set or until a
stopping condition is reached.
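The match-select-execute cycle can be illustrated with a toy system in which conditions are ground clauses (no variable bindings) and the selection strategy simply takes the first applicable rule; this is a sketch, not a real production system interpreter:

```python
def run(rules, memory, max_cycles=10):
    """Toy production system cycle. Rules are dicts with ground-clause
    'lhs' conditions and 'rhs' additions; memory is a set of clauses.
    Repeats match, select, execute until no rule is applicable."""
    for _ in range(max_cycles):
        # match: rules whose every LHS condition is in working memory
        # and whose RHS would still add something (crude refraction)
        conflict_set = [r for r in rules
                        if all(c in memory for c in r['lhs'])
                        and not set(r['rhs']) <= memory]
        if not conflict_set:
            break                     # nothing applicable: halt
        rule = conflict_set[0]        # select: trivial first-rule strategy
        memory |= set(rule['rhs'])    # execute: add RHS clauses
    return memory
```

Even this toy makes the cost of the match step visible: every cycle re-examines every condition of every rule, which motivates the RETE algorithm below.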
A typical knowledge base will contain hundreds or even thousands of rules
and each rule will contain several (perhaps as many as ten or more) conditions.
Working memories typically contain hundreds of clauses as well. Consequently,
exhaustive matching of all rules and their LHS conditions against working-memory
clauses may require tens of thousands of comparisons. This accounts for the claim
made in the introductory paragraph that as much as 90% of the computing time for
such systems can be related to matching operations.
To eliminate the need to perform thousands of matches per cycle, an efficient
match algorithm called RETE has been developed (Forgy, 1982). It was initially
developed as part of the OPS family of programming languages (Brownston, et al.,
1985). This algorithm uses several novel features, including methods to avoid repetitive
matching on successive cycles. The main time-saving features of RETE are as
follows.
1. In most expert systems, the contents of working memory change very little
from cycle to cycle. There is a persistence in the data known as temporal redundancy.
Figure 10.8 Changes to working memory are mapped to the conflict set.
sets up a link between rules and their LHS conditions, whereas statements like
link specific LHS terms to all rules which contain the term in the same LHS positions.
When a change is made to working memory, such as the addition of the clause
Figure 10.9 Rules R6, R12, R13, and R23 share LHS conditions such as (father ?y ?x) and (male ?y); in the network, each distinct condition is stored once and linked to all rules containing it.
(father bill joe), all rules which contain father as an LHS condition are easily identified
and retrieved.
In RETE, the retrieval and subsequent testing of rule conditions is initiated
with the creation of a token which is passed to the network constructed by the rule
compiler. The network provides paths for all applicable tests which can lead to
consistent bindings and hence to complete LHS satisfaction of rules. The matcher
traverses the network, finding all rules which newly match or no longer match working-
memory elements. The output from the matcher is a set of data structures which consist
of pairs of elements, a rule name and a list of working-memory elements that match
its LHS, like (R6 ((father bob sam) (father mike bob))).
The reader will notice that the indexing methods described above are similar
to those presented in the following chapter. Other time-saving tricks are also employed
in RETE; however, the ones noted above are the most important. They provide a
substantial saving over exhaustive matching of hundreds or even tens of thousands
of conditions.
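The condition-indexing idea can be sketched simply: map each predicate appearing in a rule's LHS to the rules that use it, so a working-memory change touches only the relevant rules. This is only the indexing ingredient of RETE, not Forgy's full network:

```python
from collections import defaultdict

def build_index(rules):
    """Map each LHS condition predicate to the set of rule names
    whose left-hand side mentions it. Rules are given as a dict of
    name -> list of condition tuples."""
    index = defaultdict(set)
    for name, conditions in rules.items():
        for cond in conditions:
            index[cond[0]].add(name)   # cond[0] is the predicate
    return index

def affected_rules(index, fact):
    """Rules that might newly match (or stop matching) when this
    fact is added to or removed from working memory."""
    return index.get(fact[0], set())
```

Adding (father bill joe) then touches only the rules indexed under father, rather than every rule in the knowledge base.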
10.8 SUMMARY
EXERCISES
10.1. Indicate whether or not consistent substitutions can be made which result in matches
for the following pairs of clauses. If substitutions can be made, give examples of
valid ones.
a. P(a, f(x,b), g(f(a,y)), z),
   P(a, f(y), g(f(x,y)), c)
b. P(a,x) V Q(b,y,f(y)) V R(x,y),
   P(x,a) V Q(f(y),y,b) V R(y,x)
c. R(a,b,c) V Q(x,y,z) V P(f(a,x,b)),
   P(z) V Q(x,y,b) V R(x,y,z)
10.2. State what variable bindings, if any, will make the following lists match.
10.3. Write a LISP function called "match" that takes two arguments and returns T if
the two are identical, returns the two arguments if one is a variable and the other a
term and returns nil, otherwise.
10.4. Identify the following variables as nominal, ordinal, binary or interval:
temperature sex
wavelength university class
population intelligence
quality of restaurant
10.5. What is the difference between a bag and a set? Give examples of both. How could
a program determine whether a data structure was either a bag or a set?
10.6. Compute the Mahalanobis distance between two normal distributions having zero
means, variances of 4 and 9, and a covariance of 5.
10.7. Give three different examples of functions f that can be used in the similarity equations
10.3 and 10.4.
10.8. Choose two simple objects o1 and o2 that are somewhat similar in their features
A1 and A2, respectively, and compute the similarity of the two using a form of
equation 10.4.
10.9. Define two fuzzy sets "tall" and "short" and compute the distance between them
using equation 10.5.
10.10. For the two sets defined in Problem 10.9. compute the similarity of the two using
equation 10.6.
10.11. Write a LISP function to find the intersection of two sets using the marking method
described in the subsection entitled Matching Sets and Bags.
10,12. Write a LISP function that determines if two sets match exactly.
10.13. Write pseudocode to unify two FOPL literals.
10.14. Write a LISP program based on the pseudocode developed in Problem 10.13.
10.15. Write pseudocode to find the similarity between two attributed relational graphs
(ARGs).
10.16. Suppose an expert system working memory has n clauses each with an average of
four if ... then conditions per clause and a knowledge base with 200 rules. Each
rule has an average of five conditions. What is the time complexity of a matching
algorithm which performs exhaustive matching?
10.17. Estimate the average time savings if the RETE algorithm was used in the previous
problem.
10.18. Write a PROLOG program that determines if two sets match exactly.
10.19. Write a PROLOG program that determines if two sets match except possibly for the
first elements of each set.
11
Knowledge Organization
and Management
if the knowledge is poorly organized. Such problems can easily become intractable
or at best intolerable.
In this chapter, we investigate various approaches to the effective organization
of knowledge within memory. We recognize that while the representation of knowl-
edge is still an important factor, we are more concerned here with the broader
problem, that of organization and maintenance for efficient storage and recall as
well as for its manipulation.
11.1 INTRODUCTION
with this change, our memories exhibit some rather remarkable properties. We are
able to adapt to varied changes in the environment and still improve our performance.
This is because our memory system is continuously adapting through a reorganization
process. New knowledge is continually being added to our memories, existing knowl-
edge is continually being revised, and less important knowledge is gradually being
forgotten. Our memories are continually being reorganized to expand our recall
and reasoning abilities. This process leads to improved memory performance through-
out most of our lives.
When developing computer memories for intelligent systems, we may gain
some useful insight by learning what we can from human memory systems. We
would expect computer memory systems to have some of the same features. For
example, human memories tend to be limitless in capacity, and they provide a
uniform grade of recall service, independent of the amount of information stored.
For later use, we have summarized these and other desirable characteristics that
we feel an effective computer memory organization system should possess.
These characteristics suggest that memory be organized around conceptual
clusters of knowledge. Related clusters should be grouped and stored in close proxim-
ity to each other and be linked to similar concepts through associative relations.
Access to any given cluster should be possible through either direct or indirect
links such as concept pointers indexed by meaning. Index keys with synonymous
meanings should provide links to the same knowledge clusters. These notions are
illustrated graphically in Figure 11.1 where the clusters represent arbitrary groups
of closely related knowledge such as objects and their properties or basic conceptual
categories. The links connecting the clusters are two-way pointers which provide
relational associations between the clusters they connect.
Figure 11.1 Knowledge clusters joined by associative links.
One tricky aspect of systems that must function in dynamic environments is due to
the so-called frame problem. This is the problem of knowing what changes have
and have not taken place following some action. Some changes will be the direct
result of the action. Other changes will be the result of secondary or side effects
rather than the result of the action. For example, if a robot is cleaning the floors in
a house, the location of the floor sweeper changes with the robot even though this
Figure 11.2 Memory organization functions: source input, retrieval of relevant knowledge (succeed or fail), and memory reorganization.
is not explicitly stated. Other objects not attached to the robot remain in their original
places. The actual changes must somehow be reflected in memory, a feat that requires
some ability to infer. Effective memory organization and management methods must
take into account effects caused by the frame problem.
In the remainder of this chapter we consider three basic problems related to
knowledge organization: (1) classifying and computing indices for input information
presented to a system, (2) access and retrieval of knowledge from memory through
the use of the computed indices, and (3) the reorganization of memory structures
when necessary to accommodate additions, revisions, and forgetting. These functions
are depicted in Figure 11.2.
11.2 INDEXING AND RETRIEVAL TECHNIQUES

When a knowledge base is too large to be held in main memory, it must be stored
as a file in secondary storage (disk, drum, or tape). Storage and retrieval of information
in secondary memory is then performed through the transfer of equal-size physical
blocks consisting of between 2^8 (256) and 2^12 (4096) bytes. When an item of informa-
tion is retrieved or stored, at least one complete block must be transferred between
main and secondary memory. The time required to transfer a block typically ranges
between 10 ms and 100 ms, about the same amount of time required to sequentially
search the whole block for an item. Clearly, then, grouping related knowledge
together as a unit can help to reduce the number of block transfers, and hence the
total access time.
An example of effective grouping alluded to above can be found in some
expert system KB organizations. Grouping together rules which share some of the
same conditions (propositions) and conclusions can reduce block transfer times since
such rules are likely to be needed during the same problem solving session. Conse-
quently, collecting rules together by similar conditions or content can help to reduce
the number of block transfers required. As noted before, the RETE algorithm de-
scribed in the previous chapter is an example of this type of organization.
Indexed Organization
are pairs of record key values and block addresses. The key value is the key of the
first record stored in the corresponding block. To retrieve an item of knowledge
from the main file, the index file is searched to find the desired record key and
obtain the corresponding block address. The block is then accessed using this address.
Items within the block are then searched sequentially for the desired record.
An indexed file contains a list of the entry pairs (k,b), where the values k are the keys of the first record in each block whose starting address is b. Figure 11.3 illustrates the process used to locate a record using the key value of 378. The largest key value less than 378 (375) gives the block address (800) where the item will be found. Once the 800 block has been retrieved, it can be searched linearly to locate the record with key value 378. The key could be any alphanumeric string that uniquely identifies a block, since such strings usually have a collation order defined by their code set.
If the index file is large, a binary search can be used to speed up the index file search. A binary search will significantly reduce the search time over linear search when the number of items is not too small. When a file contains n records, the average time for a linear search is proportional to n/2, compared to a binary search time on the order of log2(n).
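The lookup procedure just described can be sketched in a few lines. The book's own examples are in LISP; this is a Python sketch in which the keys and block addresses are hypothetical, patterned after the values in Figure 11.3:

```python
import bisect

# Index file: sorted (key, block-address) pairs, one per block.
# Illustrative data only, mirroring Figure 11.3.
index = [(9, 100), (138, 200), (375, 800), (410, 900)]
blocks = {800: [375, 377, 378, 382, 391, 405]}   # record keys in block 800

def find_block(index, key):
    """Binary-search the index for the largest entry key <= key."""
    keys = [k for k, _ in index]
    i = bisect.bisect_right(keys, key) - 1
    return index[i][1]

def retrieve(key):
    addr = find_block(index, key)        # one index lookup, one block access
    for record_key in blocks[addr]:      # then a sequential scan of the block
        if record_key == key:
            return addr, record_key
    return None                          # key not present in its block
```

With these values, `retrieve(378)` locates block 800 and then finds the record with key 378 inside it, exactly as in the figure.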
Further reductions in search time can be realized using secondary or higher order (hierarchically arranged) index files. In this case the secondary index file would contain key and block-address pairs for the primary index file. Similar indexing would apply for higher order hierarchies, where a separate index file is used for each level. Both binary search and hierarchical index file organization may be needed when the KB is a very large file.
[Figure 11.3 Indexed file organization: an index file of (key, block-address) pairs such as (009, 100), (138, 200), (375, 800), and (410, 900) points into the KB file, where block 800 holds the records with keys 375, 377, 378, 382, 391, and 405.]
When the total number of records in a KB file is n with r records stored per block, giving a total of b blocks (n = r * b), the average search time for a nonindexed, sequential search is b/2 block access times plus n/2 record tests. This compares with an index search time of b/2 index tests, one block access, and r/2 record tests. A binary index search, on the other hand, would require only log2(b) index tests, one block access, and r/2 record tests. Therefore, we see that for large n and moderately large r (30 to 50), the time savings possible using binary indexed access can be substantial.
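These cost formulas are easy to tabulate. The sketch below encodes them directly; the block-transfer and record-test times passed in are illustrative, not values from the text:

```python
import math

def sequential_cost(n, r, t_block, t_test):
    """Nonindexed scan: b/2 block accesses plus n/2 record tests."""
    b = n / r
    return (b / 2) * t_block + (n / 2) * t_test

def linear_index_cost(n, r, t_block, t_test):
    """Linear index scan: b/2 index tests, one block access, r/2 record tests."""
    b = n / r
    return (b / 2) * t_test + t_block + (r / 2) * t_test

def binary_index_cost(n, r, t_block, t_test):
    """Binary index search: log2(b) index tests, one block access, r/2 record tests."""
    b = n / r
    return math.log2(b) * t_test + t_block + (r / 2) * t_test
```

For example, with n = 100,000 records, r = 40 records per block, a 60 ms block transfer, and a 1 ms record test, the binary indexed search is cheapest, the linear index search next, and the full sequential scan by far the most expensive.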
Indexing in LISP can be implemented with property lists, A-lists, and/or hash tables. For example, a KB can be partitioned into segments by storing each segment as a list under the property value for that segment. Each list indexed in this way can be found with the get property function and then searched sequentially or sorted and searched with binary search methods. A hash table is a special data structure in LISP which provides a means of rapid access through key hashing. We review the hashing process next.
Hashed Files
Hashed file organizations that permit efficient access are based on the use of a hash function. A hash function, h, transforms key values k into integer storage location indices through a simple computation. When a maximum number of items or categories C are to be stored, the hashed values h(k) will range from 0 to C - 1. Therefore, given any key value k, h(k) should map into one of 0, ..., C - 1.
An effective but simple hash function can be computed by choosing the largest prime number p less than or equal to C, converting the key value k into an integer k' if necessary, and then using the value k' mod p as the index value h. For example, if C is 1000, the largest prime less than or equal to C is p = 997. Thus, if the record key value is 123456789 (a social security number), the hashed value is h = (k mod 997) = 273.
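The computation above can be sketched in Python as follows; the trial-division primality test is an illustrative choice, adequate for the small values of C considered here:

```python
def largest_prime_leq(c):
    """Largest prime p <= c, found by trial division (fine for small c)."""
    def is_prime(m):
        return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))
    return next(m for m in range(c, 1, -1) if is_prime(m))

def h(key, c):
    """Hash an integer key into the range 0 .. c-1 via k mod p."""
    return key % largest_prime_leq(c)
```

With C = 1000 this finds p = 997, and the key 123456789 hashes to 273, matching the text's example.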
When using hashed access, the value of C should be chosen large enough to accommodate the maximum number of categories needed. The use of the prime number p in the algorithm helps to insure that the resultant indices are somewhat uniformly distributed or hashed throughout the range 0, ..., C - 1.
This type of organization is well suited for groups of items corresponding to C different categories. When two or more items belong to the same category, they will have the same hashed values. These values are called synonyms. One way to accommodate collisions (simultaneous attempts to access synonyms) is with data structures known as buckets. A bucket is a linked list of one or more items, where each item is a record, block, list, or other data structure. The first item in each bucket has an address corresponding to the hashed address. Figure 11.4 illustrates a form of hashed memory organization which uses buckets to hold all items with the same hashed key value. The address of each bucket in this case is the indexed location in an array.
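A minimal Python sketch of this bucket scheme, with an array of buckets indexed by k mod p; the class and method names are assumptions for illustration:

```python
class HashedKB:
    """Hashed organization using buckets (chaining) to hold synonyms."""
    def __init__(self, c, p):
        self.p = p                          # prime <= c, as in the text
        self.buckets = [[] for _ in range(c)]   # array of bucket lists

    def insert(self, key, item):
        # Synonyms (keys with equal k mod p) land in the same bucket.
        self.buckets[key % self.p].append((key, item))

    def lookup(self, key):
        # Hash to the bucket, then scan its list of (key, item) pairs.
        for k, item in self.buckets[key % self.p]:
            if k == key:
                return item
        return None
```

For instance, with p = 7 the keys 3 and 10 are synonyms (both hash to 3), yet each record is still recovered correctly by the scan within the shared bucket.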
218 Knowledge Organization and Management Chap. 11
[Figure 11.4 Hashed memory organization: hashed addresses index an array of buckets, each holding the items with the same hashed key value.]
Conceptual Indexing
The indexing schemes described above are based on lexical ordering, where the collation order of a key value determines the relative location of the record. Keys for these items are typically chosen as a coded field (employee number, name, part number, and so on) which uniquely identifies the item. A better approach to indexed retrieval is one which makes use of the content or meaning associated with the stored entities rather than some nonmeaningful key value. This suggests the use of indices which name and define or otherwise describe the entity being retrieved. Thus, if the entity is an object, its name and characteristic attributes would make meaningful indices. If the entity is an abstract object such as a concept, the name and other defining traits would be meaningful as indices.
How are structures indexed by meaning, and how are they organized in memory for retrieval? One straightforward and popular approach uses associative networks (see Chapter 7) similar to the structures illustrated in Figure 11.1. Nodes within the network correspond to different knowledge entities, whereas the links are indices or pointers to the entities. Links connecting two entities name the association or relationship between them. The relationship between entities may be defined as a hierarchical one or just through associative links.
As an example of an indexed network, the concept of computer science (CS) should be accessible directly through the CS name or indirectly through associative links like a university major, a career field, or a type of classroom course. These
notions are illustrated in Figure 11.5.
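Such a network can be sketched as a small adjacency structure in which labeled links serve as the indices between entities; the node and relation names below are illustrative, not taken from Figure 11.5:

```python
# Tiny associative network: each node maps to its labeled outgoing links.
network = {
    "CS": [("is-a", "university major"), ("is-a", "career field")],
    "university major": [("example-of", "CS")],
    "career field": [("example-of", "CS")],
}

def neighbors(node, relation):
    """Follow all links with the given label from a node."""
    return [dst for rel, dst in network.get(node, []) if rel == relation]
```

The CS node is reachable directly by its name, or indirectly by following an "example-of" link from the university-major node, mirroring the two access paths described in the text.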
Object attributes can also serve as indices to locate items or categories based
on the attribute values. In this case, the best attribute keys are those which provide
the greatest discrimination among objects within the same category. For example,
suppose we wish to organize knowledge by object types. In this case, the choice
of attributes should depend on the use intended for the knowledge. Since objects
may be classified with an unlimited number of attributes (color, size, shape, markings, and so on), those attributes which are most discriminable with respect to the concept meaning should be chosen. Alternatively, object features with the most predictive power make the best indices. A good index for bird types is one based on individual differences like feet, size, beak shape, sounds emitted, special markings, and so forth. Attribute values possessed by all objects are useful for forming categories but poor for identifying an object within the category.
Truly intelligent methods of indexing will be content associative and usually require some inferring. Like humans, a system may fail to locate an item when it has been modified in memory. In such cases, cues related to the item may be needed. For example, you may fail to remember whether or not you have ever discussed American politics with a foreigner until you have considered under what circumstances you may have talked with foreigners (at a university, while traveling
or living abroad, or just a chance meeting). An example of this type of indexing
strategy is discussed in Section 11.4.
Hypertext
One of the earliest computer models of memory was the Human Associative Memory (HAM) system developed by John Anderson and Gordon Bower (1973). This memory is organized as a network of propositional binary trees. An example of a simple tree which represents the statement "In a park a hippie touched a debutante" is illustrated in Figure 11.6. When an informant asserts this statement to HAM, the system parses the sentence and builds a binary tree representation. Nodes in the tree are assigned unique numbers, while links are labeled with functions that name the role of each branch (context, location, time, fact, subject, predicate, relation, and object).
[Figure 11.6 HAM propositional binary tree for "In a park a hippie touched a debutante": a context subtree with leaves for park (location) and past (time), and a fact subtree with leaves for hippie (subject), touch (relation), and debutante (object), all internal nodes numbered.]
As HAM is informed of new sentences, they are parsed and formed into new tree-like memory structures or integrated with existing ones. For example, to add the fact that the hippie was tall, the following subtree is attached to the tree structure of Figure 11.6 by merging the common node hippie (node 3) into a single node.
[Subtree: a new root node (21) whose subject branch points to the existing hippie node (3) and whose predicate branch points to a new node (24) for tall.]
When HAM is posed with a query, it is formed into a tree structure called a probe. This structure is then matched against existing memory structures for the best match. The structure with the closest match is used to formulate an answer to the query.
Matching is accomplished by first locating the leaf nodes in memory that match leaf nodes in the probe. The corresponding links are then checked to see if they have the same labels and in the same order. The search process is constrained by searching only node groups that have the same relation links, based on recency of usage. The search is not exhaustive, and nodes accessed infrequently may be forgotten. Access to nodes in HAM is accomplished through word indexing in LISP (node words in tree structures are accessed directly through property lists or A-lists).
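The probe-matching idea can be caricatured in a few lines. The sketch below simplifies HAM's labeled trees to sets of (node, link, node) triples and scores a candidate memory structure by how many triples it shares with the probe; the link labels and propositions are illustrative only, not HAM's actual representation:

```python
# Memory: each structure is a set of labeled triples (a flattened tree).
memory = [
    {("hippie", "subject", "touch"), ("touch", "object", "debutante"),
     ("touch", "location", "park")},
    {("hippie", "subject", "tall")},
]

def best_match(probe):
    """Return the memory structure sharing the most triples with the probe."""
    return max(memory, key=lambda m: len(m & probe))
```

A probe asking who the hippie touched shares a triple with the first structure and none with the second, so the first is selected and used to formulate the answer.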
Roger Schank and his students at Yale University have developed several computer
systems which perform different functions related to the use of natural language
[Figure 11.7 An E-MOP frame for $MEET: a Content part holding the features common to all meeting events, and discriminating indices pointing to the individual events EV1 and EV2.]
to the same MOP category are entered, common event features are used to generalize the E-MOP. This information is collected in the frame contents. Specialization may also be required when over-generalization has occurred. Thus, memory is continually being reorganized as new facts are entered. This process prevents the addition of excessive memory entries and much redundancy which would result if every event entered resulted in the addition of a separate event. Reorganization can also cause forgetting, since originally assigned indices may be changed when new structures are formed. When this occurs, an item cannot be located, so the system attempts to derive new indices from the context and through other indices by reconstructing related events.
To see how CYRUS builds and maintains a memory organization, we briefly examine how a basic E-MOP grows and undergoes revision with time. Initially, the $MEET E-MOP of Figure 11.7 would consist of the Content part of the frame only. Then, after a first meeting occurred, indices relevant and unique to that meeting are established and recorded, and pointers are set to the corresponding event. Subsequent meetings also result in the determination of new event indices, or, if two or more of the new meetings have some features in common, a new sub-E-MOP would be formed with indices established and pointers set to the new E-MOP. This process continues with new indices to events added or new E-MOPs formed and indexed as new meetings occur. Furthermore, the content portion of all E-MOPs is continually monitored and modified to better describe the common events it indexes. Thus, when a number of meeting events exhibit some new property, the frame content is generalized to include this property and new indices are determined. When over-generalization occurs, subsequent events will result in a correction through some specialization and recomputation of indices.
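The growth cycle just described, generalizing shared features into the frame content and using the remaining discriminating features as indices, can be sketched minimally. All names here are assumptions; a real system such as CYRUS would also re-index earlier events when the content is specialized:

```python
class EMOP:
    """Minimal E-MOP sketch: generalized content plus discriminating indices."""
    def __init__(self):
        self.content = {}     # features common to every event seen so far
        self.indices = {}     # (feature, value) -> list of indexed events

    def add_event(self, event, features):
        if not self.content:
            # First event: its features become the initial content.
            self.content = dict(features)
        else:
            # Generalize: keep only features shared with the new event.
            self.content = {k: v for k, v in self.content.items()
                            if features.get(k) == v}
        # Features not in the generalized content discriminate this event.
        for k, v in features.items():
            if self.content.get(k) != v:
                self.indices.setdefault((k, v), []).append(event)
```

Adding a Vance-Begin meeting and then a Vance-Sadat meeting generalizes the content down to what both share, while the differing nationality becomes a discriminating index for the second event.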
After the two diplomatic meetings described above had been entered, indices are developed by the system to index the events (EV1 and EV2) using features which discriminate between the two meetings (Figure 11.7). If a third meeting is now entered, say one between Vance and Sadat of Egypt, which is also about Arab-Israeli peace, new E-MOPs will be formed since this meeting has some features in common with the Begin (EV1) meeting. One of the new E-MOPs that is formed is indexed under the previous topic index. It has the following structure:
Topic:
  SALT: E-MOP1
  Arab-Israeli peace: (new E-MOP)
    Topic: Arab-Israeli peace
    Underlying topic: peace
    Involves: Israel and the Arabs
    Participants: heads of state
    Indexed by participants' nationalities:
      Israeli: EV1
      Egyptian: EV2
The key issues in this type of organization are the same as those noted earlier. They are (1) the selection and computation of good indices for new events so that similar events can be located in memory for new event integration, (2) monitoring and reorganization of memory to accommodate new events as they occur, and (3) access of the correct event information when provided clues for retrieval.
11.5 SUMMARY
EXERCISES
11.1. What important characteristics should a computer memory organization system possess?
11.2. Explain why each of the characteristics named in Problem 11.1 is important.
11.3. What basic operations must a program perform in order to access specific chunks of
knowledge?
11.4. Suppose 64-byte records are stored in . . . size 2 bytes. Describe a suitable index file to access the records using the following keys (start with block address 16-
time when a block can be located and read on the average within 60 ms and the time to search each record is one ms per block? Compare this time to the time required to search a single block for the same information.
11.6. Referring to Problem 11.4, describe how a hashing method could be applied to search for the indicated records.
11.7. Draw a conceptual indexing tree structure using the same keys as those given in Problem 11.4, but with the addition of a generalized node named farm-animals.
11.8. Using the same label links as those used in HAM, develop propositional trees for the following sentences.
The birds were singing in the park.
John and Mary went dancing at the prom.
Do not drink the water.
11.9. For the previous problem, add the sentence "There are lots of birds and they are
small and yellow."
11.10. Develop an E-MOP for a general episode to fill up a car with gasoline using the
elements Actor, Participant, Objects, Actions, and Goals.
11.11. Show how the E-MOP of Problem 11.10 would be indexed and accessed for the
two events of filling the car at a self-service and at a full-service location.
11.12. Are the events of Problem 11.11 good candidates for specialized E-MOPs? Explain your answer.
11.13. Give an example of a hashing function that does not distribute key values uniformly
over the key space.
11.14. Draw a small hypertext network that you might want to browse where the general
network subject of artificial intelligence is used. Make up your own subtopics and
show all linkages which you feel are useful, including link directions between subtopics.
11.15. Show how the E-MOP of Figure 11.7 would be generalized when peace was one of
the topics discussed at every meeting.
11.16. Modify the E-MOP of Figure 11.7 to accommodate a new meeting between Vance
and King Hussein of Jordan. The topic of their meeting is Palestinian refugees.