AI Complete Notes

UNIT-I

What is Artificial Intelligence (AI)?


In today's world, technology is growing very fast, and we come into contact with new technologies every day.

One of the booming technologies of computer science is Artificial Intelligence, which is ready to create a new revolution in the world by making intelligent machines. Artificial Intelligence is now all around us, working in a variety of areas ranging from general to specific: self-driving cars, playing chess, proving theorems, composing music, painting, and so on.

AI is one of the most fascinating and universal fields of computer science, and it has great scope in the future. AI aims to make a machine work like a human.

Artificial Intelligence is composed of two words, Artificial and Intelligence, where Artificial means "man-made" and Intelligence means "thinking power"; hence AI means "man-made thinking power." So we can define AI as:

"A branch of computer science by which we can create intelligent machines which can behave like humans, think like humans, and make decisions."

Artificial Intelligence exists when a machine has human-like skills such as learning, reasoning, and problem solving.

With Artificial Intelligence you do not need to preprogram a machine for every task; instead, you can create a machine with programmed algorithms which can work with its own intelligence, and that is the strength of AI.

AI is not really a new idea: according to Greek myth, there were mechanical men in early days which could work and behave like humans.

Why Artificial Intelligence?


Before learning about Artificial Intelligence, we should know why AI is important and why we should learn it. Following are some main reasons to learn about AI:

o With the help of AI, you can create software or devices which can solve real-world problems easily and accurately, in areas such as healthcare, marketing, and traffic management.
o With the help of AI, you can create personal virtual assistants, such as Cortana, Google Assistant, and Siri.
o With the help of AI, you can build robots which can work in environments where human survival is at risk.
o AI opens a path to other new technologies, new devices, and new opportunities.

Goals of Artificial Intelligence


Following are the main goals of Artificial Intelligence:

1. Replicate human intelligence

2. Solve knowledge-intensive tasks

3. Create an intelligent connection of perception and action

4. Build machines which can perform tasks that require human intelligence

5. Create systems which can exhibit intelligent behaviour, learn new things by themselves, demonstrate, explain, and advise their users

Advantages of Artificial Intelligence


Following are some main advantages of Artificial Intelligence:

o High accuracy with fewer errors: AI systems are less prone to errors and can achieve high accuracy because they take decisions based on prior experience or information.
o High speed: AI systems can operate at very high speed and make decisions fast; this is why an AI system can beat a champion at chess.
o High reliability: AI machines are highly reliable and can perform the same action many times with consistent accuracy.
o Useful for risky areas: AI machines can be helpful in situations such as defusing a bomb or exploring the ocean floor, where employing a human would be risky.
o Digital assistants: AI can provide digital assistance to users; for example, various e-commerce websites currently use AI to show products matching customer requirements.
o Useful as a public utility: AI can be very useful in public utilities, such as self-driving cars which can make journeys safer and hassle-free, facial recognition for security, and natural language processing for communicating with machines in human language.

Disadvantages of Artificial Intelligence


Every technology has some disadvantages, and the same goes for Artificial Intelligence. However advantageous it is, AI still has drawbacks which we need to keep in mind while creating an AI system. Following are the disadvantages of AI:

o High cost: The hardware and software requirements of AI are costly, and AI systems need a lot of maintenance to meet current requirements.
o Can't think outside the box: Even though we are making smarter machines with AI, they still cannot work outside their training: a robot will only do the work for which it is trained or programmed.
o No feelings and emotions: An AI machine can be an outstanding performer, but it has no feelings, so it cannot form an emotional attachment with humans, and it may sometimes be harmful to users if proper care is not taken.
o Increased dependency on machines: As technology advances, people are becoming more dependent on devices and are exercising their own mental capabilities less.
o No original creativity: Humans are creative and can imagine new ideas; AI machines cannot yet match this power of human intelligence and cannot be genuinely creative and imaginative.
Top 4 Techniques of Artificial Intelligence

Artificial Intelligence can be divided into different categories based on the machine's capacity to use past experiences to predict future decisions, memory, and self-awareness.

Below are the various categories of Artificial Intelligence:

1. Machine Learning

It is one of the applications of AI where machines are not explicitly programmed to perform certain tasks; rather, they learn and improve from experience automatically. Deep Learning is a subset of machine learning based on artificial neural networks for predictive analysis. There are various kinds of machine learning: Unsupervised Learning, Supervised Learning, and Reinforcement Learning. In Unsupervised Learning, the algorithm works on unlabelled data and must find structure in it without any guidance. In Supervised Learning, it deduces a function from the training data, which consists of a set of input objects and their desired outputs. In Reinforcement Learning, the machine learns to take suitable actions that maximize a reward and thereby find the best course of action.

2. NLP (Natural Language Processing)

NLP concerns the interaction between computers and human language: computers are programmed to process natural languages. Machine learning is a reliable technology for natural language processing, used to extract meaning from human language. In NLP, the audio of human speech is captured by the machine, audio-to-text conversion takes place, the text is processed, and the response is converted back into audio, which the machine uses to reply to the human. Applications of Natural Language Processing can be found in IVR (Interactive Voice Response) applications used in call centres, language translation applications like Google Translate, and word processors such as Microsoft Word, which check the grammar of text. However, the nature of human languages makes Natural Language Processing difficult: the rules involved in passing information using natural language are not easy for computers to understand. NLP therefore uses algorithms to recognize and abstract the rules of natural languages, so that unstructured data from human languages can be converted into a format the computer understands. NLP can also be found in content optimization, such as paraphrasing applications, which help to improve the readability of complex text.

3. Automation and Robotics

The purpose of automation is to get monotonous and repetitive tasks done by machines, which also improves productivity and yields more cost-effective and efficient results. Many organizations use machine learning, neural networks, and graphs in automation. Such automation can help prevent fraud during online financial transactions, for instance through CAPTCHA technology. Robotic process automation is programmed to perform high-volume repetitive tasks while adapting to changing circumstances.
4. Machine Vision

Machines can capture visual information and then analyze it. Cameras are used to capture the visual information, analogue-to-digital conversion turns the image into digital data, and digital signal processing is employed to process the data. The resulting data is then fed to a computer. In machine vision, two vital aspects are sensitivity, the ability of the machine to perceive weak impulses, and resolution, the degree to which the machine can distinguish objects. The usage of machine vision can be found in signature identification, pattern recognition, medical image analysis, and so on.

Problem Solving in Artificial Intelligence

• Problem definition: Detailed specification of inputs and acceptable system solutions.
• Problem analysis: Analyse the problem thoroughly.
• Knowledge representation: Collect detailed information about the problem and define all possible techniques.
• Problem-solving: Select the best technique.

Components to formulate the associated problem:

• Initial state: The state from which the AI agent starts working towards the specified goal.
• Actions: A description of the possible actions available to the agent, given as a function of the current state.
• Transition model: A description of what each action does; it takes a state and an action and returns the state that results from performing that action.
• Goal test: Determines whether a given state is a goal state; whenever the goal is achieved, the search stops and the cost of reaching the goal is determined.
• Path cost: A function that assigns a numeric cost to each path so the agent can compare solutions; it reflects the resources required to achieve the goal.
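To make these components concrete, here is a minimal sketch of a problem formulation in Python following the structure above. The class and method names are illustrative assumptions, not part of the original notes; real problems would subclass it and fill in the state-specific details.

# A minimal, illustrative problem formulation (names are assumptions).
class Problem:
    def __init__(self, initial, goal):
        self.initial = initial          # initial state
        self.goal = goal                # goal state used by the goal test

    def actions(self, state):
        """Return the actions available in `state`."""
        raise NotImplementedError

    def result(self, state, action):
        """Transition model: the state reached by doing `action` in `state`."""
        raise NotImplementedError

    def goal_test(self, state):
        return state == self.goal

    def path_cost(self, cost_so_far, state, action, next_state):
        """Default path costing: every step costs 1."""
        return cost_so_far + 1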

Search Algorithms in Artificial Intelligence


Search algorithms are one of the most important areas of Artificial Intelligence. This topic
will explain all about the search algorithms in AI.

Problem-solving agents:
In Artificial Intelligence, search techniques are universal problem-solving methods. Rational agents or problem-solving agents in AI mostly use these search strategies or algorithms to solve a specific problem and provide the best result. Problem-solving agents are goal-based agents and use an atomic representation of states. In this topic, we will learn various problem-solving search algorithms.
Search Algorithm Terminologies:
o Search: Searching is a step-by-step procedure to solve a search problem in a given search space. A search problem can have three main factors:

a. Search Space: Search space represents the set of possible solutions which a system may have.

b. Start State: It is the state from where the agent begins the search.

c. Goal test: It is a function which observes the current state and returns whether the goal state is achieved or not.

o Search tree: A tree representation of a search problem is called a search tree. The root of the search tree is the root node, which corresponds to the initial state.

o Actions: It gives the description of all the actions available to the agent.

o Transition model: A description of what each action does; it can be represented as a transition model.

o Path Cost: It is a function which assigns a numeric cost to each path.

o Solution: It is an action sequence which leads from the start node to the goal node.

o Optimal Solution: A solution which has the lowest cost among all solutions.

Properties of Search Algorithms:


Following are the four essential properties of search algorithms to compare the efficiency
of these algorithms:

Completeness: A search algorithm is said to be complete if it is guaranteed to return a solution whenever at least one solution exists for any input.

Optimality: If the solution found by an algorithm is guaranteed to be the best solution (lowest path cost) among all solutions, then it is said to be an optimal solution.

Time Complexity: Time complexity is a measure of the time an algorithm takes to complete its task.

Space Complexity: It is the maximum storage space required at any point during the search, as a function of the complexity of the problem.
Types of search algorithms

Based on the search problem, we can classify search algorithms into uninformed (blind) search and informed (heuristic) search algorithms.

Uninformed/Blind Search:
Uninformed search does not use any domain knowledge, such as closeness or the location of the goal. It operates in a brute-force way, as it only includes information about how to traverse the tree and how to identify leaf and goal nodes. Uninformed search searches the tree without any information about the search space beyond the initial state, the operators, and the test for the goal, so it is also called blind search. It examines each node of the tree until it reaches the goal node.

It can be divided into six main types:

• Breadth-first search

• Uniform cost search

• Depth-first search

• Iterative deepening depth-first search

• Bidirectional search

• Depth-limited search

Informed Search
Informed search algorithms use domain knowledge. In an informed search, problem information is available which can guide the search. Informed search strategies can find a solution more efficiently than an uninformed search strategy. Informed search is also called heuristic search.

A heuristic is a technique which is not always guaranteed to find the best solution, but is guaranteed to find a good solution in reasonable time.

Informed search can solve much more complex problems than could be solved another way.

A classic problem tackled with informed search algorithms is the travelling salesman problem. The two main informed search algorithms are:

1. Greedy Search

2. A* Search

Uninformed Search Algorithms


Uninformed search is a class of general-purpose search algorithms which operate in a brute-force way. Uninformed search algorithms have no additional information about states or the search space other than how to traverse the tree, so they are also called blind search.

Following are the various types of uninformed search algorithms:

1. Breadth-first Search

2. Depth-first Search

3. Depth-limited Search

4. Iterative deepening depth-first search


5. Uniform cost search

6. Bidirectional Search

1. Breadth-first Search:

o Breadth-first search is the most common search strategy for traversing a tree or graph. This algorithm searches breadthwise in a tree or graph, so it is called breadth-first search.

o The BFS algorithm starts searching from the root node of the tree and expands all successor nodes at the current level before moving to the nodes of the next level.

o The breadth-first search algorithm is an example of a general graph-search algorithm.

o Breadth-first search is implemented using a FIFO queue data structure.

Advantages:

o BFS will provide a solution if any solution exists.

o If there is more than one solution for a given problem, then BFS will provide the minimal solution, i.e. the one requiring the least number of steps.

Disadvantages:

o It requires lots of memory, since each level of the tree must be saved in memory in order to expand the next level.

o BFS needs lots of time if the solution is far away from the root node.

Example:
In the below tree structure, we have shown the traversing of the tree using the BFS algorithm from the root node S to goal node K. The BFS algorithm traverses in layers, so it will follow the path shown by the dotted arrow, and the traversed path will be:

S ---> A ---> B ---> C ---> D ---> G ---> H ---> E ---> F ---> I ---> K

Time Complexity: The time complexity of the BFS algorithm can be obtained from the number of nodes traversed in BFS until the shallowest goal node, where d = depth of the shallowest solution and b = branching factor:

T(b) = 1 + b + b^2 + b^3 + ... + b^d = O(b^d)

Space Complexity: The space complexity of BFS is given by the memory size of the frontier, which is O(b^d).

Completeness: BFS is complete, which means that if the shallowest goal node is at some finite depth, then BFS will find a solution.

Optimality: BFS is optimal if the path cost is a non-decreasing function of the depth of the node.
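As a concrete illustration, here is a minimal BFS sketch in Python over a graph given as an adjacency dictionary. The graph, the function name, and the FIFO-frontier encoding are assumptions for illustration, not part of the original notes.

from collections import deque

def bfs(graph, start, goal):
    # FIFO frontier; each entry is the path from start to its last node.
    frontier = deque([[start]])
    explored = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path                      # first path found is shallowest
        for neighbour in graph.get(node, []):
            if neighbour not in explored:
                explored.add(neighbour)
                frontier.append(path + [neighbour])
    return None                              # no solution exists

graph = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['E'], 'E': ['K']}
print(bfs(graph, 'S', 'K'))                  # ['S', 'B', 'E', 'K']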

2. Depth-first Search

o Depth-first search is a recursive algorithm for traversing a tree or graph data structure.

o It is called depth-first search because it starts from the root node and follows each path to its greatest-depth node before moving to the next path.

o DFS uses a stack data structure for its implementation.

o The process of the DFS algorithm is similar to the BFS algorithm.


Note: Backtracking is an algorithm technique for finding all possible solutions using
recursion.

Advantage:

o DFS requires very little memory, as it only needs to store the stack of nodes on the path from the root node to the current node.

o It takes less time to reach the goal node than the BFS algorithm (if it traverses the right path).

Disadvantage:

o There is a possibility that many states keep re-occurring, and there is no guarantee of finding a solution.

o The DFS algorithm goes deep down in the search, and it may sometimes enter an infinite loop.

Example:
In the below search tree, we have shown the flow of depth-first search, and it will follow the order:

Root node ---> Left node ---> Right node.

It will start searching from root node S and traverse A, then B, then D and E; after traversing E it will backtrack the tree, as E has no other successor and the goal node has not yet been found. After backtracking, it will traverse node C and then G, and there it will terminate, as it finds the goal node.

Completeness: The DFS algorithm is complete within a finite state space, as it will expand every node within a limited search tree.

Time Complexity: The time complexity of DFS is equivalent to the number of nodes traversed by the algorithm:

T(b) = 1 + b + b^2 + ... + b^m = O(b^m)

where m = maximum depth of any node, which can be much larger than d (the depth of the shallowest solution).

Space Complexity: The DFS algorithm needs to store only a single path from the root node, hence the space complexity of DFS is equivalent to the size of the fringe set, which is O(bm).

Optimality: The DFS algorithm is non-optimal, as it may take a large number of steps or incur a high cost to reach the goal node.
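A minimal iterative DFS sketch in Python, using an explicit stack instead of recursion; the graph format matches the BFS sketch above, and the names are again illustrative assumptions.

def dfs(graph, start, goal):
    # LIFO frontier: the last path pushed is the first explored (depth first).
    frontier = [[start]]
    while frontier:
        path = frontier.pop()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in reversed(graph.get(node, [])):
            if neighbour not in path:        # avoid cycles along this path
                frontier.append(path + [neighbour])
    return None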

3. Depth-Limited Search Algorithm:


A depth-limited search algorithm is similar to depth-first search with a predetermined limit ℓ. Depth-limited search can solve the drawback of the infinite path in depth-first search. In this algorithm, the node at the depth limit is treated as if it has no successor nodes.

Depth-limited search can terminate with two conditions of failure:

o Standard failure value: It indicates that the problem does not have any solution.

o Cutoff failure value: It indicates that there is no solution for the problem within the given depth limit.

Advantages:

Depth-limited search is memory efficient.

Disadvantages:

o Depth-limited search also has the disadvantage of incompleteness.

o It may not be optimal if the problem has more than one solution.

Completeness: The DLS algorithm is complete if the solution is above the depth limit.

Time Complexity: The time complexity of the DLS algorithm is O(b^ℓ).

Space Complexity: The space complexity of the DLS algorithm is O(b×ℓ).

Optimality: Depth-limited search can be viewed as a special case of DFS, and it is also not optimal, even if ℓ > d.

4. Uniform-cost Search Algorithm:


Uniform-cost search is a searching algorithm used for traversing a weighted tree or graph. This algorithm comes into play when a different cost is available for each edge. The primary goal of uniform-cost search is to find a path to the goal node which has the lowest cumulative cost. Uniform-cost search expands nodes according to their path cost from the root node. It can be used to solve any graph/tree where the optimal cost is in demand. A uniform-cost search algorithm is implemented using a priority queue, which gives maximum priority to the lowest cumulative cost. Uniform-cost search is equivalent to the BFS algorithm if the path cost of all edges is the same.

Advantages:

o Uniform-cost search is optimal because, at every state, the path with the least cost is chosen.

Disadvantages:

o It does not care about the number of steps involved in searching and is only concerned with path cost, due to which this algorithm may get stuck in an infinite loop.

Completeness:

Uniform-cost search is complete: if there is a solution, UCS will find it.

Time Complexity:

Let C* be the cost of the optimal solution and ε the minimum cost of each step towards the goal. Then the number of steps is C*/ε + 1 (we add 1 because we start from state 0 and end at C*/ε). Hence, the worst-case time complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).

Space Complexity:

The same logic holds for space, so the worst-case space complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).

Optimality:

Uniform-cost search is always optimal, as it only selects a path with the lowest path cost.
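A minimal uniform-cost search sketch in Python using heapq as the priority queue; here edge costs are stored in the adjacency dictionary as (neighbour, cost) pairs, an encoding assumed purely for illustration.

import heapq

def uniform_cost_search(graph, start, goal):
    # Priority queue of (path_cost, path); lowest cumulative cost pops first.
    frontier = [(0, [start])]
    best_cost = {start: 0}
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return cost, path
        for neighbour, step_cost in graph.get(node, []):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(neighbour, float('inf')):
                best_cost[neighbour] = new_cost
                heapq.heappush(frontier, (new_cost, path + [neighbour]))
    return None

graph = {'S': [('A', 1), ('B', 4)], 'A': [('G', 5)], 'B': [('G', 1)]}
print(uniform_cost_search(graph, 'S', 'G'))   # (5, ['S', 'B', 'G'])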

5. Iterative deepening depth-first Search:


The iterative deepening algorithm is a combination of the DFS and BFS algorithms. This search algorithm finds the best depth limit by gradually increasing the limit until a goal is found.

This algorithm performs depth-first search up to a certain "depth limit", and it keeps increasing the depth limit after each iteration until the goal node is found.

This search algorithm combines the benefits of breadth-first search's fast search and depth-first search's memory efficiency.

The iterative deepening search algorithm is a useful uninformed search when the search space is large and the depth of the goal node is unknown.

Advantages:

o It combines the benefits of the BFS and DFS search algorithms in terms of fast search and memory efficiency.

Disadvantages:

o The main drawback of IDDFS is that it repeats all the work of the previous phase.

Example:
The following tree structure shows the iterative deepening depth-first search. The IDDFS algorithm performs iterations until it finds the goal node. The iterations performed by the algorithm are:

1st iteration ----> A
2nd iteration ----> A, B, C
3rd iteration ----> A, B, D, E, C, F, G
4th iteration ----> A, B, D, H, I, E, C, F, K, G

In the fourth iteration, the algorithm will find the goal node.

Completeness:

This algorithm is complete if the branching factor is finite.

Time Complexity:

Suppose b is the branching factor and the depth of the shallowest goal is d; then the worst-case time complexity is O(b^d).

Space Complexity:

The space complexity of IDDFS is O(bd).

Optimality:

The IDDFS algorithm is optimal if the path cost is a non-decreasing function of the depth of the node.
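A compact IDDFS sketch in Python: a recursive depth-limited search wrapped in a loop that raises the limit. The graph format and names follow the earlier sketches and are illustrative assumptions.

import itertools

def depth_limited(graph, node, goal, limit, path):
    if node == goal:
        return path
    if limit == 0:
        return None                          # cutoff: limit reached
    for neighbour in graph.get(node, []):
        if neighbour not in path:            # avoid cycles along this path
            found = depth_limited(graph, neighbour, goal, limit - 1,
                                  path + [neighbour])
            if found:
                return found
    return None

def iddfs(graph, start, goal):
    # Repeat depth-limited search with limits 0, 1, 2, ...
    # (loops forever if no solution exists; a maximum limit can be added).
    for limit in itertools.count():
        result = depth_limited(graph, start, goal, limit, [start])
        if result:
            return result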

6. Bidirectional Search Algorithm:


The bidirectional search algorithm runs two simultaneous searches, one from the initial state, called forward search, and the other from the goal node, called backward search, to find the goal node. Bidirectional search replaces one single search graph with two small subgraphs, in which one search starts from the initial vertex and the other starts from the goal vertex. The search stops when the two graphs intersect.

Bidirectional search can use search techniques such as BFS, DFS, DLS, etc.

Advantages:

o Bidirectional search is fast.

o Bidirectional search requires less memory.

Disadvantages:

o Implementation of the bidirectional search tree is difficult.

o In bidirectional search, one should know the goal state in advance.

Example:
In the below search tree, the bidirectional search algorithm is applied. This algorithm divides one graph/tree into two sub-graphs. It starts traversing from node 1 in the forward direction and from goal node 16 in the backward direction.

The algorithm terminates at node 9, where the two searches meet.

Completeness: Bidirectional search is complete if we use BFS in both searches.

Time Complexity: The time complexity of bidirectional search using BFS is O(b^(d/2)), since each of the two searches only needs to reach half the solution depth.

Space Complexity: The space complexity of bidirectional search is O(b^(d/2)).

Optimality: Bidirectional search is optimal (when both halves use BFS and step costs are identical).

Informed Search Algorithms


So far we have talked about the uninformed search algorithms, which looked through the search space for all possible solutions to the problem without having any additional knowledge about the search space. An informed search algorithm, by contrast, uses knowledge such as how far we are from the goal, the path cost, how to reach the goal node, etc. This knowledge helps agents to explore less of the search space and find the goal node more efficiently.

Informed search is more useful for large search spaces. Informed search algorithms use the idea of a heuristic, so they are also called heuristic search.

Heuristic function: A heuristic is a function used in informed search which identifies the most promising path. It takes the current state of the agent as its input and produces an estimate of how close the agent is to the goal. The heuristic method might not always give the best solution, but it is guaranteed to find a good solution in reasonable time. A heuristic function estimates how close a state is to the goal. It is represented by h(n), and it estimates the cost of an optimal path between the pair of states. The value of the heuristic function is always non-negative.

Admissibility of the heuristic function is given as:

h(n) <= h*(n)

Here h(n) is the heuristic (estimated) cost, and h*(n) is the actual cost of the cheapest path from n to the goal. Hence, for admissibility, the estimated cost must never exceed the actual cost.

Pure Heuristic Search:


Pure heuristic search is the simplest form of heuristic search algorithm. It expands nodes based on their heuristic value h(n). It maintains two lists, an OPEN and a CLOSED list. In the CLOSED list it places those nodes which have already been expanded, and in the OPEN list it places nodes which have not yet been expanded.

On each iteration, the node n with the lowest heuristic value is expanded; all its successors are generated, and n is placed in the CLOSED list. The algorithm continues until a goal state is found.

In informed search we will discuss two main algorithms, which are given below:

o Best-first search algorithm (greedy search)

o A* search algorithm

1.) Best-first Search Algorithm (Greedy Search):


The greedy best-first search algorithm always selects the path which appears best at that moment. It is a combination of depth-first search and breadth-first search. It uses the heuristic function to guide the search, letting us take advantage of both algorithms. With the help of best-first search, at each step we can choose the most promising node. In the best-first search algorithm, we expand the node which is closest to the goal node, where the closeness is estimated by the heuristic function, i.e.

f(n) = h(n)

where h(n) = estimated cost from node n to the goal.

Greedy best-first search is implemented using a priority queue.

Best first search algorithm:


o Step 1: Place the starting node in the OPEN list.

o Step 2: If the OPEN list is empty, stop and return failure.

o Step 3: Remove from the OPEN list the node n which has the lowest value of h(n), and place it in the CLOSED list.

o Step 4: Expand the node n and generate its successors.

o Step 5: Check each successor of node n to see whether it is a goal node. If any successor is a goal node, return success and terminate the search; otherwise proceed to Step 6.

o Step 6: For each successor node, the algorithm evaluates the function f(n) and then checks whether the node is already in the OPEN or CLOSED list. If the node is in neither list, add it to the OPEN list.

o Step 7: Return to Step 2.
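The steps above can be sketched in a few lines of Python, with a heapq frontier keyed on h(n) alone. The graph, the heuristic table h, and the names are illustrative assumptions matching the earlier sketches.

import heapq

def greedy_best_first(graph, h, start, goal):
    # Frontier ordered purely by heuristic value h(n).
    open_list = [(h[start], [start])]
    closed = set()
    while open_list:
        _, path = heapq.heappop(open_list)
        node = path[-1]
        if node == goal:
            return path
        closed.add(node)
        for neighbour in graph.get(node, []):
            if neighbour not in closed:
                heapq.heappush(open_list, (h[neighbour], path + [neighbour]))
    return None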

Advantages:

o Best-first search can switch between BFS and DFS, thereby gaining the advantages of both algorithms.

o This algorithm is more efficient than the BFS and DFS algorithms.

Disadvantages:

o It can behave like an unguided depth-first search in the worst-case scenario.

o It can get stuck in a loop, like DFS.

o This algorithm is not optimal.

Example:
Consider the below search problem, which we will traverse using greedy best-first search. At each iteration, each node is expanded using the evaluation function f(n) = h(n), which is given in the below table.

In this search example, we are using two lists, the OPEN and CLOSED lists. Following are the iterations for traversing the above example.

Expand the nodes of S and put them in the CLOSED list:

Initialization: Open [A, B], Closed [S]

Iteration 1: Open [A], Closed [S, B]

Iteration 2: Open [E, F, A], Closed [S, B]
             Open [E, A], Closed [S, B, F]

Iteration 3: Open [I, G, E, A], Closed [S, B, F]
             Open [I, E, A], Closed [S, B, F, G]

Hence the final solution path will be: S ----> B -----> F ----> G

Time Complexity: The worst-case time complexity of greedy best-first search is O(b^m).

Space Complexity: The worst-case space complexity of greedy best-first search is O(b^m), where m is the maximum depth of the search space.

Completeness: Greedy best-first search is incomplete, even if the given state space is finite.

Optimality: The greedy best-first search algorithm is not optimal.

2.) A* Search Algorithm:


A* search is the most commonly known form of best-first search. It uses the heuristic function h(n) and the cost g(n) to reach node n from the start state. It combines features of UCS and greedy best-first search, by which it solves problems efficiently. The A* search algorithm finds the shortest path through the search space using the heuristic function. This search algorithm expands a smaller search tree and provides an optimal result faster. A* is similar to UCS, except that it uses g(n) + h(n) instead of g(n).

In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence we can combine both costs as follows; this sum is called the fitness number:

f(n) = g(n) + h(n)

At each point in the search space, only the node with the lowest value of f(n) is expanded, and the algorithm terminates when the goal node is found.
Algorithm of A* search:

Step 1: Place the starting node in the OPEN list.

Step 2: Check whether the OPEN list is empty; if it is, return failure and stop.

Step 3: Select from the OPEN list the node which has the smallest value of the evaluation function (g + h); if node n is the goal node, return success and stop, otherwise:

Step 4: Expand node n, generate all of its successors, and put n into the CLOSED list. For each successor n', check whether n' is already in the OPEN or CLOSED list; if not, compute the evaluation function for n' and place it in the OPEN list.

Step 5: Otherwise, if node n' is already in OPEN or CLOSED, attach it to the back pointer which reflects the lowest g(n') value.

Step 6: Return to Step 2.
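A compact A* sketch in Python, always expanding the node with the smallest f(n) = g(n) + h(n). The graph encoding, heuristic table, and names are illustrative assumptions matching the earlier sketches.

import heapq

def a_star(graph, h, start, goal):
    # Frontier ordered by f(n) = g(n) + h(n); entries are (f, g, path).
    open_list = [(h[start], 0, [start])]
    best_g = {start: 0}
    while open_list:
        f, g, path = heapq.heappop(open_list)
        node = path[-1]
        if node == goal:
            return g, path
        for neighbour, step_cost in graph.get(node, []):
            g2 = g + step_cost
            if g2 < best_g.get(neighbour, float('inf')):
                best_g[neighbour] = g2
                heapq.heappush(open_list, (g2 + h[neighbour], g2,
                                           path + [neighbour]))
    return None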

Advantages:

o The A* search algorithm performs better than other search algorithms.

o The A* search algorithm is optimal and complete.

o This algorithm can solve very complex problems.

Disadvantages:

o It does not always produce the shortest path, as it is largely based on heuristics and approximation.

o The A* search algorithm has some complexity issues.

o The main drawback of A* is its memory requirement: it keeps all generated nodes in memory, so it is not practical for various large-scale problems.

Example:
In this example, we will traverse the given graph using the A* algorithm. The heuristic value of each state is given in the below table, so we will calculate f(n) for each state using the formula f(n) = g(n) + h(n), where g(n) is the cost to reach the node from the start state.

Here we will use the OPEN and CLOSED lists.

Solution:

Initialization: {(S, 5)}

Iteration 1: {(S--> A, 4), (S-->G, 10)}

Iteration 2: {(S--> A-->C, 4), (S--> A-->B, 7), (S-->G, 10)}

Iteration 3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7), (S-->G, 10)}

Iteration 4 gives the final result: S---> A---> C---> G, which provides the optimal path with cost 6.

Points to remember:

o The A* algorithm returns the path which occurs first, and it does not search all remaining paths.

o The efficiency of the A* algorithm depends on the quality of the heuristic.

o The A* algorithm expands all nodes which satisfy the condition f(n) < C*.

Completeness: The A* algorithm is complete as long as:

o The branching factor is finite.

o Every action has a fixed cost.

Optimality: The A* search algorithm is optimal if it satisfies the following two conditions:

o Admissibility: The first condition required for optimality is that h(n) should be an admissible heuristic for A* tree search. An admissible heuristic is optimistic in nature.

o Consistency: The second required condition, consistency, applies to A* graph search only.

If the heuristic function is admissible, then A* tree search will always find the least-cost path.

Time Complexity: The time complexity of the A* search algorithm depends on the heuristic function, and the number of nodes expanded is exponential in the depth of the solution d. So the time complexity is O(b^d), where b is the branching factor.

Space Complexity: The space complexity of the A* search algorithm is O(b^d).

Production System in AI
A production system (popularly known as a production rule system) is a kind of cognitive architecture that is used to implement search algorithms and replicate human problem-solving skills. This problem-solving knowledge is encoded in the system in the form of little quanta popularly known as productions.
Each production consists of two components: a condition and an action. The condition part recognizes the situation, and the action part holds the knowledge of how to deal with it. In simpler words, the production system in AI contains a set of rules defined by a left side and a right side: the left side contains the set of things to watch for (the condition), and the right side contains the things to do (the action).
What are the Elements of a Production System?
An AI production system has three main elements, which are as follows:
• Global Database: The primary database which contains all the information necessary to successfully complete a task. It is further broken down into two parts: temporary and permanent. The temporary part contains information relevant to the current situation only, whereas the permanent part contains information about fixed actions.
• A Set of Production Rules: A set of rules that operate on the global database. Each rule consists of a precondition and a postcondition that the global database either meets or does not. If a condition is met by the global database, then the production rule is applied.
• Control System: A control system acts as the decision-maker and decides which production rule should be applied. The control system stops computation or processing when a termination condition is met on the database.
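A toy production system sketch in Python illustrating the three elements: a global database (a set of facts), production rules as (condition, action) pairs, and a control loop that fires the first applicable rule. All the facts and rules here are invented for illustration.

# Global database: a set of facts describing the current situation.
database = {'kettle_full', 'kettle_cold'}

# Production rules: (condition over the database, action that updates it).
rules = [
    (lambda db: 'kettle_cold' in db,
     lambda db: db - {'kettle_cold'} | {'kettle_hot'}),
    (lambda db: 'kettle_hot' in db and 'tea_made' not in db,
     lambda db: db | {'tea_made'}),
]

# Control system: fire the first rule whose condition holds, until the
# termination condition ('tea_made') is met or no rule applies.
while 'tea_made' not in database:
    for condition, action in rules:
        if condition(database):
            database = action(database)
            break
    else:
        break                                # no applicable rule: stop

print(database)   # {'kettle_full', 'kettle_hot', 'tea_made'}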

What are the Features of a Production System?

A production system has the following features:
1. Simplicity: Due to the use of the IF-THEN structure, each sentence in the production system is unique. This uniqueness makes the knowledge representation simple and enhances the readability of the production rules.
2. Modularity: The knowledge is coded in discrete pieces by the production system, which makes it easy to add, modify, or delete information without side effects.
3. Modifiability: This feature allows for the modification of the production rules. The rules are first defined in skeletal form and then modified to suit an application.
4. Knowledge-intensive: As the name suggests, the system stores only knowledge. The rules are written in natural language, and this type of representation eases the semantics problem.

UNIT-II
Heuristics
A heuristic is a technique that is used to solve a problem faster than classic methods, or to find an approximate solution when classic methods cannot find an exact one. Heuristics are problem-solving techniques that result in practical and quick solutions.

Heuristics are strategies derived from past experience with similar problems. Heuristics use practical methods and shortcuts to produce solutions that may not be optimal but are sufficient within a given, limited timeframe.

Heuristic search techniques in AI (Artificial Intelligence)

Hill Climbing Algorithm


Hill climbing is a technique for optimizing mathematical problems. It is widely used when a good heuristic is available.

It is a local search algorithm that continuously moves in the direction of increasing elevation/value in order to find the peak of the mountain, i.e. the best solution to the problem. It terminates when it reaches a peak where no neighbour has a higher value. The travelling salesman problem is one of the widely discussed examples of the hill climbing algorithm, in which we need to minimize the distance travelled by the salesman.

Hill climbing is also called greedy local search, as it only looks at its good immediate neighbour states and not beyond them. The steps of a simple hill-climbing algorithm are listed below:

Step 1: Evaluate the initial state. If it is the goal state, return success and stop.

Step 2: Loop until a solution is found or there is no new operator left to apply.

Step 3: Select and apply an operator to the current state.

Step 4: Check the new state:

If it is a goal state, then return success and quit.

Else if it is better than the current state, then assign the new state as the current state.

Else if it is not better than the current state, then return to Step 2.

Step 5: Exit.
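A minimal hill-climbing sketch in Python for maximizing a function over integer states; the toy objective and neighbour function are invented for illustration.

def hill_climb(start, value, neighbours):
    current = start
    while True:
        # Pick the best immediate neighbour of the current state.
        best = max(neighbours(current), key=value, default=current)
        if value(best) <= value(current):
            return current            # peak reached: no neighbour is better
        current = best

# Toy objective with a single peak at x = 7.
value = lambda x: -(x - 7) ** 2
neighbours = lambda x: [x - 1, x + 1]
print(hill_climb(0, value, neighbours))   # 7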

Branch and Bound Algorithm


Branch and bound is an algorithm design paradigm which is generally used for solving combinatorial optimization problems. These problems are typically exponential in time complexity and may require exploring all possible permutations in the worst case. The branch and bound technique solves these problems relatively quickly.

Depth-first branch-and-bound search is a way to combine the space saving of depth-first search with heuristic information for finding optimal paths. It is particularly applicable when there are many paths to a goal. As in A* search, the heuristic function h(n) is non-negative and less than or equal to the cost of a lowest-cost path from n to a goal node.

The idea of branch-and-bound search is to maintain the lowest-cost path to a goal found so far, together with its cost. Suppose this cost is bound. If the search encounters a path p such that cost(p) + h(p) ≥ bound, path p can be pruned. If a non-pruned path to a goal is found, it must be better than the previous best path. This new solution is remembered, and bound is set to the cost of this new solution. The searcher then proceeds to look for a better solution.

Branch-and-bound search generates a sequence of ever-improving solutions; the final solution found is the optimal solution.

Branch-and-bound search is typically used with depth-first search, where the space saving of depth-first search can be achieved. It can be implemented similarly to depth-bounded search, but the bound is in terms of path cost and decreases as shorter paths are found. The algorithm remembers the lowest-cost path found and returns this path when the search finishes.
The algorithm can be written as follows; this is a runnable Python rendering of the textbook pseudocode, where the graph is an adjacency dictionary mapping each node to (neighbour, arc cost) pairs:

def df_branch_and_bound(graph, s, goal, h, bound0=float('inf')):
    # Returns a lowest-cost path from s to a goal node with cost < bound0,
    # or None if no such solution exists. Assumes positive arc costs (or a
    # finite bound0) so that cycles are eventually pruned by the bound.
    best_path = None
    bound = bound0

    def cbsearch(path, cost):
        nonlocal best_path, bound
        n_k = path[-1]
        if cost + h(n_k) < bound:            # prune paths that cannot beat bound
            if goal(n_k):
                best_path = list(path)
                bound = cost                 # remember solution, tighten bound
            else:
                for n, arc_cost in graph.get(n_k, []):
                    cbsearch(path + [n], cost + arc_cost)

    cbsearch([s], 0)
    return best_path
And-Or graph

PROBLEM REDUCTION:
So far we have considered search strategies for OR graphs, through which we want to find a single path to a goal. Such a structure represents the fact that we will know how to get from a node to a goal state if we can discover how to get from that node to a goal state along any one of the branches leaving it.

AND-OR GRAPHS
The AND-OR graph (or tree) is useful for representing the solution of problems that can be solved by decomposing them into a set of smaller problems, all of which must then be solved. This decomposition, or reduction, generates arcs that we call AND arcs. One AND arc may point to any number of successor nodes, all of which must be solved in order for the arc to point to a solution. Just as in an OR graph, several arcs may emerge from a single node, indicating a variety of ways in which the original problem might be solved. This is why the structure is called not simply an AND graph but rather an AND-OR graph (which also happens to be an AND-OR tree).

AO* Search (And-Or) Graph – Artificial Intelligence


AO* Search (And-Or) Graph, Advantages and Disadvantages – Artificial Intelligence

The depth-first search and breadth-first search given earlier for OR trees or graphs can easily be adapted to AND-OR graphs. The main difference lies in the way termination conditions are determined, since all goals following an AND node must be realized, whereas a single goal node following an OR node will do. For this purpose, we use the AO* algorithm.

Like the A* algorithm, here we will use two lists and one heuristic function.

OPEN: It contains the nodes that have been traversed but have not yet been marked solvable or unsolvable.

CLOSED: It contains the nodes that have already been processed.

h(n): The distance from the current node to the goal node.

AO* Search Algorithm


Step 1: Place the starting node into OPEN.

Step 2: Compute the most promising solution tree, say T0.

Step 3: Select a node n that is both on OPEN and a member of T0. Remove it from OPEN and place it in CLOSED.

Step 4: If n is a terminal goal node, then label n as solved and label all its ancestors as solved. If the starting node is marked as solved, then return success and exit.

Step 5: If n is not a solvable node, then mark n as unsolvable. If the starting node is marked as unsolvable, then return failure and exit.

Step 6: Expand n. Find all its successors, compute their h(n) values, and push them onto OPEN.

Step 7: Return to Step 2.

Step 8: Exit.

Advantages of AO*

• It is an optimal algorithm.
• It traverses the graph according to the ordering of nodes.
• It can be used for both OR and AND graphs.

Disadvantages of AO*

• Sometimes, for unsolvable nodes, it cannot find the optimal path.
• Its complexity is higher than that of other algorithms.
We have encountered a wide variety of methods, including adversarial search and local search, to address various problems. Every problem-solving technique has a single purpose: to find a solution that reaches the goal. However, in adversarial search and local search there were no constraints on the agents while solving the problems and reaching their solutions.
Constraint Satisfaction Problems in Artificial Intelligence

This section examines constraint satisfaction, another kind of problem-solving method. As its name implies, constraint satisfaction means that a problem must be solved while adhering to a set of constraints or rules.

When a problem's variables are made to comply with strict conditions or principles, the problem is said to be solved using the constraint satisfaction technique. This approach leads to a deeper study of the structure and complexity of the problem.

A constraint satisfaction problem consists of three components:

o X: a set of variables.

o D: a set of domains, one for each variable; every variable has its own domain of possible values.

o C: a set of constraints that the variables must satisfy.

In constraint satisfaction, domains are the spaces within which the variables may take values, subject to the constraints that are particular to the task. Those three components make up a constraint satisfaction technique in its entirety. Each constraint is a pair "scope, rel": the scope is a tuple of the variables that participate in the constraint, and rel is a relation that lists the combinations of values those variables may take in order to satisfy the constraints of the problem.

How Constraint Satisfaction Problems are Solved

For a constraint satisfaction problem (CSP), the following notions are fundamental: the state space, and the idea behind a solution.

A state in this space is defined by assigning values to some or all of the variables, such as:

X1 = v1, X2 = v2, etc.

There are three ways to assign values to variables:

1. Consistent or Legal Assignment: An assignment is called consistent or legal if it does not violate any constraint.

2. Complete Assignment: An assignment in which every variable has a value and which is consistent is a solution of the CSP. Such an assignment is called a complete assignment.

3. Partial Assignment: An assignment that gives values to only some of the variables is called a partial (incomplete) assignment.

Domain Categories within CSP

The variables' domains fall into one of the two types listed below:

o Discrete Domain: An infinite domain which can have one state for multiple variables. For instance, a start state can be allocated an infinite number of times for each variable.

o Finite Domain: A finite domain which can have continuous states describing one domain for one specific variable. It is also called a continuous domain.
Types of Constraints in CSP
Basically, there are three different categories of constraints with respect to the variables:

o Unary constraints: The simplest kind of constraint, restricting the value of a single variable.

o Binary constraints: These constraints relate two variables; for example, a variable x2 must take a value between x1 and x3.

o Global constraints: This kind of constraint involves an arbitrary number of variables.

The main kinds of constraints are handled by particular resolution techniques:

o Linear constraints: Frequently used in linear programming, where every variable carrying an integer value appears only in linear form.

o Non-linear constraints: Used in non-linear programming, where variables (with integer values) appear in non-linear form.

Note: A preference constraint is a special kind of constraint that indicates which solutions are preferred in the real world.

Think of a Sudoku puzzle in which some of the squares are initially filled with certain integers. You must complete the empty squares with numbers between 1 and 9, making sure that no row, column, or block contains a repeated integer. This is quite an elementary constraint satisfaction problem: a problem must be solved while taking certain constraints into consideration.

The empty spaces are the variables, and the integer range (1-9) that can occupy them is the domain. The values of the variables are drawn from the domain. The constraints are the rules that determine which value a variable may select.
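A small illustration of a CSP solved by backtracking in Python: map colouring of three regions, where the variables are regions, the domains are colours, and the constraints require neighbouring regions to differ. All the names and the problem instance are invented for illustration.

# Variables, domains, and binary constraints of a toy map-colouring CSP.
variables = ['WA', 'NT', 'SA']
domains = {v: ['red', 'green', 'blue'] for v in variables}
neighbours = {('WA', 'NT'), ('WA', 'SA'), ('NT', 'SA')}

def consistent(var, value, assignment):
    # A value is consistent if no already-assigned neighbour has it.
    return all(assignment.get(other) != value
               for a, b in neighbours
               for other in ((b,) if a == var else (a,) if b == var else ()))

def backtrack(assignment):
    if len(assignment) == len(variables):
        return assignment                    # complete and consistent: solution
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment):
            result = backtrack({**assignment, var: value})
            if result:
                return result
    return None                              # no value works: backtrack

print(backtrack({}))   # e.g. {'WA': 'red', 'NT': 'green', 'SA': 'blue'}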

Mini-Max Algorithm in Artificial Intelligence


o The mini-max algorithm is a recursive or backtracking algorithm which is used in decision-making and game theory. It provides an optimal move for the player, assuming that the opponent is also playing optimally.

o The mini-max algorithm uses recursion to search through the game tree.

o The min-max algorithm is mostly used for game playing in AI, in games such as chess, checkers, tic-tac-toe, Go, and various other two-player games. The algorithm computes the minimax decision for the current state.

o In this algorithm two players play the game; one is called MAX and the other is called MIN.

o The two players compete, each trying to secure the maximum benefit for themselves while leaving the opponent with the minimum.

o Both players are opponents of each other: MAX will select the maximized value and MIN will select the minimized value.

o The minimax algorithm performs a depth-first search for the exploration of the complete game tree.

o The minimax algorithm proceeds all the way down to the terminal nodes of the tree and then backtracks up the tree via the recursion.

Working of Min-Max Algorithm:


o The working of the minimax algorithm can be easily described using an example. Below we have taken an example game tree representing a two-player game.

o In this example, there are two players: one is called Maximizer and the other is called Minimizer.

o Maximizer will try to get the maximum possible score, and Minimizer will try to get the minimum possible score.

o This algorithm applies DFS, so in this game tree we have to go all the way down to the leaves to reach the terminal nodes.

o At the terminal nodes, the terminal values are given, so we will compare those values and backtrack up the tree until the initial state is reached. Following are the main steps involved in solving the two-player game tree:

Step 1: In the first step, the algorithm generates the entire game tree and applies the utility function to get the utility values for the terminal states. In the below tree diagram, let us take A as the initial state of the tree. Suppose the maximizer takes the first turn, with a worst-case initial value of -infinity, and the minimizer takes the next turn, with a worst-case initial value of +infinity.

Step 2: Now, first we find the utility value for the Maximizer. Its initial value is -∞, so we compare each terminal-state value with the Maximizer's initial value and determine the higher node values. It finds the maximum among them all:

o For node D: max(-1, -∞) => max(-1, 4) = 4
o For node E: max(2, -∞) => max(2, 6) = 6
o For node F: max(-3, -∞) => max(-3, -5) = -3
o For node G: max(0, -∞) => max(0, 7) = 7

Step 3: In the next step it is the minimizer's turn, so it compares all node values with +∞ and finds the third-layer node values:

o For node B: min(4, 6) = 4
o For node C: min(-3, 7) = -3

Step 4: Now it is the Maximizer's turn, and it again chooses the maximum of all node values to find the value of the root node. In this game tree there are only 4 layers, so we reach the root node immediately, but in real games there will be more than 4 layers:

o For node A: max(4, -3) = 4

That was the complete workflow of the minimax two-player game.

Properties of Mini-Max algorithm:


o Complete: The min-max algorithm is complete. It will definitely find a solution (if one exists) in a finite search tree.

o Optimal: The min-max algorithm is optimal if both opponents play optimally.

o Time complexity: As it performs DFS over the game tree, the time complexity of the min-max algorithm is O(b^m), where b is the branching factor of the game tree and m is the maximum depth of the tree.

o Space complexity: The space complexity of the mini-max algorithm is also similar to DFS, which is O(bm).

Limitation of the minimax Algorithm:

The main drawback of the minimax algorithm is that it gets really slow for complex games such as chess, Go, etc. This type of game has a huge branching factor, and the player has many choices to consider. This limitation of the minimax algorithm can be improved upon with alpha-beta pruning, which we discuss in the next topic.
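A minimal recursive minimax sketch in Python over a game tree given as nested lists, where leaves are terminal utility values. The tree shape follows the 4-layer example above, but the list encoding is an assumption made purely for illustration.

def minimax(node, maximizing):
    if not isinstance(node, list):           # leaf: terminal utility value
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The example tree: A -> B, C; B -> D, E; C -> F, G; leaves as in Step 2.
tree = [[[-1, 4], [2, 6]], [[-3, -5], [0, 7]]]
print(minimax(tree, True))   # 4, matching the value found for node A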

UNIT-III
First-Order Logic in Artificial Intelligence
In the topic of propositional logic, we have seen how to represent statements using propositional logic. Unfortunately, in propositional logic we can only represent facts which are either true or false. PL is not sufficient to represent complex sentences or natural-language statements; propositional logic has very limited expressive power. Consider the following sentences, which we cannot represent using PL:

o "Some humans are intelligent", or

o "Sachin likes cricket."

To represent such statements, PL is not sufficient, so we require a more powerful logic, such as first-order logic.
First-Order Logic:
o First-order logic is another way of representing knowledge in artificial intelligence. It is an extension of propositional logic.

o FOL is sufficiently expressive to represent natural-language statements in a concise way.

o First-order logic is also known as predicate logic or first-order predicate logic. First-order logic is a powerful language that expresses information about objects in an easier way and can also express the relationships between those objects.

o First-order logic (like natural language) assumes that the world contains not only facts, as propositional logic does, but also the following:

o Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ...

o Relations: unary relations such as red, round, is adjacent, or n-ary relations such as the sister of, brother of, has color, comes between

o Functions: father of, best friend, third inning of, end of, ...

o Like a natural language, first-order logic also has two main parts:

a. Syntax

b. Semantics

Syntax of First-Order logic:


The syntax of FOL determines which collections of symbols form logical expressions in first-order logic. The basic syntactic elements of first-order logic are symbols. We write statements in short-hand notation in FOL.

Basic Elements of First-order logic:

Following are the basic elements of FOL syntax:

Constants: 1, 2, A, John, Mumbai, cat, ...

Variables: x, y, z, a, b, ...

Predicates: Brother, Father, >, ...

Functions: sqrt, LeftLegOf, ...

Connectives: ∧, ∨, ¬, ⇒, ⇔

Equality: =

Quantifiers: ∀, ∃

Atomic sentences:
o Atomic sentences are the most basic sentences of first-order logic. These sentences are formed from a predicate symbol followed by a parenthesized sequence of terms.
o We can represent atomic sentences as Predicate(term1, term2, ..., termn).

Example: Ravi and Ajay are brothers: => Brothers(Ravi, Ajay).

Chinky is a cat: => cat(Chinky).

Complex Sentences:
o Complex sentences are made by combining atomic sentences using connectives.

First-order logic statements can be divided into two parts:

o Subject: Subject is the main part of the statement.

o Predicate: A predicate can be defined as a relation, which binds two atoms together in a
statement.

Consider the statement "x is an integer." It consists of two parts: the first part, x, is the subject of the statement, and the second part, "is an integer," is known as the predicate.
Quantifiers in First-order logic:
o A quantifier is a language element which generates quantification, and quantification specifies the quantity of specimens in the universe of discourse.

o Quantifiers are the symbols that permit us to determine or identify the range and scope of a variable in a logical expression. There are two types of quantifiers:

a. Universal quantifier (for all, everyone, everything)

b. Existential quantifier (for some, at least one)

Universal Quantifier:
A universal quantifier is a symbol of logical representation which specifies that the statement within its range is true for everything or for every instance of a particular thing.

The universal quantifier is represented by the symbol ∀, which resembles an inverted A.

Note: With the universal quantifier we use the implication "→".
If x is a variable, then ∀x is read as:

o For all x

o For each x

o For every x

Example:
All men drink coffee.

In FOL: ∀x man(x) → drink(x, coffee), read as "for all x, if x is a man, then x drinks coffee."

Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within
its scope is true for at least one instance of something.

It is denoted by the logical operator ∃, which resembles as inverted E. When it is used with a
predicate variable then it is called as an existential quantifier.
Note: In Existential quantifier we always use AND or Conjunction symbol ( ∧).
If x is a variable, then existential quantifier will be ∃x or ∃(x). And it will be read as:

o There exists a 'x.' o For some 'x.' o For at least one 'x.'
Example:
Some boys are intelligent.
It can be represented as: ∃x boys(x) ∧ intelligent(x).

Points to remember:
o The main connective for universal quantifier ∀ is implication →.
o The main connective for existential quantifier ∃ is and ∧.

Properties of Quantifiers:
o In universal quantifier, ∀x∀y is similar to ∀y∀x.
o In Existential quantifier, ∃x∃y is similar to ∃y∃x.
o ∃x∀y is not similar to ∀y∃x.

Free and Bound Variables:


The quantifiers interact with variables which appear in a suitable way. There are two types
of variables in First-order logic which are given below:

Free Variable: A variable is said to be a free variable in a formula if it occurs outside the
scope of the quantifier.

Example: ∀x ∃(y)[P (x, y, z)], where z is a free variable.

Bound Variable: A variable is said to be a bound variable in a formula if it occurs within the scope of the quantifier.

Example: ∀x∀y [A(x) ∧ B(y)], here x and y are the bound variables.

Resolution in FOL
Resolution
Resolution is a theorem-proving technique that proceeds by building refutation proofs, i.e., proofs by contradiction. It was invented by the mathematician John Alan Robinson in the year 1965.

Resolution is used when various statements are given and we need to prove a conclusion from those statements. Unification is a key concept in proofs by resolution. Resolution is a single inference rule which can efficiently operate on the conjunctive normal form or clausal form.

Clause: A disjunction of literals (atomic sentences or their negations) is called a clause. A clause containing exactly one literal is also known as a unit clause.

Conjunctive Normal Form: A sentence represented as a conjunction of clauses is said to


be conjunctive normal form or CNF.

Note: To better understand this topic, first learn FOL in AI.
The resolution inference rule:
The resolution rule for first-order logic is simply a lifted version of the propositional rule. Resolution can resolve two clauses if they contain complementary literals, which are assumed to be standardized apart so that they share no variables:

    l1 ∨ ... ∨ lk,          m1 ∨ ... ∨ mn
    ─────────────────────────────────────────────────────────────────────────
    SUBST(θ, l1 ∨ ... ∨ li−1 ∨ li+1 ∨ ... ∨ lk ∨ m1 ∨ ... ∨ mj−1 ∨ mj+1 ∨ ... ∨ mn)

where UNIFY(li, ¬mj) = θ, and li and mj are complementary literals.

This rule is also called the binary resolution rule because it only resolves exactly two
literals.

Steps for Resolution:


1. Conversion of facts into first-order logic.
2. Convert FOL statements into CNF

3. Negate the statement which needs to prove (proof by contradiction)

4. Draw resolution graph (unification).

What is Unification?
o Unification is a process of making two different logical atomic expressions identical by finding a substitution. Unification depends on the substitution process.
o It takes two literals as input and makes them identical using substitution.
o Let Ψ1 and Ψ2 be two atomic sentences and 𝜎 be a unifier such that Ψ1𝜎 = Ψ2𝜎; then it can be expressed as UNIFY(Ψ1, Ψ2).
o The UNIFY algorithm is used for unification; it takes two atomic sentences and returns a unifier for those sentences (if any exists).
o Unification is a key component of all first-order inference algorithms.
o It returns fail if the expressions do not match with each other.
o The most general substitution that makes the expressions identical is called the Most General Unifier, or MGU.

Conditions for Unification:


Following are some basic conditions for unification:

o Predicate symbol must be the same; atoms or expressions with different predicate symbols can never be unified.
o The number of arguments in both expressions must be identical.
o Unification will fail if there are two similar variables present in the same expression.

Implementation of the Algorithm


Step 1: Initialize the substitution set to be empty.

Step 2: Recursively unify atomic sentences:

a. Check for an identical expression match.

b. If one expression is a variable vi, and the other is a term ti which does not contain variable vi, then:
   i. Substitute ti/vi in the existing substitutions.
   ii. Add ti/vi to the substitution set list.

c. If both the expressions are functions, then the function name must be similar, and the number of arguments must be the same in both expressions.
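
The following is a minimal Python sketch of this unification algorithm. The term representation is an assumption made for illustration: lowercase strings stand for variables, and tuples such as ('Knows', 'John', 'x') stand for predicates or functions.

def is_variable(term):
    # Convention for this sketch: lowercase strings are variables, e.g. 'x'.
    return isinstance(term, str) and term[:1].islower()

def unify(x, y, subst=None):
    # Return a substitution dict that makes x and y identical, or None on failure.
    if subst is None:
        subst = {}
    if x == y:                                    # Step 2a: identical expressions
        return subst
    if is_variable(x):
        return unify_var(x, y, subst)             # Step 2b: variable vs. term
    if is_variable(y):
        return unify_var(y, x, subst)
    if isinstance(x, tuple) and isinstance(y, tuple):
        # Step 2c: predicate/function symbol and number of arguments must match.
        if x[0] != y[0] or len(x) != len(y):
            return None
        for xi, yi in zip(x[1:], y[1:]):
            subst = unify(xi, yi, subst)
            if subst is None:
                return None
        return subst
    return None

def unify_var(var, term, subst):
    if var in subst:
        return unify(subst[var], term, subst)
    if occurs_in(var, term, subst):               # occur check: reject x / f(x)
        return None
    return {**subst, var: term}                   # add ti/vi to the substitution set

def occurs_in(var, term, subst):
    if var == term:
        return True
    if is_variable(term) and term in subst:
        return occurs_in(var, subst[term], subst)
    if isinstance(term, tuple):
        return any(occurs_in(var, t, subst) for t in term[1:])
    return False

For example, UNIFY(Knows(John, x), Knows(y, Bill)) returns the MGU {y/John, x/Bill}:

print(unify(('Knows', 'John', 'x'), ('Knows', 'y', 'Bill')))   # {'y': 'John', 'x': 'Bill'}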

Inference in First-Order Logic


Inference in First-Order Logic is used to deduce new facts or sentences from existing
sentences. Before understanding the FOL inference rule, let's understand some basic
terminologies used in FOL.

Substitution:

Substitution is a fundamental operation performed on terms and formulas. It occurs in all inference systems in first-order logic. Substitution becomes complicated in the presence of quantifiers in FOL. If we write F[a/x], it refers to substituting the constant "a" in place of the variable "x".

Note: First-order logic is capable of expressing facts about some or all objects in the
universe.
Equality:

First-Order logic does not only use predicate and terms for making atomic sentences but
also uses another way, which is equality in FOL. For this, we can use equality symbols
which specify that the two terms refer to the same object.

Example: Brother (John) = Smith.

As in the above example, the object referred to by Brother(John) is the same as the object referred to by Smith. The equality symbol can also be used with negation to represent that two terms are not the same objects.

Example: ¬(x=y) which is equivalent to x ≠y.

FOL inference rules for quantifier:


As propositional logic we also have inference rules in first-order logic, so following are some
basic inference rules in FOL:

o Universal Generalization
o Universal Instantiation
o Existential Instantiation
o Existential Introduction

1. Universal Generalization:

o Universal generalization is a valid inference rule which states that if premise P(c) is true for any arbitrary element c in the universe of discourse, then we can have a conclusion ∀x P(x).

o It can be represented as: P(c) ∴ ∀x P(x).

o This rule can be used if we want to show that every element has a similar property.

o In this rule, x must not appear as a free variable.

Example: Let's represent P(c): "A byte contains 8 bits", so for ∀x P(x), "All bytes contain 8 bits." will also be true.

2. Universal Instantiation:

o Universal instantiation, also called universal elimination or UI, is a valid inference rule. It can be applied multiple times to add new sentences.

o The new KB is logically equivalent to the previous KB.

o As per UI, we can infer any sentence obtained by substituting a ground term for the variable.

o The UI rule states that we can infer any sentence P(c) by substituting a ground term c (a constant within domain x) from ∀x P(x) for any object in the universe of discourse.

o It can be represented as: ∀x P(x) ∴ P(c).

Example:

IF "Every person likes ice-cream" ⇒ ∀x P(x), then we can infer that

"John likes ice-cream" ⇒ P(c).

3. Existential Instantiation:

o Existential instantiation, also called Existential Elimination, is a valid inference rule in first-order logic.
o It can be applied only once to replace the existential sentence.

o The new KB is not logically equivalent to old KB, but it will be satisfiable if old KB was satisfiable.

o This rule states that one can infer P(c) from the formula given in the form of ∃x P(x) for a new
constant symbol c.

o The restriction with this rule is that c used in the rule must be a new term for which P(c) is true.

o It can be represented as: ∃x P(x) ∴ P(c).

Example:

From the given sentence: ∃x Crown(x) ∧ OnHead(x, John),

So we can infer: Crown(K) ∧ OnHead( K, John), as long as K does not appear in the knowledge
base.

o The above used K is a constant symbol, which is called a Skolem constant.
o Existential instantiation is a special case of the Skolemization process.

4. Existential introduction

o An existential introduction is also known as an existential generalization, which is a valid inference rule in first-order logic.

o This rule states that if there is some element c in the universe of discourse which has a property
P, then we can infer that there exists something in the universe which has the property P.

o It can be represented as: P(c) ∴ ∃x P(x).

o Example: Let's say that,


"Priyanka got good marks in English."
"Therefore, someone got good marks in English."

Generalized Modus Ponens Rule:


For the inference process in FOL, we have a single inference rule which is called Generalized Modus Ponens. It is a lifted version of Modus Ponens.
Generalized Modus Ponens can be summarized as: "P implies Q and P is asserted to be true, therefore Q must be true."
According to Generalized Modus Ponens, for atomic sentences pi, pi′, q, where there is a substitution θ such that SUBST(θ, pi′) = SUBST(θ, pi), it can be represented as:

    p1′, p2′, ..., pn′, (p1 ∧ p2 ∧ ... ∧ pn ⇒ q)
    ─────────────────────────────────────────────
                    SUBST(θ, q)

Example:
We will use this rule for "Kings are evil": we will find some x such that x is a king and x is greedy, so we can infer that x is evil.
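
For instance, from the facts King(John) and Greedy(John) and the rule King(x) ∧ Greedy(x) ⇒ Evil(x), the substitution θ = {x/John} makes the rule's premises match the facts, so Generalized Modus Ponens concludes SUBST(θ, Evil(x)) = Evil(John).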

Forward Chaining and backward chaining in AI


In artificial intelligence, forward and backward chaining is one of the important topics, but
before understanding forward and backward chaining lets first understand that from
where these two terms came.

Inference engine:
The inference engine is the component of the intelligent system in artificial intelligence,
which applies logical rules to the knowledge base to infer new information from known
facts. The first inference engine was part of the expert system. Inference engine commonly
proceeds in two modes, which are:

a. Forward chaining

b. Backward chaining

Horn Clause and Definite clause:

Horn clause and definite clause are forms of sentences which enable a knowledge base to use a more restricted and efficient inference algorithm. Logical inference algorithms use forward and backward chaining approaches, which require the KB in the form of first-order definite clauses.

Definite clause: A clause which is a disjunction of literals with exactly one positive literal
is known as a definite clause or strict horn clause.

Horn clause: A clause which is a disjunction of literals with at most one positive literal is known
as horn clause. Hence all the definite clauses are horn clauses.
Example: (¬p ∨ ¬q ∨ k). It has only one positive literal k.

It is equivalent to (p ∧ q) → k.

A. Forward Chaining
Forward chaining is also known as a forward deduction or forward reasoning method when using an inference engine. Forward chaining is a form of reasoning which starts with atomic sentences in the knowledge base and applies inference rules (Modus Ponens) in the forward direction to extract more data until a goal is reached.

The forward-chaining algorithm starts from known facts, triggers all rules whose premises are satisfied, and adds their conclusions to the known facts. This process repeats until the problem is solved.

Properties of Forward-Chaining:

o It is a bottom-up approach, as it moves from bottom to top.

o It is a process of making a conclusion based on known facts or data, by starting from the initial
state and reaches the goal state.

o The forward-chaining approach is also called data-driven, as we reach the goal using the available data.

o The forward-chaining approach is commonly used in expert systems, such as CLIPS, business rule systems, and production rule systems.
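
The following is a minimal Python sketch of this data-driven loop over propositional definite clauses; the rules and facts used are illustrative assumptions.

# Each rule is (set_of_premises, conclusion); facts is the set of known atoms.
rules = [({"A", "B"}, "C"),
         ({"C"}, "D")]
facts = {"A", "B"}

def forward_chain(rules, facts, goal):
    # Repeatedly fire every rule whose premises are all known facts,
    # adding its conclusion, until the goal appears or nothing new is derived.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return goal in facts

print(forward_chain(rules, facts, "D"))   # True: {A, B} -> C, then C -> D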

B. Backward Chaining:
Backward-chaining is also known as a backward deduction or backward reasoning method
when using an inference engine. A backward chaining algorithm is a form of reasoning,
which starts with the goal and works backward, chaining through rules to find known facts
that support the goal.

Properties of backward chaining:

o It is known as a top-down approach.

o Backward-chaining is based on the modus ponens inference rule.
o In backward chaining, the goal is broken into sub-goals to prove the facts true.


o It is called a goal-driven approach, as a list of goals decides which rules are selected and used.

o Backward -chaining algorithm is used in game theory, automated theorem proving tools,
inference engines, proof assistants, and various AI applications.

o The backward-chaining method mostly uses a depth-first search strategy for proof.
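
A minimal goal-driven sketch in Python, using the same rule format as the forward-chaining sketch above (the depth limit is an assumption added only to keep the sketch from looping on cyclic rules):

rules = [({"A", "B"}, "C"),
         ({"C"}, "D")]
facts = {"A", "B"}

def backward_chain(rules, facts, goal, depth=0):
    # A goal is proved if it is a known fact, or if some rule concludes it
    # and all of that rule's premises (sub-goals) can be proved in turn.
    if goal in facts:
        return True
    if depth > 20:                       # crude guard against cyclic rules
        return False
    for premises, conclusion in rules:
        if conclusion == goal and all(
                backward_chain(rules, facts, p, depth + 1) for p in premises):
            return True
    return False

print(backward_chain(rules, facts, "D"))   # True: D <- C <- {A, B}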

Techniques of knowledge representation


There are mainly four ways of knowledge representation which are given as follows:

1. Logical Representation

2. Semantic Network Representation

3. Frame Representation

4. Production Rules

1. Logical Representation
Logical representation is a language with some concrete rules which deals with
propositions and has no ambiguity in representation. Logical representation means
drawing a conclusion based on various conditions. This representation lays down some
important communication rules. It consists of precisely defined syntax and semantics
which supports the sound inference. Each sentence can be translated into logics using
syntax and semantics.

Syntax:
o Syntaxes are the rules which decide how we can construct legal sentences in the logic.
o It determines which symbols we can use in knowledge representation.
o How to write those symbols.

Semantics:
o Semantics are the rules by which we can interpret the sentence in the logic.
o Semantics also involves assigning a meaning to each sentence.

Logical representation can be categorised into mainly two logics:

a. Propositional Logics

b. Predicate Logics

Note: We will discuss Propositional Logics and Predicate Logics in later chapters.

Advantages of logical representation:


1. Logical representation enables us to do logical reasoning.

2. Logical representation is the basis for the programming languages.

Disadvantages of logical Representation:


1. Logical representations have some restrictions and are challenging to work with.

2. Logical representation technique may not be very natural, and inference may not be so efficient.
Note: Do not be confused with logical representation and logical reasoning as logical
representation is a representation language and reasoning is a process of thinking
logically.
2. Semantic Network Representation
Semantic networks are alternative of predicate logic for knowledge representation. In
Semantic networks, we can represent our knowledge in the form of graphical networks.
This network consists of nodes representing objects and arcs which describe the
relationship between those objects. Semantic networks can categorize the object in
different forms and can also link those objects. Semantic networks are easy to understand
and can be easily extended.

This representation consist of mainly two types of relations:


a. IS-A relation (Inheritance)

b. Kind-of-relation

Example: Following are some statements which we need to represent in the form of nodes
and arcs.

Statements:
a. Jerry is a cat.

b. Jerry is a mammal

c. Jerry is owned by Priya.

d. Jerry is brown colored.

e. All Mammals are animal.

Representing these statements as a semantic network, each object becomes a node connected to other objects by labelled arcs (for example, Jerry → IS-A → cat, cat → IS-A → mammal, mammal → IS-A → animal). Each object is connected with another object by some relation.

Drawbacks in Semantic representation:


1. Semantic networks take more computational time at runtime as we need to traverse the complete
network tree to answer some questions. It might be possible in the worst case scenario that after
traversing the entire tree, we find that the solution does not exist in this network.

2. Semantic networks try to model human-like memory (which has about 10^15 neurons and links) to store the information, but in practice, it is not possible to build such a vast semantic network.

3. These types of representations are inadequate as they do not have any equivalent quantifier, e.g.,
for all, for some, none, etc.

4. Semantic networks do not have any standard definition for the link names.
5. These networks are not intelligent and depend on the creator of the system.

Advantages of Semantic network:


1. Semantic networks are a natural representation of knowledge.

2. Semantic networks convey meaning in a transparent manner.

3. These networks are simple and easily understandable.

3. Frame Representation
A frame is a record-like structure which consists of a collection of attributes and their values to describe an entity in the world. Frames are an AI data structure which divides knowledge into substructures by representing stereotyped situations. It consists of a collection of slots and slot values. These slots may be of any type and size. Slots have names and values which are called facets.

Facets: The various aspects of a slot are known as facets. Facets are features of frames which enable us to put constraints on the frames. Example: IF-NEEDED facets are called when data of any particular slot is needed. A frame may consist of any number of slots, a slot may include any number of facets, and facets may have any number of values. A frame is also known as slot-filler knowledge representation in artificial intelligence.

Frames are derived from semantic networks and later evolved into our modern-day classes and objects. A single frame is not much useful. A frame system consists of a collection of frames which are connected. In a frame, knowledge about an object or event can be stored together in the knowledge base. The frame is a type of technology which is widely used in various applications including natural language processing and machine vision.

Example: 1
Let's take an example of a frame for a book

Slots       Fillers
Title       Artificial Intelligence
Genre       Computer Science
Author      Peter Norvig
Edition     Third Edition
Year        1996
Page        1152
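
As a rough illustration, the book frame above can be sketched in Python as a dictionary mapping slot names to fillers; the extra "Language" slot and its default value are hypothetical additions for demonstration.

book_frame = {
    "Title":   "Artificial Intelligence",
    "Genre":   "Computer Science",
    "Author":  "Peter Norvig",
    "Edition": "Third Edition",
    "Year":    1996,
    "Page":    1152,
}

print(book_frame["Author"])                    # read a slot: Peter Norvig
book_frame.setdefault("Language", "English")   # include default data for a new slot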

Advantages of frame representation:


1. The frame knowledge representation makes the programming easier by grouping the related
data.

2. The frame representation is comparably flexible and used by many applications in AI.

3. It is very easy to add slots for new attributes and relations.

4. It is easy to include default data and to search for missing values.

5. Frame representation is easy to understand and visualize.

Disadvantages of frame representation:


1. In a frame system, the inference mechanism is not easily processed.

2. The inference mechanism cannot proceed smoothly with frame representation.

3. Frame representation has a much generalized approach.

4. Production Rules
A production rules system consists of (condition, action) pairs, which mean "If condition, then action". It has mainly three parts:

o The set of production rules
o Working memory
o The recognize-act cycle

In a production rule system, the agent checks for the condition, and if the condition exists, then the production rule fires and the corresponding action is carried out. The condition part of the rule determines which rule may be applied to a problem, and the action part carries out the associated problem-solving steps. This complete process is called a recognize-act cycle.

The working memory contains the description of the current state of problem-solving, and rules can write knowledge to the working memory. This knowledge may then match and fire other rules.

If a new situation (state) is generated, then multiple production rules may fire together; this is called a conflict set. In this situation, the agent needs to select a rule from the set, which is called conflict resolution.

Example:
o IF (at bus stop AND bus arrives) THEN action (get into the bus)
o IF (on the bus AND paid AND empty seat) THEN action (sit down).
o IF (on bus AND unpaid) THEN action (pay charges).
o IF (bus arrives at destination) THEN action (get down from the bus).
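
A minimal recognize-act cycle for the bus example above can be sketched in Python; representing conditions and actions as plain strings is an illustrative simplification, not a standard encoding.

# Each production rule is (set_of_conditions, action).
rules = [
    ({"at bus stop", "bus arrives"}, "get into the bus"),
    ({"on the bus", "paid", "empty seat"}, "sit down"),
    ({"on the bus", "unpaid"}, "pay charges"),
]
working_memory = {"at bus stop", "bus arrives"}

# Recognize: collect every rule whose conditions all hold (the conflict set).
conflict_set = [(cond, act) for cond, act in rules if cond <= working_memory]

# Conflict resolution (trivially: take the first rule), then Act.
if conflict_set:
    condition, action = conflict_set[0]
    print("Firing rule, action:", action)   # -> get into the bus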

Advantages of Production rule:


1. The production rules are expressed in natural language.

2. The production rules are highly modular, so we can easily remove, add or modify an individual
rule.

Disadvantages of Production rule:


1. The production rule system does not exhibit any learning capabilities, as it does not store the results of the problem for future use.

2. During the execution of the program, many rules may be active; hence rule-based production systems are inefficient.
Scripts

A script is a structured representation describing a stereotyped sequence of events in a particular


context.

Scripts are used in natural language understanding systems to organize a knowledge base in terms of the situations that the system should understand. Scripts use a frame-like structure to represent commonly occurring experiences like going to the movies, eating in a restaurant, shopping in a supermarket, or visiting an ophthalmologist.

Thus, a script is a structure that prescribes a set of circumstances that could be expected to follow on
from one another.

Scripts are beneficial because:

Events tend to occur in known runs or patterns.

A causal relationship between events exists.

An entry condition exists which allows an event to take place.

Prerequisites exist for events taking place.

Components of a script
The components of a script include:

Entry conditions: These are the basic conditions which must be fulfilled before events in the script can occur.

Results: Condition that will be true after events in script occurred.

Props: Slots representing objects involved in events

Roles: These are the actions that the individual participants perform.
Track: Variations on the script. Different tracks may share components of the same scripts.

Scenes: The sequence of events that occur.

Conceptual Dependency in
Artificial Intelligence
Conceptual Dependency:
In 1977, Roger C. Schank developed the Conceptual Dependency structure.
Conceptual Dependency is used to represent knowledge in Artificial Intelligence.
It should be powerful enough to represent the concepts in a sentence of natural
language. It states that different sentences which have the same meaning should
have a single, unique representation. There are 5 types of states in Conceptual Dependency:
1. Entities
2. Actions
3. Conceptual cases
4. Conceptual dependencies
5. Conceptual tense

Main Goals of Conceptual Dependency:


1. It captures the implicit concept of a sentence and makes it explicit.
2. It helps in drawing inferences from sentences.
3. For any two or more sentences that are identical in meaning, there should be only one representation of meaning.
4. It provides a means of representation which is language independent.
5. It develops language conversion packages.

Rules of Conceptual Dependency:


Rule-1: It describes the relationship between an actor and the event he or she
causes.
Rule-2: It describes the relationship between PP and PA that are asserted to describe
it.
Rule-3: It describes the relationship between two PPs, one of which belongs to the
set defined by the other.
Rule-4: It describes the relationship between a PP and an attribute that has already
been predicated on it.
Rule-5: It describes the relationship between two PPs one of which provides a
particular kind of information about the other.
Rule-6: It describes the relationship between an ACT and the PP that is the object of
that ACT.
Rule-7: It describes the relationship between an ACT and the source and the
recipient of the ACT.
Rule-8: It describes the relationship between an ACT and the instrument with which
it is performed. This instrument must always be a full conceptualization, not just a
single physical object.
Rule-9: It describes the relationship between an ACT and its physical source and
destination.
Rule-10: It represents the relationship between a PP and a state in which it started
and another in which it ended.
Rule-11: It represents the relationship between one conceptualization and another
that causes it.
Rule-12: It represents the relationship between conceptualization and the time at
which the event occurred described.
Rule-13: It describes the relationship between one conceptualization and another,
that is the time of the first.
Rule-14: It describes the relationship between conceptualization and the place at

which it occurred. Languages used in Artificial


Intelligence
Artificial Intelligence has become an important part of human life as we are now highly
dependent on machines. Artificial Intelligence is a very important technology to
develop and build new computer programs and systems, which can be used to
simulate various intelligence processes like learning, reasoning, etc.

o Python
o R
o Lisp
o Java
o C++
o Julia
o Prolog


1. Python
o It is easier to learn than many other programming languages.
o It is also a dynamically-typed language.
o Python is an object-oriented language.
o It provides extensive community support and frameworks for ML and DL.
o Open-source.
o Large standard sets of libraries.
o Interpreted language.

2. Java
Java has many features which make Java best in industry and to develop artificial intelligence applications:

o Portability
o Cross-platform.
o Easy to learn and use.
o Easy-to-code algorithms.
o Built-in garbage collector.
o Swing and Standard Widget Toolkit.
o Simplified work with large-scale projects.
o Better user interaction.
o Easy to debug.

3. Prolog
o Supports basic mechanisms such as:
  o Pattern matching,
  o Tree-based data structuring, and
  o Automatic backtracking.
o Prolog is a declarative language rather than imperative.

4. Lisp
o The program can be easily modified, similar to data.
o Makes use of recursion for control structure rather than iteration.
o Garbage collection is necessary.
o We can easily execute data structures as programs.
o An object can be created dynamically.


5. R programming
o R is an open-source programming language, which is free of cost, and you can also add packages for other functionalities.
o R provides strong & interactive graphics capability to users.
o It enables you to perform complex statistical calculations.
o It is widely used in machine learning and AI due to its high-performance capabilities.

6. Julia
o Common numeric data types.
o Arbitrary precision values.
o Robust mathematical functions.
o Tuples, dictionaries, and code introspection.
o Built-in package manager.
o Dynamic type system.
o Ability to work for both parallel and distributed computing.
o Macros and metaprogramming capabilities.
o Support for multiple dispatch.
o Support for C functions.

7. C++
o C++ is one of the fastest languages, and it can be used in statistical techniques.
o It can be used with ML algorithms for fast execution.
o Most of the libraries and packages available for machine learning and AI are written in C++.
o It is a user-friendly and simple language.

UNIT-IV
What is NLP?
NLP stands for Natural Language Processing, which is a part of Computer Science, Human Language, and Artificial Intelligence. It is the technology that is used by machines to understand, analyse, manipulate, and interpret human languages. It helps developers to organize knowledge for performing tasks such as translation, automatic summarization, Named Entity Recognition (NER), speech recognition, relationship extraction, and topic segmentation.

History of NLP
(1940-1960) - Focused on Machine Translation (MT)

Natural Language Processing started in the 1940s.

1948 - In the year 1948, the first recognisable NLP application was introduced at Birkbeck College, London.

1950s - In the 1950s, there was a conflicting view between linguistics and computer science. Then, Chomsky published his first book, Syntactic Structures, and claimed that language is generative in nature.

In 1957, Chomsky also introduced the idea of Generative Grammar, which is rule-based descriptions of syntactic structures.

(1960-1980) - Flavored with Artificial Intelligence (AI)

Advantages of NLP
o NLP helps users to ask questions about any subject and get a direct response within seconds.
o NLP offers exact answers to the question, meaning it does not offer unnecessary and unwanted information.
o NLP helps computers to communicate with humans in their languages.
o It is very time efficient.

o Most of the companies use NLP to improve the efficiency of documentation processes, accuracy
of documentation, and identify the information from large databases.

Disadvantages of NLP
A list of disadvantages of NLP is given below:

o NLP may not show context.
o NLP is unpredictable.
o NLP may require more keystrokes.

o NLP is unable to adapt to the new domain, and it has a limited function that's why NLP is built for
a single and specific task only.

Components of NLP
There are the following two components of NLP -

1. Natural Language Understanding (NLU)


Natural Language Understanding (NLU) helps the machine to understand and analyse
human language by extracting the metadata from content such as concepts, entities,
keywords, emotion, relations, and semantic roles.

NLU mainly used in Business applications to understand the customer's problem in both spoken
and written language.

NLU involves the following tasks -

o It is used to map the given input into useful representation.
o It is used to analyze different aspects of the language.

2. Natural Language Generation (NLG)


Natural Language Generation (NLG) acts as a translator that converts the computerized
data into natural language representation. It mainly involves Text planning, Sentence
planning, and Text Realization.
Note: NLU is more difficult than NLG.
Difference between NLU and NLG

o NLU is the process of reading and interpreting language, whereas NLG is the process of writing or generating language.

o NLU produces non-linguistic outputs from natural language inputs, whereas NLG produces natural language outputs from non-linguistic inputs.

Language Models in NLP


A language model is the core component of modern Natural Language
Processing (NLP). It’s a statistical tool that analyzes the pattern of human
language for the prediction of words.

NLP-based applications use language models for a variety of tasks, such as


audio to text conversion, speech recognition, sentiment analysis, summarization,
spell correction, etc.

How does a Language Model Work?

Language Models determine the probability of the next word by analyzing the text in
data. These models interpret the data by feeding it through algorithms.

The algorithms are responsible for creating rules for the context in natural language. The
models are prepared for the prediction of words by learning the features and
characteristics of a language. With this learning, the model prepares itself for
understanding phrases and predicting the next words in sentences.
For training a language model, a number of probabilistic approaches are used. These
approaches vary on the basis of the purpose for which a language model is created. The
amount of text data to be analyzed and the math applied for analysis makes a difference
in the approach followed for creating and training a language model.

For example, a language model used for predicting the next word in a search query will be
absolutely different from those used in predicting the next word in a long document
(such as Google Docs). The approach followed to train the model would be unique in both
cases.

Types of Language Models:

There are primarily two types of language models:

1. Statistical Language Models

Statistical models include the development of probabilistic models that are able to
predict the next word in the sequence, given the words that precede it. A number of
statistical language models are in use already.
Let’s take a look at some of those popular models:
N-Gram: This is one of the simplest approaches to language modelling. Here, a probability
distribution for a sequence of ‘n’ is created, where ‘n’ can be any number and defines the
size of the gram (or sequence of words being assigned a probability). If n=4, a gram may
look like: “can you help me”. Basically, ‘n’ is the amount of context that the model is
trained to consider. There are different types of N-Gram models such as unigrams,
bigrams, trigrams, etc.
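
As a small illustration, a bigram (n = 2) model can be estimated by counting adjacent word pairs; the toy corpus below is an assumption for demonstration.

from collections import Counter, defaultdict

corpus = "can you help me can you tell me".split()

# Count how often each word follows each previous word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_prob(prev, nxt):
    # P(next word | previous word) from relative frequencies.
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

print(next_word_prob("can", "you"))    # 1.0: 'can' is always followed by 'you'
print(next_word_prob("you", "help"))   # 0.5: 'you' is followed by 'help' or 'tell'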

Unigram: The unigram is the simplest type of language model. It doesn't look at any
conditioning context in its calculations. It evaluates each word or term independently.
Unigram models commonly handle language processing tasks such as information
retrieval. The unigram is the foundation of a more specific model variant called the query
likelihood model, which uses information retrieval to examine a pool of documents and
match the most relevant one to a specific query.
Bidirectional: Unlike n-gram models, which analyze text in one direction (backwards),
bidirectional models analyze text in both directions, backwards and forwards. These
models can predict any word in a sentence or body of text by using every other word in
the text. Examining text bidirectionally increases result accuracy. This type is often
utilized in machine learning and speech generation applications. For example, Google
uses a bidirectional model to process search queries.

Exponential: This type of statistical model evaluates text by using an equation which is a combination of n-grams and feature functions. Here the features and parameters of the desired results are already specified. The model is based on the principle of maximum entropy, which states that the probability distribution with the most entropy is the best choice. Exponential models have fewer statistical assumptions, which means the chances of having accurate results are higher.

Continuous Space: In this type of statistical model, words are arranged as a non-linear
combination of weights in a neural network. The process of assigning weight to a word is
known as word embedding. This type of model proves helpful in scenarios where the data
set of words continues to become large and include unique words.

In cases where the data set is large and consists of rarely used or unique words, linear
models such as n-gram do not work. This is because, with increasing words, the possible
word sequences increase, and thus the patterns predicting the next word become
weaker.

2. Neural Language Models

These language models are based on neural networks and are often considered as an
advanced approach to execute NLP tasks. Neural language models overcome the
shortcomings of classical models such as n-gram and are used for complex tasks such as
speech recognition or machine translation.

Language is significantly complex and keeps on evolving. Therefore, the more complex
the language model is, the better it would be at performing NLP tasks. Compared to the
n-gram model, an exponential or continuous space model proves to be a better option for
NLP tasks because they are designed to handle ambiguity and language variation.

Meanwhile, language models should be able to manage dependencies. For example, a


model should be able to understand words derived from different languages.

Grammar
Grammar is defined as the rules for forming well-structured sentences.

While describing the syntactic structure of well-formed programs, Grammar plays a


very essential and important role. In simple words, Grammar denotes syntactical rules
that are used for conversation in natural languages.

The theory of formal languages is not only applicable here but is also applicable in the
fields of Computer Science mainly in programming languages and data structures.

Context-Free Grammar (CFG)


A context-free grammar, represented in short as CFG, is a notation used for describing languages, and it is a superset of regular grammar.

CFG consists of a finite set of grammar rules having the following four components
• Set of Non-Terminals
• Set of Terminals
• Set of Productions
• Start Symbol

Set of Non-terminals
It is represented by V. The non-terminals are syntactic variables that denote the sets of
strings, which helps in defining the language that is generated with the help of
grammar.

Set of Terminals
Terminals, also known as tokens, are represented by Σ. Strings are formed with the help of the basic symbols of terminals.

Set of Productions
It is represented by P. The set gives an idea about how the terminals and nonterminals
can be combined. Every production consists of the following components:

• Non-terminals,
• Arrow,
• Terminals (the sequence of terminals).

The left side of a production is a single non-terminal, while the right side is a sequence of terminals and/or non-terminals.

Start Symbol
The production begins from the start symbol. It is represented by the symbol S. The start symbol is always a non-terminal.
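
As an illustration, here is a small CFG written with NLTK (the grammar itself is a made-up example): S is the start symbol; S, NP, VP, Det, N, V are non-terminals; the quoted words are terminals; and each line is a production.

import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N  -> 'cow' | 'moon'
V  -> 'saw'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cow saw the moon".split()):
    print(tree)
# (S (NP (Det the) (N cow)) (VP (V saw) (NP (Det the) (N moon))))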
Constituency Grammar (CG)
It is also known as Phrase structure grammar. It is called constituency Grammar as it is
based on the constituency relation. It is the opposite of dependency grammar.

Before diving deep into the discussion of CG, let's see some fundamental points about constituency grammar and the constituency relation.

• All the related frameworks view the sentence structure in terms of constituency
relation.
• To derive the constituency relation, we take the help of subject-predicate
division of Latin as well as Greek grammar.
• Here we study the clause structure in terms of noun phrase NP and verb phrase
VP.

Dependency Grammar (DG)


It is opposite to the constituency grammar and is based on the dependency relation.
Dependency grammar (DG) is opposite to constituency grammar because it lacks
phrasal nodes.

Before diving deep into the discussion of DG, let's see some fundamental points about dependency grammar and the dependency relation.

• In Dependency Grammar, the words are connected to each other by directed


links.
• The verb is considered the center of the clause structure.

• Every other syntactic unit is connected to the verb in terms of directed link.
These syntactic units are called dependencies.

Regular expression
o A regular expression is a sequence of patterns that defines a string. It is used to denote regular languages.
o It is also used to match character combinations in strings. String searching algorithms use this pattern to find operations on strings.

o In regular expressions, x* means zero or more occurrences of x. It can generate {ε, x, xx, xxx, xxxx, .....}

o In regular expressions, x+ means one or more occurrences of x. It can generate {x, xx, xxx, xxxx, .....}
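
A short demonstration of the x* and x+ patterns using Python's re module:

import re

print(bool(re.fullmatch(r"x*", "")))       # True: x* allows zero occurrences
print(bool(re.fullmatch(r"x*", "xxx")))    # True: zero or more x's
print(bool(re.fullmatch(r"x+", "")))       # False: x+ needs at least one x
print(bool(re.fullmatch(r"x+", "xxxx")))   # True: one or more x's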

Operations on Regular Language


The various operations on regular language are:

Union: If L and M are two regular languages, then their union L ∪ M is also a regular language.

L ∪ M = {s | s is in L or s is in M}

Intersection: If L and M are two regular languages, then their intersection L ∩ M is also a regular language.

L ∩ M = {s | s is in L and s is in M}

Kleene closure: If L is a regular language, then its Kleene closure L* is also a regular language.

L* = zero or more occurrences of language L.

Finite Automata
o Finite automata are used to recognize patterns.
o It takes the string of symbol as input and changes its state accordingly. When the desired symbol
is found, then the transition occurs.

o At the time of transition, the automata can either move to the next state or stay in the same state.
o A finite automaton has two possible outcomes for an input string: accept or reject. When the input string is processed successfully and the automaton reaches a final state, the string is accepted.

Formal Definition of FA
A finite automaton is a collection of 5-tuple (Q, ∑, δ, q0, F), where:

1. Q: finite set of states
2. ∑: finite set of input symbols
3. δ: transition function
4. q0: initial state
5. F: set of final states

Finite Automata Model:


Finite automata can be represented by input tape and finite control.

Input tape: It is a linear tape having some number of cells. Each input symbol is placed in
each cell.

Finite control: The finite control decides the next state on receiving particular input from
input tape. The tape reader reads the cells one by one from left to right, and at a time only
one input symbol is read.

Types of Automata:
There are two types of finite automata:

1. DFA(deterministic finite automata)

2. NFA(non-deterministic finite automata)

1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the
computation. In the DFA, the machine goes to one state only for a particular input
character. DFA does not accept the null move.

2. NFA

NFA stands for non-deterministic finite automata. For a particular input, it can transition to any number of states. It can accept the null move.

Some important points about DFA and NFA:

1. Every DFA is an NFA, but not every NFA is a DFA.

2. There can be multiple final states in both NFA and DFA.

3. DFA is used in Lexical Analysis in Compiler.

4. NFA is more of a theoretical concept.
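
As an illustration, the 5-tuple above can be encoded directly in Python; the particular machine (accepting binary strings that end in '1') is an assumed example.

states = {"q0", "q1"}                             # Q
alphabet = {"0", "1"}                             # Σ
delta = {("q0", "0"): "q0", ("q0", "1"): "q1",    # δ: exactly one next state
         ("q1", "0"): "q0", ("q1", "1"): "q1"}    #    per (state, symbol) pair
start = "q0"                                      # q0
finals = {"q1"}                                   # F

def accepts(string):
    state = start
    for symbol in string:        # deterministic: a single move per input symbol
        state = delta[(state, symbol)]
    return state in finals       # accept iff processing ends in a final state

print(accepts("1011"))   # True  (ends in 1)
print(accepts("1010"))   # False (ends in 0)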

Morphology − It is a study of construction of words from primitive meaningful units.


Tokenization is the process of breaking a text string into an array of tokens. The user can think of tokens as distinct parts: a word can be a token in a sentence, while a sentence is a token within a paragraph.

1. Sentence Tokenization is used for splitting the sentences in a paragraph.

2. Word Tokenization is used for splitting the words in a sentence.

3. WordPunctTokenizer is used for separating the punctuation from the words.
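
A brief sketch of these three tokenizers using NLTK (the 'punkt' sentence models must be downloaded once; the sample text is illustrative):

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize, WordPunctTokenizer

nltk.download("punkt", quiet=True)   # one-time download of sentence models

text = "Hello world. Don't panic!"
print(sent_tokenize(text))      # ['Hello world.', "Don't panic!"]
print(word_tokenize(text))      # ['Hello', 'world', '.', 'Do', "n't", 'panic', '!']
print(WordPunctTokenizer().tokenize(text))
# ['Hello', 'world', '.', 'Don', "'", 't', 'panic', '!']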

Part-of-speech (POS) tagging


It is a process of converting a sentence into forms – a list of words, or a list of tuples (where each tuple has the form (word, tag)). The tag in this case is a part-of-speech tag and signifies whether the word is a noun, adjective, verb, and so on.
Default tagging is a basic step for part-of-speech tagging. It is performed using the DefaultTagger class. The DefaultTagger class takes 'tag' as a single argument. NN is the tag for a singular noun. DefaultTagger is most useful when it gets to work with the most common part-of-speech tag; that's why a noun tag is recommended.
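
A short sketch of default tagging with NLTK's DefaultTagger; the token list is an arbitrary example.

from nltk.tag import DefaultTagger

tagger = DefaultTagger("NN")            # every token gets the single tag 'NN'
print(tagger.tag(["Beautiful", "morning"]))
# [('Beautiful', 'NN'), ('morning', 'NN')]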
Challenges in POS Tagging
Some common challenges in part-of-speech (POS) tagging include:

• Ambiguity: Some words can have multiple POS tags depending on the context in which
they appear, making it difficult to determine their correct tag. For example, the word
“bass” can be a noun (a type of fish) or an adjective (having a low frequency or pitch).
• Out-of-vocabulary (OOV) words: Words that are not present in the training data of a
POS tagger can be difficult to tag accurately, especially if they are rare or specific to a
particular domain.
• Complex grammatical structures: Languages with complex grammatical structures,
such as languages with many inflections or free word order, can be more challenging to
tag accurately.
• Lack of annotated training data: Some languages or domains may have limited
annotated training data, making it difficult to train a high-performing POS tagger.
• Inconsistencies in annotated data: Annotated data can sometimes contain errors or
inconsistencies, which can negatively impact the performance of a POS tagger.

What is Semantics?
Semantics is simply the branch of linguistics that concerns studying the meanings of words as well as their meanings within a sentence. Thus, it is the study of linguistic meaning, or more precisely, the study of the relation between linguistic expressions and their meaning. Therefore, it considers the meaning of a sentence without paying attention to its context.

To explain further what semantics means in linguistics, it can be denoted that “it is the study of the interpretation
of signs or symbols used in agents or communities within particular circumstances and contexts”. Hence,
according to this, sounds, facial expressions, body language, and proxemics have semantic (meaningful) content,
and each of these comprises several branches of study. Moreover, in written language, things like paragraph
structure and punctuation bear semantic content; other forms of language bear other semantic content.

Thus, semantics focuses on three basic aspects: “the relations of words to the objects denoted by them, the
relations of words to the interpreters of them, and, in symbolic logic, the formal relations of signs to one another
(syntax)”. Therefore, semantics also looks at the ways in which the meanings of words can be related to each
other.

What is Pragmatics?
Pragmatics is another branch of linguistics. Similar to semantics, pragmatics also studies the meanings of words,
but it pays emphasis on their context. In other words, pragmatics is “the study of the use of linguistic signs,
words, and sentences, in actual situations.”
Thus, it looks beyond the literal meaning of an utterance or a sentence, considering how the context impacts its
meaning to be constructed as well the implied meanings.

Therefore, unlike semantics, pragmatics concern the context of that particular words and how that context
impacts their meaning.

Technically, semantic analysis involves:


1. Data processing.
2. Defining features, parameters, and characteristics of processed data
3. Data representation
4. Defining grammar for data analysis
5. Assessing semantic layers of processed data
6. Performing semantic analysis based on the linguistic formalism.
Syntactic Analysis
Syntactic analysis involves analyzing the grammatical syntax of a sentence to understand its meaning.
For example, consider the following sentence: “The cow jumped over the moon“
Using Syntactic analysis, a computer would be able to understand the parts of speech of the different words in
the sentence. Based on the understanding, it can then try and estimate the meaning of the sentence. In the
case of the above example (however ridiculous it might be in real life), there is no conflict about the
interpretation. Thus, the syntactic analysis does the job just fine.
However, human language is nuanced, and a sentence is not always as simple as the one described above.
Consider this: "Does this all sound like a joke to you?"
A human would easily understand the irateness locked in the sentence. However, a syntactic analysis may just be too naive for it. That leads us to the need for something better and more sophisticated, i.e., Semantic Analysis.

UNIT-V
What is an Expert System?
An expert system is a computer program that is designed to solve complex problems and
to provide decision-making ability like a human expert. It performs this by extracting
knowledge from its knowledge base using the reasoning and inference rules according to
the user queries.

The expert system is a part of AI, and the first ES was developed in the year 1970, which was the first successful approach of artificial intelligence. It solves the most complex issues as an expert by extracting the knowledge stored in its knowledge base. The system helps in decision making for complex problems using both facts and heuristics like a human expert. It is called so because it contains the expert knowledge of a specific domain and can solve any complex problem of that particular domain. These systems are designed for a specific domain, such as medicine, science, etc.

The performance of an expert system is based on the expert's knowledge stored in its
knowledge base. The more knowledge stored in the KB, the more that system improves
its performance. One of the common examples of an ES is a suggestion of spelling errors
while typing in the Google search box.

Characteristics of Expert System


o High Performance: The expert system provides high performance for solving any type of complex
problem of a specific domain with high efficiency and accuracy.
o Understandable: It responds in a way that can be easily understandable by the user. It can take input in
human language and provides the output in the same way.
o Reliable: It is highly reliable for generating efficient and accurate output.
o Highly responsive: ES provides the result for any complex query within a very short period of time.

Advantages of Expert System


o These systems are highly reproducible.
o They can be used for risky places where the human presence is not safe.
o Error possibilities are less if the KB contains correct knowledge.
o The performance of these systems remains steady as it is not affected by emotions, tension, or fatigue.
o They provide a very high speed to respond to a particular query.

Limitations of Expert System


o The response of the expert system may get wrong if the knowledge base contains the wrong information.
o Like a human being, it cannot produce a creative output for different scenarios.
o Its maintenance and development costs are very high.
o Knowledge acquisition for designing is very difficult.
o For each domain, we require a specific ES, which is one of the big limitations.
o It cannot learn from itself and hence requires manual updates.

Architecture of an Expert System


o A typical expert system architecture consists of the components described below.
o The knowledge base contains the specific domain knowledge that is used by an expert to derive
conclusions from facts.
o In the case of a rule-based expert system, this domain knowledge is expressed in the form of a
series of rules.
o The explanation system provides information to the user about how the inference engine arrived
at its conclusions. This can often be essential, particularly if the advice being given is of a critical
nature, such as with a medical diagnosis system.
o If the system has used faulty reasoning to arrive at its conclusions, then the user may be able to see
this by examining the data given by the explanation system.
o The fact database contains the case-specific data that are to be used in a particular case to derive a
conclusion.
o In the case of a medical expert system, this would contain information that had been obtained about
the patient’s condition.
o The user of the expert system interfaces with it through a user interface, which provides access to
the inference engine, the explanation system, and the knowledge-base editor.

Expert System Shells


o An Expert system shell is a software development environment. It contains
the basic components of expert systems. A shell is associated with a
prescribed method for building applications by configuring and instantiating
these components.
Shell components and description
o The generic components of a shell are: the knowledge acquisition, the knowledge base, the reasoning, the explanation, and the user interface. The knowledge base and reasoning engine are the core components.
o All these components are described below.
■ Knowledge Base
o A store of factual and heuristic knowledge. Expert system tool provides one or
more knowledge representation schemes for expressing knowledge about the
application domain. Some tools use both Frames (objects) and IF-THEN rules.
In PROLOG the knowledge is represented as logical statements.
■ Reasoning Engine
o Inference mechanisms for manipulating the symbolic information and
knowledge in the knowledge base form a line of reasoning in solving a
problem. The inference mechanism can range from simple modus ponens
backward chaining of IF-THEN rules to Case-Based reasoning.
Knowledge Acquisition subsystem
o A subsystem to help experts build knowledge bases. However, collecting the knowledge needed to solve problems and build the knowledge base is the biggest bottleneck in building expert systems.
Explanation subsystem
o A subsystem that explains the system's actions. The explanation can range from how the final or intermediate solutions were arrived at to justifying the need for additional data.
User Interface
o A means of communication with the user. The user interface is generally not a part of the expert system technology. It was not given much attention in the past. However, the user interface can make a critical difference in the perceived utility of an expert system.

Knowledge Acquisition Concept in Artificial Intelligence

• Knowledge acquisition is the transformation of knowledge from the forms in which it exists into forms that can be used in a knowledge-based system.
• The primary goal is to discover, develop and implement efficient, effective methods of knowledge acquisition.
• It is a process of adding new knowledge to a knowledge base and refining or improving previously acquired knowledge.
• Acquisition is usually associated with some definite purpose, such as expanding the capabilities of a system or improving and enhancing the performance of some specific task.
• Acquired knowledge consists of facts, rules, concepts, procedures, formulas, relationships, statistics, plans, heuristics or any other relevant information.
• The sources can be any of the following: text, reports, electronic documents, databases, newspapers, news channels, soft copies of documents, etc.
Three models of knowledge acquisition -
1. Handcrafting - knowledge is coded directly into the program.
2. Knowledge Engineering - working with experts to organise knowledge in a suitable form for an expert system to use.
3. Machine Learning - extracting the knowledge from training examples.
Challenges/Issues in knowledge acquisition
1. Most knowledge is in the heads of experts.
2. Experts have vast amounts of knowledge.
3. Experts are very busy and valuable people.
4. Each expert doesn't know everything.
5. Knowledge is short-lived.
6. Experts differ in their opinions.
Concepts of Learning
Learning is the process of converting experience into expertise or knowledge.
Learning can be broadly classified into three categories, as mentioned below, based on the
nature of the learning data and interaction between the learner and the environment.

• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
Similarly, there are four categories of machine learning algorithms as shown below −

• Supervised learning algorithm


• Unsupervised learning algorithm
• Semi-supervised learning algorithm
• Reinforcement learning algorithm

Three Types of Learning That Artificial Intelligence (AI) Does


Three general categories of learning that artificial intelligence (AI)/machine learning utilizes

to actually learn. They are Supervised Learning, Unsupervised Learning and Reinforcement learning.

• Supervised Learning: The machine has a “teacher” who guides it by providing sample inputs along
with the desired output. The machine then maps the inputs and the outputs. This is similar to how we
teach very young children with picture books. According to Yann LeCun, all of the AI machines we have
today have used this form of learning (from speech recognition to self-driving cars).

• Reinforcement Learning: Yann LeCun believes this plays a relatively minor role in training AI and is similar to training an animal. When the animal displays a desired behavior, it is given a reward. According to the Wikipedia entry on machine learning, in reinforcement learning "a computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal."

• Unsupervised Learning: This is the most important and most difficult type of learning and would be better titled Predictive Learning. In this case the machine is not given any labels for its inputs and needs to "figure out" the structure on its own. This is similar to how babies learn early in life. For example, they learn that if an object in space is not supported, it will fall.

However, the most commonly used ones are supervised and unsupervised learning.
