Natural Language Processing
1.1 INTRODUCTION
Natural languages are the languages that people speak. Natural language processing
encompasses everything a computer needs in order to understand natural language and to
generate it. It is a subfield of Artificial Intelligence and linguistics, devoted to making
computers understand statements or words written in human languages. A natural language,
also known as an ordinary language, is one that is spoken or written by people for
general-purpose communication.
Natural language processing came into existence because users who wish to communicate with a
computer cannot be forced to learn a machine-specific language. It therefore caters to users,
such as managers or children, who do not have the time to learn such languages or become
skilled in them. The language can be any natural language, such as Hindi, French, English, or
Chinese. A language is a system: a set of rules or a set of symbols. Symbols are combined and
used for conveying or broadcasting information.
Natural Language Processing (NLP) is the computerized approach to analyzing text that is based
on both a set of theories and a set of technologies. And, being a very active area of research and
development, there is not a single agreed-upon definition that would satisfy everyone, but there
are some aspects, which would be part of any knowledgeable person’s definition.
[Figure: block diagram with the labels INPUT, NLP, COMPUTER, NLP, OUTPUT]
The notion of ‘levels of linguistic analysis’ (to be further explained) refers to the fact that there are
multiple types of language processing known to be at work when humans produce or
comprehend language. It is thought that humans normally utilize all of these levels since each
level conveys different types of meaning.
But various NLP systems utilize different levels, or combinations of levels of linguistic analysis,
and this is seen in the differences amongst various NLP applications. This also leads to much
confusion on the part of non-specialists as to what NLP really is, because a system that uses any
subset of these levels of analysis can be said to be an NLP-based system. The difference between
them, therefore, may actually be whether the system uses ‘weak’ NLP or ‘strong’ NLP.
‘Human-like language processing’ reveals that NLP is considered a discipline within Artificial
Intelligence (AI). And while the full lineage of NLP does depend on a number of other
disciplines, since NLP strives for human-like performance, it is appropriate to consider it an AI
discipline.
‘For a range of tasks or applications’ points out that NLP is not usually considered a goal in and
of itself, except perhaps for AI researchers. For others, NLP is the means for accomplishing a
particular task. Therefore, you have Information Retrieval (IR) systems that utilize NLP, as well
as Machine Translation (MT), Question-Answering, etc.
The work done in the first phase focused mainly on machine translation (MT). This phase was a
period of enthusiasm and optimism.
The main events of the first phase were:
1) Research on NLP started in the early 1950s, after Booth & Richens’ investigation and Weaver’s
memorandum on machine translation in 1949.
2) In 1954, a limited experiment on automatic translation from Russian to English was
demonstrated in the Georgetown-IBM experiment.
3) In the same year, publication of the journal MT (Machine Translation) started.
4) The first international conference on Machine Translation (MT) was held in 1952 and the
second in 1956.
In the second phase, the work was mainly concerned with world knowledge and its role in the
construction and manipulation of meaning representations. That is why this phase is also called
the AI-flavored phase.
1) In early 1961, work began on the problems of addressing and constructing data or
knowledge bases. This work was influenced by AI.
2) In the same year, the BASEBALL question-answering system was developed. The input to
this system was restricted, and the language processing involved was simple.
3) A much more advanced system was described in Minsky (1968). Compared to the BASEBALL
question-answering system, this system recognized and provided for the need for inference on
the knowledge base in interpreting and responding to language input.
The third phase can be described as the grammatico-logical phase. Because of the failure of
practical system building in the previous phase, researchers moved towards the use of logic for
knowledge representation and reasoning in AI.
Towards the end of the decade, the grammatico-logical approach produced powerful
general-purpose sentence processors such as SRI’s Core Language Engine, as well as Discourse
Representation Theory, which offered a means of tackling more extended discourse.
1) This phase produced practical resources and tools such as parsers, e.g. the Alvey Natural
Language Tools, along with more operational and commercial systems, e.g. for database query.
2) The work on the lexicon in the 1980s also pointed in the direction of the grammatico-logical
approach.
Fourth Phase (Lexical & Corpus Phase) – The 1990s
We can describe this as the lexical & corpus phase. A lexicalized approach to grammar appeared
in the late 1980s and became an increasing influence. This decade also saw a revolution in
natural language processing with the introduction of machine learning algorithms for language
processing.
1.3 Goal
The goal of NLP as stated above is “to accomplish human-like language processing”. The
choice of the word ‘processing’ is very deliberate, and should not be replaced with
‘understanding’. For although the field of NLP was originally referred to as Natural Language
Understanding (NLU) in the early days of AI, it is well agreed today that while the goal of NLP
is true NLU, that goal has not yet been accomplished. A full NLU System would be able to:
While NLP has made serious inroads into accomplishing goals 1 to 3, NLP systems cannot, of
themselves, draw inferences from text, so NLU still remains the goal of NLP. There are also
more practical goals for NLP, many related to the particular application for which it is being
utilized.
For example, an NLP-based IR system has the goal of providing more precise, complete
information in response to a user’s real information need.
The goal of the NLP system here is to represent the true meaning and intent of the user’s query,
which can be expressed as naturally in everyday language as if they were speaking to a reference
librarian. Also, the contents of the documents that are being searched will be represented at all
their levels of meaning so that a true match between need and response can be found, no matter
how either is expressed in its surface form.
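The gap between surface form and meaning can be illustrated with a small sketch. It is only a toy, not how any real IR system works: the stopword list, the suffix rules, and the function names are all assumptions made for this example, and crude lexical normalization stands in for the much richer levels of meaning described above.

```python
import re

# Hypothetical stopword list and suffix rules, chosen only for this illustration.
STOPWORDS = {"a", "an", "the", "how", "do", "i", "to", "for", "of"}
SUFFIXES = ("ing", "s")

def surface_terms(text):
    """Lowercased word tokens exactly as they appear in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def normalize(word):
    """Strip one crude suffix so 'repairing' and 'cars' become 'repair' and 'car'."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def meaning_terms(text):
    """Content words (stopwords removed) reduced to a rough base form."""
    return {normalize(w) for w in surface_terms(text) if w not in STOPWORDS}

def overlap(a, b):
    """Jaccard overlap between two term sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

query = "How do I repair cars?"
document = "A guide to repairing a car"

# Matching on raw surface forms: the two texts share no word at all.
surface_score = overlap(surface_terms(query), surface_terms(document))

# Matching on the normalized representation: 'repair' and 'car' now line up.
meaning_score = overlap(meaning_terms(query), meaning_terms(document))

print(surface_score, meaning_score)
```

The query and the document express the same need in different words, so the surface comparison finds nothing in common while the normalized comparison finds two shared terms out of three.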
1.4 Origins
As with most modern disciplines, the lineage of NLP is mixed, and the field still has strong
emphases from different groups whose backgrounds are more influenced by one or another of the
contributing disciplines. Key among the contributors to the discipline and practice of NLP are:
Linguistics - focuses on formal, structural models of language and the discovery of language
universals. In fact, the field of NLP was originally referred to as Computational Linguistics.
Computer Science - is concerned with developing internal representations of data and efficient
processing of these structures.
Cognitive Psychology - looks at language usage as a window into human cognitive processes, and
has the goal of modeling the use of language in a psychologically plausible way.
1.5 Divisions
While the entire field is referred to as Natural Language Processing, there are in fact two distinct
focuses – language processing and language generation. The first of these refers to the analysis
of language for the purpose of producing a meaningful representation, while the latter refers to
the production of language from a representation.
The task of Natural Language Processing is equivalent to the role of reader/listener, while the
task of Natural Language Generation is that of the writer/speaker. While much of the theory and
technology are shared by these two divisions, Natural Language Generation also requires a
planning capability. That is, the generation system requires a plan or model of the goal of the
interaction in order to decide what the system should generate at each point in an interaction.
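The two divisions can be sketched as a pair of functions. This is a deliberately minimal illustration under strong assumptions: the input is restricted to three-word "Subject verb Object" sentences, the representation is a plain dictionary, and the goal names "inform" and "confirm" are hypothetical stand-ins for a real interaction plan.

```python
def analyze(sentence):
    """Reader/listener role: map a 'Subject verb Object' sentence to a representation."""
    subject, verb, obj = sentence.rstrip(".").split()
    return {"agent": subject, "action": verb, "patient": obj}

def generate(representation, goal):
    """Writer/speaker role: pick a surface form according to the goal of the interaction."""
    agent = representation["agent"]
    action = representation["action"]
    patient = representation["patient"]
    if goal == "inform":
        return f"{agent} {action} {patient}."
    if goal == "confirm":
        return f"Is it true that {agent} {action} {patient}?"
    raise ValueError(f"unknown goal: {goal}")

# Analysis: language in, representation out.
meaning = analyze("Alice greets Bob.")

# Generation: the same representation yields different text for different goals.
statement = generate(meaning, "inform")
question = generate(meaning, "confirm")

print(meaning)
print(statement)
print(question)
```

Both functions share the same representation, but only `generate` needs the extra `goal` argument, which mirrors the point that generation requires a plan in addition to the shared theory and technology.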