Natural Language Processing

CHAPTER 1: INTRODUCTION
1.1 INTRODUCTION
Natural languages are the languages that people speak and write. Natural language processing
encompasses everything a computer needs in order to understand natural language and to
generate it. Natural Language Processing is a subfield of Artificial Intelligence and
linguistics, devoted to making computers understand statements or words written in human
languages. A natural language, also known as an ordinary language, is one that is spoken or
written by people for general-purpose communication.
Natural language processing came into existence because, when users wish to communicate with a
computer, we cannot force them to learn a machine-specific language. It therefore caters to
users, such as managers or children, who do not have the time to learn new specialized
languages or become skilled in them. The language can be any human language, such as Hindi,
French, English or Chinese. A language is a system: a set of rules and a set of symbols.
1) Symbols are combined and used for conveying or broadcasting information.
2) Rules govern the handling of symbols.
NLP encompasses anything a computer or machine needs to understand typed or spoken natural
language.
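The idea of a language as a set of symbols plus rules that govern them can be sketched in a few lines of Python. This is only a toy illustration: the six-word vocabulary, the category names and the single sentence rule are all assumptions made up for the example, not part of any real grammar or library.

```python
# Hypothetical toy language: the symbols and the single rule below are
# illustrative assumptions, not a real grammar.
SYMBOLS = {
    "the": "DET",
    "a": "DET",
    "dog": "NOUN",
    "cat": "NOUN",
    "sleeps": "VERB",
    "runs": "VERB",
}

# Rule: a sentence is a determiner, then a noun, then a verb.
SENTENCE_RULE = ["DET", "NOUN", "VERB"]


def is_well_formed(sentence: str) -> bool:
    """Return True if every word is a known symbol and the word
    categories follow SENTENCE_RULE in order."""
    words = sentence.lower().split()
    if any(word not in SYMBOLS for word in words):
        return False
    categories = [SYMBOLS[word] for word in words]
    return categories == SENTENCE_RULE


print(is_well_formed("The dog runs"))   # True
print(is_well_formed("Runs the dog"))   # False: symbols known, rule violated
print(is_well_formed("The bird runs"))  # False: "bird" is not a symbol
```

The second call shows that knowing the symbols is not enough; the rule decides whether a combination of symbols counts as a sentence of the language.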
Natural Language Processing (NLP) is the computerized approach to analyzing text that is based
on both a set of theories and a set of technologies. Because it is a very active area of
research and development, there is no single agreed-upon definition that would satisfy
everyone, but there are some aspects that would be part of any knowledgeable person’s definition.
Definition: Natural Language Processing is a theoretically motivated range of computational
techniques for analyzing and representing naturally occurring texts at one or more levels of
linguistic analysis for the purpose of achieving human-like language processing for a range of
tasks or applications.
[Figure: block diagram with the labels INPUT, NLP, COMPUTER, NLP, OUTPUT]
Fig 1.1 Natural Language Processing

Several elements of this definition can be further detailed. First, the imprecise notion of ‘range
of computational techniques’ is necessary because there are multiple methods or techniques from
which to choose to accomplish a particular type of language analysis. ‘Naturally occurring texts’
can be of any language, mode, genre, etc. The texts can be oral or written. The only requirement
is that they be in a language used by humans to communicate with one another. Also, the text
being analyzed should not be specifically constructed for the purpose of the analysis; rather, it
should be gathered from actual usage.
The notion of ‘levels of linguistic analysis’ (to be further explained) refers to the fact that
there are multiple types of language processing known to be at work when humans produce or
comprehend language. It is thought that humans normally utilize all of these levels since each
level conveys different types of meaning.
But various NLP systems utilize different levels, or combinations of levels of linguistic analysis,
and this is seen in the differences amongst various NLP applications. This also leads to much
confusion on the part of non-specialists as to what NLP really is, because a system that uses any
subset of these levels of analysis can be said to be an NLP-based system. The difference between
them, therefore, may actually be whether the system uses ‘weak’ NLP or ‘strong’ NLP.
‘Human-like language processing’ reveals that NLP is considered a discipline within Artificial
Intelligence (AI). And while the full lineage of NLP does depend on a number of other
disciplines, since NLP strives for human-like performance, it is appropriate to consider it an AI
discipline.
‘For a range of tasks or applications’ points out that NLP is not usually considered a goal in and
of itself, except perhaps for AI researchers. For others, NLP is the means for accomplishing a
particular task. Therefore, you have Information Retrieval (IR) systems that utilize NLP, as well
as Machine Translation (MT), Question-Answering, etc.
1.2 HISTORY OF NLP

We have divided the history of NLP into four phases. The phases have distinctive concerns and
styles.
First Phase (Machine Translation Phase) – Late 1940s to late 1960s
The work done in this phase focused mainly on machine translation (MT). This phase was a
period of enthusiasm and optimism.
The main events of the first phase were:
1) Research on NLP started in the early 1950s, after Booth & Richens’ investigation and Weaver’s
memorandum on machine translation in 1949.
2) In 1954, a limited experiment on automatic translation from Russian to English was
demonstrated in the Georgetown-IBM experiment.
3) In the same year, publication of the journal MT (Mechanical Translation) started.
4) The first international conference on Machine Translation (MT) was held in 1952 and the
second was held in 1956.
5) In 1961, the work presented at the Teddington International Conference on Machine Translation
of Languages and Applied Language Analysis was the high point of this phase.
Second Phase (AI Influenced Phase) – Late 1960s to late 1970s
In this phase, the work was mainly concerned with world knowledge and its role in the
construction and manipulation of meaning representations. That is why this phase is also called
the AI-flavored phase.
This phase included the following:
1) In early 1961, work began on the problems of addressing and constructing a data or
knowledge base. This work was influenced by AI.
2) In the same year, the BASEBALL question-answering system was also developed. The input to
this system was restricted and the language processing involved was simple.
3) A much more advanced system was described in Minsky (1968). Compared with the BASEBALL
question-answering system, this system recognized and provided for the need for inference on
the knowledge base in interpreting and responding to language input.
Third Phase (Grammatico-logical Phase) – Late 1970s to late 1980s
This phase can be described as the grammatico-logical phase. Due to the failure of practical
system building in the previous phase, researchers moved towards the use of logic for knowledge
representation and reasoning in AI.
The third phase included the following:
1) Towards the end of the decade, the grammatico-logical approach produced powerful
general-purpose sentence processors like SRI’s Core Language Engine, and Discourse Representation
Theory, which offered a means of tackling more extended discourse.
2) This phase produced some practical resources and tools like parsers, e.g. the Alvey Natural
Language Tools, along with more operational and commercial systems, e.g. for database query.
3) The work on the lexicon in the 1980s also pointed in the direction of the grammatico-logical
approach.
Fourth Phase (Lexical & Corpus Phase) – The 1990s
We can describe this as a lexical & corpus phase. A lexicalized approach to grammar appeared in
the late 1980s and became an increasing influence in this phase. There was a revolution in
natural language processing in this decade with the introduction of machine learning
algorithms for language processing.
1.3 GOAL
The goal of NLP as stated above is “to accomplish human-like language processing”. The
choice of the word ‘processing’ is very deliberate, and should not be replaced with
‘understanding’. For although the field of NLP was originally referred to as Natural Language
Understanding (NLU) in the early days of AI, it is well agreed today that while the goal of NLP
is true NLU, that goal has not yet been accomplished. A full NLU System would be able to:
1. Paraphrase an input text
2. Translate the text into another language
3. Answer questions about the contents of the text
4. Draw inferences from the text
While NLP has made serious inroads into accomplishing goals 1 to 3, NLP systems cannot, of
themselves, draw inferences from text, so NLU still remains the goal of NLP. There are
more practical goals for NLP, many related to the particular application for which it is being
utilized.
For example, an NLP-based IR system has the goal of providing more precise, complete
information in response to a user’s real information need. The goal of the NLP system here is to
represent the true meaning and intent of the user’s query, which can be expressed as naturally
in everyday language as if the user were speaking to a reference librarian. Also, the contents of
the documents that are being searched will be represented at all their levels of meaning so that
a true match between need and response can be found, no matter how either is expressed in its
surface form.
1.4 ORIGINS
Like most modern disciplines, NLP has a mixed lineage, and even today different groups place
strong emphasis on one or another of the contributing disciplines, depending on their
backgrounds. Key among the contributors to the discipline and practice of NLP are: Linguistics,
which focuses on formal, structural models of language and the discovery of language universals
(in fact, the field of NLP was originally referred to as Computational Linguistics); Computer
Science, which is concerned with developing internal representations of data and efficient
processing of these structures; and Cognitive Psychology, which looks at language usage as a
window into human cognitive processes and has the goal of modeling the use of language in a
psychologically plausible way.
1.5 DIVISIONS
While the entire field is referred to as Natural Language Processing, there are in fact two distinct
focuses – language processing and language generation. The first of these refers to the analysis
of language for the purpose of producing a meaningful representation, while the latter refers to
the production of language from a representation.
The task of Natural Language Processing is equivalent to the role of reader/listener, while the
task of Natural Language Generation is that of the writer/speaker. While much of the theory and
technology are shared by these two divisions, Natural Language Generation also requires a
planning capability. That is, the generation system requires a plan or model of the goal of the
interaction in order to decide what the system should generate at each point in an interaction.
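The split between the two divisions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `analyze` and `generate` function names, the agent/action/patient representation, and the restriction to sentences of the form "<subject> <verb>s <object>". The `goal` parameter is a crude stand-in for the planning capability described above; real systems are far more elaborate.

```python
import re

# Hypothetical pattern: only sentences shaped like "<subject> <verb>s <object>"
# are handled, and the verb is assumed to form its third person with a plain "s".
PATTERN = re.compile(r"^(\w+) (\w+)s (\w+)\.?$")


def analyze(sentence: str) -> dict:
    """Language processing: turn text into a simple representation."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        raise ValueError(f"cannot analyze: {sentence!r}")
    subject, verb, obj = match.groups()
    return {"agent": subject, "action": verb, "patient": obj}


def generate(representation: dict, goal: str = "statement") -> str:
    """Language generation: produce text from a representation.
    The goal decides which surface form is produced."""
    agent = representation["agent"]
    action = representation["action"]
    patient = representation["patient"]
    if goal == "statement":
        return f"{agent} {action}s {patient}."
    if goal == "question":
        return f"Does {agent} {action} {patient}?"
    raise ValueError(f"unknown goal: {goal!r}")


meaning = analyze("Mary likes John.")
print(meaning)                             # {'agent': 'Mary', 'action': 'like', 'patient': 'John'}
print(generate(meaning))                   # Mary likes John.
print(generate(meaning, goal="question"))  # Does Mary like John?
```

The same representation yields different output depending on the goal, which is the point of the planning requirement: the generator needs to know what the interaction calls for, not just what the content is.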
Another distinction is traditionally made between language understanding and speech
understanding. Speech understanding starts with, and speech generation ends with, oral language
and therefore rely on the additional fields of acoustics and phonology.
