Abstract
This paper reviews natural language processing (NLP) from the late 1940's to
the present, seeking to identify its successive trends as these reflect concerns with
different problems or the pursuit of different approaches to solving these problems
and building systems as wholes. The review distinguishes four phases in the history
of NLP, characterised respectively by an emphasis on machine translation, by the
influence of artificial intelligence, by the adoption of a logico-grammatical style,
and by an attack on massive language data. The account considers the significant
and salient work in each phase, and concludes with an assessment of where we
stand after more than forty years of effort in the field.
1 Introduction
At the ACL Conference in 1987 Don Walker, Jane Robinson and I were talking about
when we began in NLP research. Fred Thompson told us he began in 1954 and others,
like Martin Kay, started out too in the fifties. Work in the field has concentrated first
on one problem, then on another, sometimes because solving problem X depends on
solving problem Y but sometimes just because problem Y seems more tractable than
problem X. It is nice to believe that research in NLP, like scientific research in general,
advances in a consolidating way, and though there may be more faith than substance in
this, we can certainly do NLP now that we could not do in the fifties. We may indeed be
seduced by the march of computing technology into thinking we have made intellectual
advances in understanding how to do NLP, though better technology has also simply
eliminated some difficulties we sweated over in earlier years. But more importantly,
better technology means that when we return to long-standing problems they are not
always so daunting as before.
Those, like Don, who had been around for a long time, can see old ideas reappearing
in new guises, like lexicalist approaches to NLP, and MT in particular. But the new
costumes are better made, of better materials, as well as more becoming: so research
is not so much going round in circles as ascending a spiral, if only a rather flat one.
In reviewing the history of NLP, I see four phases, each with their distinctive concerns
and styles. Don, in one way or another and like all of us, to some extent moved in
•The material in the earlier part of this paper is taken from my article "Natural language processing:
an overview", in W. Bright (ed.) International encyclopedia of linguistics, New York: Oxford University
Press, 1992, Vol. 3, 53-59.
A. Zampolli et al. (eds.), Current Issues in Computational Linguistics: In Honour of Don Walker
© Springer Science+Business Media Dordrecht 1994
time to the current beat. But it is noteworthy that in the push he made for linguistic
resources, like corpus collections, he not only significantly promoted what I have called
the data-bashing decade that is now with us, but also returned to what was a major
concern in the first period of NLP research: building the powerful and comprehensive
dictionaries that serious NLP applications, like MT, need.
I define the first phase of work in NLP as lasting from the late 1940s to the late
1960s, the second from the late 60s to the late 70s and the third to the late 80s, while we
are in a clear fourth phase now.
increased after Sputnik 1, but the work had begun before. Russian and English were
the dominant languages, but others, including Chinese, were involved (Booth, 1967;
Hutchins, 1986).
Though the period ended under the cloud of the 1966 ALPAC Report (ALPAC, 1966;
Hutchins, 1986), most of those engaged were neither crooks nor bozos. Many came to
NLP research with a background and established status in linguistic and language study,
and were motivated by the belief that something practically useful could be achieved,
even though the strategies adopted were crude and the results not of high quality. The
first major question was whether, even to obtain only limited results, principled methods
based on generalisation were required, or whether ad hoc particularisation would suffice.
The second issue was the relative emphasis to be placed, in either case, on syntax and
on semantics. The third problem was the actual value of the results, especially when
balanced against pre- or post-editing requirements.
The main line of work during this period can be summarised as starting with trans-
lation as lookup, in dictionary-based word-for-word processing. The need to resolve
syntactic and semantic ambiguity, and the former in particular because it is not open to
fudging through the use of broad output equivalents, led to ambiguity resolution strate-
gies based on local context, so dictionary entries became in effect individual procedures.
Semantic resolution involved collocation, both of specific words and of semantic categories.
But long-distance dependencies, the lack of a transparent word order in languages like
German, and also the need for a whole-sentence structure characterisation to obtain prop-
erly ordered output, as well as a perceived value in generalisation, led to the development
of autonomous sentence grammars and parsers.
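The flavour of this strategy, with dictionary entries acting as individual procedures, can be conveyed by a minimal sketch; it is written in modern Python and its tiny lexicon and context rules are invented for illustration, not taken from any system of the period.

```python
# Illustrative sketch of dictionary-based word-for-word translation with
# local-context ambiguity resolution, in the 1950s MT style. The lexicon
# and trigger rules are invented for the example.

LEXICON = {
    # word: list of (output equivalent, trigger words selecting this sense)
    "mir": [("peace", {"dogovor"}), ("world", {"ves"})],  # ambiguous entry
    "dogovor": [("treaty", None)],
    "ves": [("whole", None)],
}

def translate(words):
    """Word-for-word lookup; choose the sense whose triggers occur nearby."""
    output = []
    for i, word in enumerate(words):
        senses = LEXICON.get(word, [(word, None)])  # unknown words pass through
        context = set(words[max(0, i - 2):i + 3])   # +/- 2-word local window
        chosen = senses[0][0]                       # default: first listed sense
        for gloss, triggers in senses:
            if triggers and triggers & context:
                chosen = gloss
                break
        output.append(chosen)
    return " ".join(output)

print(translate(["dogovor", "o", "mir"]))  # -> 'treaty o peace'
print(translate(["ves", "mir"]))           # -> 'whole world'
```

Exactly this kind of local patching fails on long-distance dependencies, which is one reason autonomous grammars and parsers were developed.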
Most of the NLP research done in this period was focused on syntax, partly because
syntactic processing was manifestly necessary, and partly through implicit or explicit
endorsement of the idea of syntax-driven processing. The really new experience in
this work, and its contribution to linguistics in general, came from recognising the
implications of computing represented by the need not only for an explicit, precise, and
complete characterisation of language, but for a well-founded or formal characterisation
and, even more importantly, the need for algorithms to apply this description. Plath's
account (1967) of NLP research at Harvard shows this development of computational
grammar with its lexicon and parsing strategy very clearly. But as Plath also makes
clear, those concentrating on syntax did not suppose that this was all there was to it: the
semantic problems and needs of NLP were only too obvious to those aiming, as many
MT workers were, at the translation of unrestricted real texts like scientific papers. The
strategy was rather to tackle syntax first, if only because semantic ambiguity resolution
might be finessed by using words with broad meanings as output because these could be
given the necessary more specific interpretations in context.
There were however some workers who concentrated on semantics because they saw
it as the really challenging problem, or assumed semantically-driven processing. Thus
Masterman's and Ceccato's groups, for example, exploited semantic pattern matching
using semantic categories and semantic case frames, and indeed in Ceccato's work
(1967) the use of world knowledge to extend linguistic semantics, along with semantic
networks as a device for knowledge representation.
MT research was almost killed by the 1966 ALPAC Report, which concluded that
MT was nowhere near achievement and led to funding cuts especially in the most active
country, the USA, even though it recommended support for computational linguistics.
But it is important to recognise what these first NLP workers did achieve. They recog-
nised, and attempted to meet, the requirements of computational language processing,
particularly in relation to syntactic analysis, and indeed successfully parsed and charac-
terised sentences. They investigated many aspects of language, like polysemy, and of
processing, including generation. They addressed the issues of overall system architec-
tures and processing strategies, for example in direct, interlingual or transfer translation.
They began to develop formalisms and tools, and some influential ideas first appeared,
like the use of logic for representation (cf. Yngve, 1967). Some groups were also es-
tablished, developing resources like grammars and gaining experience, as at the Rand
Corporation. There was indeed enough knowhow by now for some textbooks, like Hays
(1967).
There was little work, on the other hand, on some important problems that have since
attracted attention, like anaphor resolution, since though text was being translated it was
treated as a sequence of independent sentences, or on the function of language, since
the work was mainly on single-source discourse. There was little attempt to incorporate
world knowledge, and to relate this non-linguistic knowledge to linguistic knowledge,
though some world knowledge was smuggled in under the heading of semantics. The
belief, or challenge, was that one could get far enough with essentially linguistic, and
therefore shallow, processing not involving reasoning on world models. The research
of this period did not produce any systems of scope or quality, though by the end of
the 1960s there were MT production systems providing output of use to their customers
(Hutchins, 1986). There was more merit in the work of the period, and more continuity,
through individuals, with later effort, than subsequent myths allow, though the early
literature was inaccessible and little used. But perhaps the best comment is Bledsoe's
at the International Joint Conference on Artificial Intelligence of 1985 (Bledsoe, 1986)
on the value, for artificial intelligence as a whole, of the early MT workers' head-on
attempt to do something really hard.
Work on the use of computers for literary and linguistic study also began in this
period, but it has never been closely linked with that in NLP, though some common
concerns have become more prominent recently.
processing capabilities. Though differing in many ways they shared a procedural style
and were perceived as having an overall coherence as systems and a genuinely compu-
tational character. The dominant linguistic theory of the late 1960s, transformational
grammar, was seen both as fundamentally unsuited to computation and particularly
analysis, even though TG was formally oriented and there was at least one serious
transformational parser, and as offering nothing on semantics, which had to be tackled
for any actual NLP system. The computational confidence illustrated by Woods' and
Winograd's work, and the range of experiment it promoted, while drawing on previous
work, is well shown by the varied research reported in Rustin (1973).
The view that current linguistics had nothing to contribute, and the feeling that AI was
liberating, were also apparent in Schank's work (1980), which explicitly emphasised se-
mantics, in the form of a general-purpose semantics with case structures for representation
and semantically-driven processing. The community's concern, illustrated by Wino-
grad and Schank alike, with meaning representation and the use of world knowledge
then became an argument, reflecting a widespread feeling in AI stimulated by Minsky's
promulgation of frames (Minsky, 1975), for the use of a larger scale organisation of
knowledge than that represented in NLP by verb case frames or propositional units:
this large-scale organisation would characterise the different relationships between the
elements of a whole universe of discourse, and would support the inferences, including
default inferences, needed especially in interpreting longer discourse and dialogue. NLP
would deliver deep representations integrating and filling out individual inputs to form
a whole constituting an instantiation of a generic world model. Schank's arguments for
the Yale group's use of more event-oriented scripts developed this line in the context of
earlier work by linking individual propositional case frames with the larger structures
via their semantic primitives (cf. Cullingford, 1981). Semantic networks (Bobrow and
Collins, 1975; Findler, 1979) were similarly proposed as a third variant on this theme,
offering a range of options from associative lexical networks only weakly and implicitly
embodying world knowledge to alternative notations for frames. These types of knowl-
edge representation linked NLP with mainstream AI, and their descriptive and functional
status, for example in relation to logic, was and has remained a matter for debate.
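How such larger-scale structures support default inference can be suggested by a minimal sketch, assuming an invented 'restaurant' frame; this illustrates the general idea only, not Minsky's or Schank's actual formalisms.

```python
# Minimal sketch of frame instantiation with default inference. The frame,
# its slots and its defaults are invented for illustration.

RESTAURANT_FRAME = {
    "actor": None,        # to be filled from the input text
    "place": "restaurant",
    "ate": None,          # to be filled from the input text
    "paid": True,         # default: diners normally pay, even if unstated
    "used": "menu",       # default prop, inferable though unmentioned
}

def instantiate(frame, observed):
    """Fill slots from observed propositions; unfilled slots keep defaults."""
    instance = dict(frame)     # copy so the generic frame is untouched
    instance.update(observed)
    return instance

# 'John ate a steak at Leone's' says nothing about paying or menus, but
# the instantiated frame licenses those default inferences.
story = {"actor": "John", "ate": "steak", "place": "Leone's"}
instance = instantiate(RESTAURANT_FRAME, story)
print(instance["paid"])  # -> True   (default inference)
print(instance["used"])  # -> 'menu' (default inference)
```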
Semantic primitives seen, as in Schank's Conceptual Dependency Nets (Schank,
1975), as having a representational and not just a selective role also appeared to fit
naturally with the need to capture underlying conceptual relations and identities in
discourse processing, particularly for types of material or tasks where fine distinctions
do not figure. Their status too was a matter for controversy, but they have continued
in use, supplemented by or sometimes in the form of domain-specific categories, in
application systems. They have also had a significant role, in the more conventional
form of selectional restrictions, even when semantic driving has been abandoned.
The general confidence of those working in the field, and the widespread belief
that progress could be and was being made, was apparent on the one hand in the
ARPA Speech Understanding Research (SUR) project (Lea, 1980) and on the other in
some major system development projects building database front ends. Several of the
SUR projects were ambitious attempts to build genuinely integrated systems combining
top-down with bottom-up processing, though unfortunately the best performing system
against the target measurements was the least theoretically interesting.
The front end projects (see, e.g., Hendrix et al., 1978) were intended to go significantly
beyond LUNAR in interfacing to large autonomous (and therefore not controlled)
databases, and in being more robust under the pressures of 'ill-formed' input; and
the confidence on which they were based drove other work including that on the first
significant commercial front end, INTELLECT (Harris, 1984). But these projects unfor-
tunately also showed that even an apparently straightforward, and perhaps the simplest
because naturally constrained, NLP task was far more difficult than it seemed to be.
NLP workers have been struggling ever since on the one hand with the problems of con-
structing general-purpose transportable front ends and of providing for the acquisition
of application-specific knowledge, and on the other with handling the user's real needs in
dialogue. The former led to the development of modular architectures, general-purpose
formalisms, and toolkits, typically for supplying a specialised lexicon, semantics, and
domain and database model on top of standard syntax, following the sublanguage ap-
proach which had been pioneered for text processing by Sager's NYU group (in Kittredge
and Lehrberger, 1982), but sometimes supplying a specialised syntax as well. The latter
stimulated research on the identification of the user's beliefs, goals and plans which is
also and more fully needed for dynamic and extended interaction with expert systems
for consultation and command, where the system's responses should be cooperative.
The need to identify the language user's goals and plans was early recognised by
the Yale group, and has become a major trend in NLP research since, along with a more
careful treatment of speech acts. Work on interactive dialogue in particular, from the
second half of the 70s, has emphasised the communicative function of language, and the
indirect function and underlying meaning, as well as direct function and surface meaning,
of linguistic expressions. At the same time work on discourse understanding in the 70s,
whether on single-source texts like stories or reports, or on dialogue, stimulated research
on anaphor resolution and on the construction, maintenance and use of discourse models
not relying only on prior scenarios like scripts; and some useful progress was made with
the development of notions of discourse or focus spaces and of resolution algorithms
tied to these (Joshi et al., 1981; Brady and Berwick, 1983; Grosz et al., 1986).
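The flavour of resolution tied to focus spaces can be conveyed by a crude sketch that searches a stack of focus spaces most-recent-first; this is an illustration of the general idea over invented data, not a rendering of any of the published algorithms.

```python
# Crude sketch of pronoun resolution against a stack of discourse focus
# spaces; the entities and gender marking are invented for the example.

def resolve_pronoun(gender, focus_stack):
    """Search focus spaces, and mentions within them, most recent first."""
    for space in reversed(focus_stack):
        for entity, entity_gender in reversed(space):
            if entity_gender == gender:
                return entity
    return None  # unresolved: no matching entity in the discourse model

# Two discourse segments, each contributing a focus space to the stack.
focus_stack = [
    [("Wendy", "fem"), ("the report", "neut")],  # outer, earlier segment
    [("the parser", "neut")],                    # current segment
]

print(resolve_pronoun("neut", focus_stack))  # -> 'the parser' (most recent)
print(resolve_pronoun("fem", focus_stack))   # -> 'Wendy' (outer segment)
```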
line but exploited whatever conceptual apparatus was to hand, like case and domain
frames.
The revival of MT was a significant feature of this period, in which European and
Japanese interest played a major part. The European Commission both used production
systems based on customised pragmatism and promoted the Eurotra research project
on multi-lingual translation within a common, well-defined transfer framework. There
were several active Japanese teams, with some translation products in the market (Nagao,
1989). Much of the MT work done assumed that something at least useful and perhaps
more could be provided, particularly for specific applications, with or without editor or
user participation in the translation process; and it reflected the current state of NLP in
grammar choices and the use of modular system architectures.
On the research side, the period was notable for a growth of interest in discourse,
and it saw the first serious work on generation, especially multi-sentence text generation.
There were two sides to the interest in discourse, which came together in the context
of interactive, dialogue systems, for instance for advice giving, where the need for
cooperative system responses implies modelling of the participants' beliefs, goals and
plans, and can naturally lead to the production of paragraph-length output, for instance
in providing explanations. Work on user modelling, as illustrated in Kobsa and Wahlster
(1989), was one strand in research on language use intended for active communicative
purposes and on discourse structure as related to such purposes (Cohen et al., 1990).
At the same time, as e.g., McKeown (1985) showed, rhetorical schemas could be used
as convenient recipes for producing communicatively effective, as well as linguistically
coherent, text.
From the point of view of NLP as a whole on the other hand, there was more novelty
in the connectionist approaches explored in this period, implying a very different system
architecture from the conventional modular one (cf. Rumelhart et al., 1986). This work,
though not directly absorbed into the mainstream, can be seen as one source, via the idea
of probabilistic networks, for the present interest in statistically-flavoured NLP.
The final trend of the 80s was a marked growth of work on the lexicon. This was
stimulated by the important role the lexicon plays in the grammatico-logical approach
and by the needs of multi-lingual MT, and also by the problems of transportability,
customising and knowledge acquisition in relation to individual applications. The first
serious attempts were now made to exploit commercial dictionaries in machine-readable
form, and this in turn led to the exploitation of text corpora to validate, enhance or
customise initial lexical data, research made much easier by the rapidly increasing
supply of text material. This last trend can be seen now to be giving the current fourth
period of NLP its dominant colour.
as a terminological knowledge base. But this work has been supported by notable
initiatives in data gathering and encoding, and has encouraged a surge of interest in
the use of corpora to identify linguistic occurrence and cooccurrence patterns that can
be applied in syntactic and semantic preference computation. Probabilistic approaches
are indeed spreading throughout NLP, in part stimulated by their demonstrated utility in
speech processing and hence sometimes advocated not just as supports, but as substitutes,
for model-based processing.
The rapid growth in the supply of machine-readable text has not only supplied NLP
researchers with a source of data and a testbed for, e.g., parsers. The flood of material
has increased consumers' pressure for the means of finding their way round in it, and has
led both to a new focus of NLP research and development in message processing, and to
a surge of effort in the wider area of text processing which deals with the identification
of the key concepts in a full text, for instance for use in text retrieval (cf. Jacobs, 1992).
Thus NLP, earlier not found to be sufficiently useful for document retrieval based on
abstracts, may contribute effectively to searching full text files. All of this work has
encouraged the use of probabilistic tagging, originally applied only in data gathering,
and the development of shallow or robust analysers. In this context, NLP workers have
also been forced to handle more than well-formed individual sentences or well-mannered
ellipses and to deal, for instance, with the variety of proper names.
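A minimal sketch may convey the corpus-driven style, assuming an invented toy tagged corpus; the real work of the period used large collections and richer bigram or trigram models rather than the bare word frequencies shown here.

```python
# Minimal sketch of probabilistic tagging from corpus statistics: estimate
# the most frequent tag for each word in a tagged corpus. The toy corpus
# is invented for the example.

from collections import Counter, defaultdict

tagged_corpus = [
    ("the", "DET"), ("plant", "NOUN"), ("grows", "VERB"),
    ("they", "PRON"), ("plant", "VERB"), ("trees", "NOUN"),
    ("the", "DET"), ("plant", "NOUN"), ("closed", "VERB"),
]

tag_counts = defaultdict(Counter)
for word, tag in tagged_corpus:
    tag_counts[word][tag] += 1

def tag(word):
    """Return the word's most frequent tag in the training data."""
    if word in tag_counts:
        return tag_counts[word].most_common(1)[0][0]
    return "NOUN"  # crude default for unseen words, e.g. proper names

print(tag("plant"))  # -> 'NOUN' (twice NOUN, once VERB in the corpus)
print(tag("grows"))  # -> 'VERB'
```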
The interest in text, as well as in improving the scope and quality of interfaces, has
also promoted work on discourse structure, currently notable for the interaction between
those approaching the determination and use of discourse structure from the point of
view of computational needs and constraints, and those working within the context of
linguistics or psycholinguistics.
A further major present trend can be seen as a natural outcome of the interaction
between consumer (and funder) pressures and the real as well as claimed advances in
NLP competence and performance made during the 1980s. This is the growth of serious
evaluation activities, driven primarily by the (D)ARPA conferences (cf. HLT, 1993) but
also reflecting a wider recognition that rigorous evaluation is both required and feasible
when systems are solid enough to be used for non-trivial tasks (Galliers and Sparck
Jones, 1993). Designing and applying evaluation methodologies has been a salutary
experience, but the field has gained enormously from this, as much from learning about
evaluation in itself as from the actual, and rising, levels of performance displayed.
However, evaluation has to some extent become a new orthodoxy, and it is important
it should not turn into an ultimately damaging tuning to demonstrate prowess in some
particular case, as opposed to improving the scientific quality of work in the field and
promoting community synergy.
These evaluation initiatives have nevertheless focused attention on the challenge of
NLP tasks involving operations on a large scale, like text retrieval from terabytes of
material, and the nature of the specific tasks chosen has also had a stimulating effect
in cutting across established boundaries, for instance by linking NLP and information
retrieval. More importantly, the (D)ARPA conferences have helped to bring speech
and language processing together, with new benefits for NLP from the improvements in
speech processing technology since the SUR programme of the 1970s. These improve-
ments are indeed more generally promoting a new wave of spoken language system
applications, including ones involving translation, already demonstrated for limited do-
main inquiry systems and proposed in a much more ambitious form in the Verbmobil
project (Kay et al., 1991; Siemens, 1991).
Finally, this period has seen a significant, new interest in multi-modal, or multi-
media, systems. This is in part a natural response to the opportunities offered by modern
computing technology, and in part an attempt to satisfy human needs and skills in
information management. But whether combining language with other modes or media,
like graphics, actually simplifies or complicates language processing is an open question.
of comparable translation performance for different systems is a salutary reminder of
how far NLP has to go.
The present phase of NLP work is interesting, however, not only because of the
extent to which it demonstrates that some progress has been made since the 1950s,
though far less than was then expected or at least hoped for. Some of its characteristic
concerns were also those of the 50s: thus as I said at the beginning, NLP has returned
to some of its early themes, and by a path on an ascending spiral rather than in a closed
circle, even if the ascent is slow and uneven. The present emphasis on the lexicon and on
statistical information, as well as the revival of interest in MT and in retrieval, reflect the
pattern illustrated, on the one hand, by Reifler's heroic efforts with the Chinese lexicon
and translation (Reifler, 1967), and on the other by the earlier semantic classification
work reviewed in Sparck Jones (1992). The present phase, like the first one but unlike
some intervening ones, also allows for the rich idiosyncrasy of language as well as
for its stripped universals, and has again shifted the balance between linguistic and
non-linguistic resources in language processing towards the linguistic side.
As I noted too, this return to concerns of the first phase of NLP is also a reminder of
Don Walker's long-standing interests. While the Mitre work on syntax with which he
was concerned (Zwicky et al., 1965) can be seen as contributing to the ample stream of
computational grammar research, the concern with text data with which Don's name has
been so closely associated in recent years had its foreshadowing in the title of another
of his early papers: "SAFARI: an online text-processing system", a title truly symbolic
for both Don and the field (Walker, 1967).
References
[1] ALPAC: Language and machines: computers in translation and linguistics, Report
by the Automatic Language Processing Advisory Committee, National Academy
of Sciences, Washington DC, 1966; see also Hutchins (1986), Chapter 8.
[2] Alshawi, H. (ed.) The Core Language Engine, Cambridge, MA: MIT Press, 1992.
[3] Bledsoe, W. "I had a dream: AAAI presidential address, 19 August 1985", The AI
Magazine 7(1), 1986, 57-61.
[4] Bobrow, D.G. and Collins, A. (eds.) Representation and understanding, New York:
Academic, 1975.
[5] Booth, A.D. (ed.) Machine translation, Amsterdam: North-Holland, 1967.
[6] Brady, M. and Berwick, R.C. (eds.) Computational models of discourse, Cam-
bridge, MA: MIT Press, 1983.
[7] Briscoe, E. et al. "A formalism and environment for the development of a large
grammar of English", IJCAI 87: Proceedings of the 10th International Joint
Conference on Artificial Intelligence, 1987, 703-708.
[8] Ceccato, S. "Correlational analysis and mechanical translation", in Booth 1967,
77-135.
[9] Cohen, P.R., Morgan, J. and Pollack, M.E. (eds.) Intentions in communication,
Cambridge, MA: MIT Press, 1990.
[10] Cullingford, R. "SAM", 1981; reprinted in Grosz et al. 1986, 627-649.
[11] Engelien, B. and McBryde, R. Natural language markets: commercial strategies,
Ovum Ltd, 7 Rathbone Street, London, 1991.
[12] Findler, N.V. (ed.) Associative networks, New York: Academic, 1979.
[13] Galliers, J.R. and Sparck Jones, K. Evaluating natural language processing sys-
tems, Technical Report 291, Computer Laboratory, University of Cambridge, 1993.
[14] Green, B.F. et al. "BASEBALL: an automatic question answerer", 1961; reprinted
in Grosz et al., 1986, 545-549.
[15] Grosz, B.J., Sparck Jones, K. and Webber, B.L. (eds.) Readings in natural language
processing, Los Altos, CA: Morgan Kaufmann, 1986.
[16] Harris, L.R. "Experience with INTELLECT", The AI Magazine 5(2), 1984, 43-50.
[17] Hays, D.G. Introduction to computational linguistics, London: Macdonald, 1967.
[18] Hendrix, G., Sacerdoti, E., Sagalowicz, D., Slocum, J., "Developing a Natural
Language Interface to Complex Data", ACM Transactions on Database Systems,
Vol. 3, No. 3, pp. 105-147, 1978.
[19] HLT: Proceedings of the ARPA Workshop on Human Language Technology, March
1993; San Mateo, CA: Morgan Kaufmann, in press.
[20] Hutchins, W.J. Machine translation, Chichester, England: Ellis Horwood, 1986.
[21] Hutchins, W.J. and Somers, H.L. An introduction to machine translation, London:
Academic Press, 1992.
[22] Jacobs, P.S. (ed.) Text-based intelligent systems, Hillsdale, NJ: Lawrence Erlbaum
Associates, 1992.
[23] Joshi, A.K., Webber, B.L. and Sag, I.A. (eds.) Elements of discourse understanding,
Cambridge: Cambridge University Press, 1981.
[24] Kay, M., Gawron, J.M. and Norvig, P. Verbmobil: a translation system for face-
to-face dialogue, CSLI, Stanford University, 1991.
[29] McKeown, K.R. Text generation, Cambridge: Cambridge University Press, 1985.
[30] Minsky, M. (ed.) Semantic information processing, Cambridge, MA: MIT Press,
1968.
[31] Minsky, M. "A framework for representing knowledge", in Winston, P. (ed.) The
psychology of computer vision, New York: McGraw-Hill, 1975.
[32] Nagao, M. (ed.) A Japanese view of machine translation in light of the consider-
ations and recommendations reported by ALPAC, USA, Japan Electronic Industry
Development Association, 1989.
[33] Plath, W. "Multiple path analysis and automatic translation", in Booth 1967, 267-
315.
[34] Reifler, E. "Chinese-English machine translation, its lexicographic and linguistic
problems", in Booth 1967, 317-428.
[35] Rumelhart, D.E., McClelland, J.L. and the PDP Research Group, Parallel dis-
tributed processing, 2 vols, Cambridge, MA: MIT Press, 1986.
[36] Rustin, R. (ed.) Natural language processing, New York: Algorithmics Press, 1973.
[37] Schank, R.C. Conceptual information processing, Amsterdam: North-Holland,
1975.
[38] Schank, R.C. "Language and memory", 1980; reprinted in Grosz et al. 1986,
171-191.
[46] Zwicky, A.M. et al. "The MITRE syntactic analysis procedure for transformational
grammars", Proceedings of the Fall Joint Computer Conference, AFIPS Conference
Proceedings Vol. 27, Part 1, 1965, 317-326.