Sequence 4: Applications of Corpus Linguistics in Research

4.1 Introduction

Read & compare

Read and compare (critical thinking) Leech’s and Biber’s definitions of corpus
linguistics. This article presents two facets of corpus-based research. What follows are two
definitions of how quantitative and qualitative research relate to the question of corpora.

(Biber) “Quantitative techniques are essential for corpus-based studies. For example, if
you wanted to compare the language use of patterns for the words big and large, you would
need to know how many times each word occurs in the corpus, how many different words
co-occur with each of these adjectives (the collocations), and how common each of those
collocations is. These are all quantitative measurements…. A crucial part of the
corpus-based approach is going beyond the quantitative patterns to propose functional
interpretations explaining why the patterns exist. As a result, a large amount of effort in
corpus-based studies is devoted to explaining and exemplifying quantitative patterns.”1

(Leech) “[I]n corpus linguistics quantitative and qualitative methods are extensively used
in combination. It is also characteristic of corpus linguistics to begin with quantitative
findings, and work toward qualitative ones. But…the procedure may have cyclic elements.
Generally, it is desirable to subject quantitative results to qualitative scrutiny—attempting
to explain why a particular frequency pattern occurs, for example. But on the other hand,
qualitative analysis (making use of the investigator’s ability to interpret samples of
language in context) may be the means for classifying examples in a particular corpus by
their meanings; and this qualitative analysis may then be the input to a further quantitative
analysis, one based on meaning….”2

The two definitions are not opposing statements of what corpus linguistics is; rather,
they complement each other. Aspects of Leech’s and Biber’s approaches to defining the
concept add up without taking anything away from one another. Leech insists on combining
quantitative and qualitative approaches in research undertakings.
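
The quantitative measurements Biber lists for big and large (how often each word occurs, how many different words co-occur with it, and how common each collocation is) can be illustrated with a minimal Python sketch. The toy corpus, the function name `collocates` and the one-word window are assumptions made for illustration only; a real study would use a large tokenised corpus and would follow the counts with the qualitative interpretation both authors call for.

```python
from collections import Counter

# Hypothetical toy corpus; a real study would load a large tokenised corpus.
CORPUS = (
    "a big problem needs a big effort . "
    "a large number of people saw a large amount of rain . "
    "the big day brought a large number of guests ."
)


def collocates(tokens, node, window=1):
    """Count words occurring within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


tokens = CORPUS.lower().split()
frequency = Counter(tokens)  # how many times each word occurs

for adjective in ("big", "large"):
    found = collocates(tokens, adjective)
    print(adjective, "occurs", frequency[adjective], "times")
    print("  distinct collocates:", len(found))
    print("  most common:", found.most_common(3))
```

The counts alone do not say why large attracts number and amount in this sample; that explanatory step is the qualitative part of the analysis.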
_______________________

Reading comprehension

READ and ANSWER the question

Additional Applications of Corpus-Based Research


"Apart from the applications in linguistic research per se, the following practical
applications may be mentioned.

Lexicography
Corpus-derived frequency lists and, more especially, concordances are establishing
themselves as basic tools for the lexicographer. . . .
Language Teaching
. . . The use of concordances as language-learning tools is currently a major interest in
computer-assisted language learning (CALL; see Johns 1986). . . .
Speech Processing
Machine translation is one example of the application of corpora for what computer
scientists call natural language processing. In addition to machine translation, a major
research goal for NLP is speech processing, that is, the development of computer systems
capable of outputting automatically produced speech from written input (speech synthesis),
or converting speech input into written form (speech recognition)." (Geoffrey N. Leech,
"Corpora." The Linguistics Encyclopedia, ed. by Kirsten Malmkjaer. Routledge, 1995).

1 Douglas Biber, Susan Conrad, and Randi Reppen, Corpus Linguistics: Investigating Language
Structure and Use, Cambridge University Press, 2004.
2 Douglas Biber, Susan Conrad, and Randi Reppen, Corpus Linguistics: Investigating Language
Structure and Use, Cambridge University Press, 2004.

RANK, in your own view, the domains of corpus linguistics in research mentioned above
(Lexicography; Language Teaching; Speech Processing). All three domains seem important
for the advancement of science. We believe, however, that speech processing still needs a
few years to reach excellence.
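
The concordances that Leech mentions as tools for lexicographers and language learners can be illustrated with a small keyword-in-context (KWIC) sketch in Python. The function name `kwic`, the context width and the sample text are assumptions made for illustration; they are not taken from any particular concordancing tool.

```python
def kwic(tokens, node, width=3):
    """Return keyword-in-context lines for every occurrence of `node`."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() != node.lower():
            continue
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>25} [{token}] {right}")
    return lines


# Hypothetical sample text, used only for illustration.
TEXT = "The corpus is large . A large corpus helps the lexicographer . Every corpus has limits ."
concordance = kwic(TEXT.split(), "corpus")
for line in concordance:
    print(line)
```

Aligning the keyword in a single column lets a reader scan the words to its left and right, which is how a lexicographer or learner spots recurrent patterns.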

________________

4.2. Spoken and written corpora


4.2.1 Spoken corpora

Read the following about how a spoken corpus is built. Certain principles are fundamental
to that task:

Building a spoken corpus: what are the basics?3


Throughout the development of corpus linguistics there has been a noticeable focus on
analyzing written language and, at the time of writing, written corpora run close to the
20-billion-word mark (e.g. the EnTenTen suite; Jakubíček et al. 2013). Using these corpora,
the possibilities for generating new insights into the ways in which language is structured
and used are both exciting and unprecedented. Spoken corpora, on the other hand, tend to
be much smaller in size than their written counterparts.
A chapter in the first edition of this Handbook featured a ‘wish-list for future corpora’
which included a hope that ‘creating spoken corpora will benefit from technological
advances in speech recognition, thus making the task of transcribing spoken language to
text files a much more efficient process and more automated task’ (Reppen 2010: 36).
Despite the passing of a decade and the ever-increasing sophistication of speech-to-text
technologies, a fully accurate and sophisticated automated approach to spoken corpus
construction has still not been developed. The collection and processing of spoken corpora
remains largely manually driven and is typically costlier and more time-consuming to
undertake than the construction of written corpora. This has resulted in the comparative
lag in the development of spoken datasets versus their written counterparts and an
ongoing ‘written-biased view’ in corpus linguistic research (Lüdeling and Kytö 2008: vi).
Despite these challenges, there remains a growing interest in developing spoken
corpora; this is a testament to the value they provide to a diverse number of research
communities. Following on from the early developments of relatively small spoken corpora
in the 1960s, such as the 500,000-word London-Lund Corpus (Svartvik 1990), the past five
decades have seen major advances in the collection and development of spoken corpora,
particularly in the English language, but not exclusively. Examples of English spoken
corpora include the 5-million-word Cambridge and Nottingham Corpus of Discourse in
English (CANCODE; McCarthy 1998), the 1-million-word Limerick Corpus of Irish
English (LCIE; Farr et al. 2004), the 1-million-word Hong Kong Corpus of Spoken English
(HKCSE; Cheng and Warren 2002) and the recently released 11.5-million-word Spoken British
National Corpus 2014 (Spoken BNC 2014; Love et al. 2017).

3 Dawn Knight and Svenja Adolphs, pp. 21-22 (DOI: 10.4324/9780367076399-3).
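
To make the size gap between written and spoken corpora concrete, the short Python sketch below divides the 20-billion-word figure quoted above by the size of each spoken corpus named in the passage. The figures are the approximate ones given in the text; the variable names are illustrative assumptions.

```python
# Word counts as quoted in the passage above (approximate figures).
WRITTEN_WORDS = 20_000_000_000  # written corpora "close to the 20-billion-word mark"
SPOKEN_CORPORA = {
    "London-Lund Corpus": 500_000,
    "CANCODE": 5_000_000,
    "LCIE": 1_000_000,
    "HKCSE": 1_000_000,
    "Spoken BNC 2014": 11_500_000,
}

# How many times larger the written figure is than each spoken corpus.
ratios = {name: WRITTEN_WORDS / words for name, words in SPOKEN_CORPORA.items()}
for name, ratio in ratios.items():
    print(f"{name}: the written figure is about {ratio:,.0f} times larger")
```

Even the largest spoken corpus listed is over a thousand times smaller than the written figure.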

_________________________

4.2.2. Written corpora

Question related to the text in this part of the sequence: Why do you think McEnery and
Brookes believe that corpus linguistics puts the emphasis on written corpora?

1. Because of the amount of written data available;

2. Because machines can readily read this mass of data.

4.2.3. The language of science

----------------------------------------

4.2.4. Backgrounds of technical terms

(30 min)
Summarise the text
[This activity will help you notice and understand the backgrounds of technical terms
and their development. Pay attention to the language items underlined (in red). These
words can help you focus on the main ideas you can use to summarise the text.]
“As modern sciences evolved from folk beliefs and folk practices, the terms used for
the expression of scientific concepts have also their origin in the folk language. Thus, as
and when refinement and rigour were required, the terms used for the expression of
scientific concepts were sifted out from the common language and were given and used
with specific and codified reference. These form the first category of technical terms in any
language.
The second category includes those terms that come into existence in the language if
the concepts are suggested and codified first in that language. When scientific concepts
were adopted on a massive scale because of the newness of the concepts as well as the
newness of agencies and systems that impart such concepts, there may be a large-scale
importation into a language of technical terms that have their origins in the languages of
other societies…
In yet another category, users of a language may not at all accept any terms that
have their origins in another language and may at all times aim at coining their terms in
their own language. That is, there are a number of variables that should be considered:
how do the users of a language view the use of terms that are imported into their language;
how do they exploit the terms that are already available in their language to express a
related but newly founded concept; what is the state of (quality of) scientific research
carried out using that language as the medium; what are the programmes for the coinage
of terms; is the coinage a translation process or is it an integral part and a consequence of
the state of scientific research carried out in a community through its language; what is the
nature of the genius of the language in terms of productive coinage facilities and in terms
of adaptation and adoption of terms with origins in other languages…
In general, the following points may be considered in order to understand the
processes of coinage and use of technical terms: one should consider, first of all, the
science, its requirements in a particular period of study, its state of art in a particular
period of study and its state of art in the society's institutions at the time of study. Though
the scientific concepts are available in a discipline, the absorption of these concepts and
their expression in languages will differ from society to society depending upon each
society's development with regard to science.
Secondly, we should consider the role of the scientist in the discipline, as a
professional practitioner of the discipline and as an individual using a particular language
for the expression of sciences. In all these, he is governed by certain social norms of his
society.
Thirdly, we must consider the genius of a particular language that is being used by
the scientist for the expression of his concepts. The genius of the language may be looked at
from two different but related angles. In the first, we identify the structural mechanisms that
are provided by a language for the coinage of technical terms. This angle would include
answers to questions such as what potential a language has with regard to absorption of
concepts, that originated in a different linguistic background, and, are there devices in the
language that help a scientist to derive and express a related concept using an already
existing term in an extended sense. In the second category we consider the different types of
linguistic conventions and linguistic trends for the adoption, coinage and adaptation of
technical terms…
Fourthly, we must study the goals a society has set before itself with regard to
scientific pursuits and the importance it attaches to these pursuits. We must also study how
the society views the borrowing of terms, its relationship with other societies which speak
languages different from its own language, and the trends with regard to the maintenance
of self-identities and so on.
Another point that one should consider in studying the coinage of technical terms is
as to what formal agencies are there for the coinage of technical terms, their policies and
an evaluation of their influence in not only the coinage of terms but also in the use of the
terms.
Sixthly, we may consider also the other informal agencies that may have existed or
exist which engage themselves in such coinages of terms. Under this category one should
consider the influence of newspapers and popular magazines. One should also consider the
individuals and their views and works. These individuals may or may not be scientists but
may be engaged in providing multi-disciplinary works. In fact, pioneers among these
individuals have contributed a great deal in shaping the trends in the coinage of technical
terms and their use in various developing languages….
Yet another point one should consider is the audience for whom the technical term
is intended. This is related to the social conventions, background of the audience in terms
of its acquaintance with the concepts being described, and the quality or competence of the
scientist in the concerned language.
The needs of audience may have encouraged the scientific pursuits. But the
identification of concepts and their labelling need not be directly influenced by the
audience. It is when the communication of the concepts is thought of that the background of
the audience becomes very important. In fact, the presence of the audience is a check on the
growing alienation of technical language from the common language use. Yet another
factor that is related to the background of the audience referred to above is the degree of
literacy prevailing in the community. The motivation to write in a language known to the
masses, when the masses themselves are an illiterate majority, may be very much less. The
scientist seeking recognition of his works is then left with no choice except writing through
a medium that is widely used. And as a result the quantum and kinds of technical terms in a
language used by the illiterate majority will be found only of limited scope and range…
Generally speaking, technical terms abound in certain fields. In these fields, due to
the newness of the concepts or due to a desire to avoid the "misleading" influences of
common language terms, a conscious effort is made to coin and use more and more
technical terms… In any case technical terms abound in texts that are meant for specialists.
They are found in less number in texts meant for non-specialists. An attempt is made to
keep the number to the minimum possible in texts meant for laymen.”
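
The closing claim of the text, that technical terms abound in texts for specialists and are kept to a minimum in texts for laymen, could be checked on a corpus by measuring term density. The Python sketch below is only an illustration under stated assumptions: the term list, the function name `term_density` and both sample sentences are invented, and a real study would rely on a domain glossary or a term-extraction method.

```python
import re

# Hypothetical term list; a real study would use a domain glossary or term extractor.
TECHNICAL_TERMS = {"crankshaft", "trunnion", "torque", "psi"}


def term_density(text, terms=TECHNICAL_TERMS):
    """Return the share of word tokens in `text` that appear in `terms`."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in terms)
    return hits / len(words)


# Invented sample sentences, for illustration only.
specialist = "Check crankshaft torque and trunnion wear at rated psi"
layman = "Check that the engine parts turn smoothly and are not worn"

print(f"specialist text: {term_density(specialist):.2f}")
print(f"layman text:     {term_density(layman):.2f}")
```

Comparing the two densities across many texts per audience would give a quantitative pattern that still needs the kind of social and linguistic explanation the passage discusses.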

4.2.5. Scientific discourse

Reading for Information (40 min)

- Features of Scientific English at the word level:4

[Read the following for information and pay attention to the words in red, because
they give you the main features that are characteristic of scientific English.]
“Terms are formed by compounding: crankshaft
1. Blends: trunk + union: trunnion
2. Terms formed by affixation : electrifiable
3. Shortening: Psi 'pounds per square inch'
4. Words through conversion from one phrase type to another: to blow out: blow-out
(of fuses)
5. Through semantic transfer.
6. Foreign features in plural marking: focus/foci.
7. Irregular verbs: wrought steel
8. Verbal concord fluctuation: dynamics is/are
9. The use of the suffix -s in tools and instruments such as clippers, jointers, shears
and so on.
10. More foreign elements in technical terms.
11. Multi-word terms with hyphens. Note that the word-forming processes we have
indicated are now resorted to in other uses of language also, particularly in
modern journalism. Note also that of all the above, compounding is the most
productive process in English. A new word is formed by combining two or more
independent words. The form thus created acts as an independent lexical unit, as if
it were a single word. The inflectional affixes such as those for marking plural and
possession are added to the last part of the compound, indicating that the form,
indeed, behaves like a single word. The meaning of a compound, however, is not
always just the sum of its parts. As there is no fixed rule followed even in the
language of science for the formation of compounds, the meanings of some of these
compounds are ambiguous.
12. There are three types of elements that are used as affixes in English. Under the first
category are the affixes such as -er, -ish, -ness, and -oid. These are shared both by
the ordinary and scientific English.
13. The second category of affixes includes only those having origin in scientific
English, but are used in ordinary language also, through a slow entry into it from
scientific English.
14. The third category includes quasi-affixes.
15. There are several types of derivation:
- Derivation in which a verb is formed out of noun, adjective or even another
verb. (One type of verb is converted into another type.)
- Derivation in which a noun is formed out of a verb or adjective. Sometimes
one type of noun is changed into another type of noun.
- Derivation of an adjective in which a verb or noun is changed into an
adjective.
All the three may be accomplished through affixation and/or compounding processes.
16. Qualifying elements are added to a generic term to express the specialized
meanings.
17. There are a number of neologisms, the rules for some of which are transparent and
for others opaque. Neologism plays an important role in modern scientific English,
and is one of the sources of vocabulary in English, which include also nominal
phrases, compounds, derivatives, new applications of words and borrowings. A new
type of neologism somewhat similar to the category of new applications of an
already available word is coming into existence, especially in applied sciences.
Words of ordinary language are given a meaning, which by any stretch of
imagination could not be associated with those words. In space industry such
usages are becoming quite common, posing a great threat to communication.”

4 M. S. Thirumalai (2003), http://www.languageinindia.com/jan2003/languageinscience.html
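
A few of the word-level features listed above leave visible traces on the surface of a term and can be flagged automatically in a corpus. The Python sketch below is a rough illustration under stated assumptions: the function name `word_level_features` and the lookup table are hypothetical, only focus/foci and the suffixes -er, -ish, -ness and -oid come from the text, and matching word endings is not a real morphological analysis (it would wrongly flag a word such as water).

```python
# Suffixes listed in the text as shared by ordinary and scientific English.
SHARED_SUFFIXES = ("er", "ish", "ness", "oid")

# Hypothetical lookup of foreign plurals; only focus/foci comes from the text.
FOREIGN_PLURALS = {"foci": "focus"}


def word_level_features(term):
    """Return the surface features from the list above that `term` displays."""
    features = []
    lowered = term.lower()
    if "-" in lowered:
        features.append("hyphenated multi-word term")
    if lowered in FOREIGN_PLURALS:
        features.append(f"foreign plural of {FOREIGN_PLURALS[lowered]}")
    for suffix in SHARED_SUFFIXES:
        # Require a stem of at least three letters before the suffix.
        if lowered.endswith(suffix) and len(lowered) > len(suffix) + 2:
            features.append(f"ends in -{suffix}")
    return features


for term in ("blow-out", "foci", "jointer", "crankshaft"):
    print(term, "->", word_level_features(term) or ["no surface cue matched"])
```

The compound crankshaft matches no cue here, which shows the limit of surface checks: compounding, blends and semantic transfer need a lexicon or manual (qualitative) classification.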

________________________

4.3. Language corpora in the digital world
