
Journal of Translation Studies vol. 01/2021, pp. 87–108
© 2021 François Massion - DOI https://doi.org/10.3726/JTS012021.6
This work is licensed under a Creative Commons CC-BY 4.0 license. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.

François Massion
D.O.G. Dokumentation ohne Grenzen GmbH
francois.massion@dog-gmbh.de

Terminology in the Age of Artificial Intelligence

Abstract
Until a few years ago, artificial intelligence (AI) played no role in practical terminology
work. Meanwhile, intelligent terminology management systems have become part of the
tool landscape. They model knowledge by building concept networks with the help of rela-
tions. These intelligent terminologies can help improve the performance and quality of AI-
based applications and support translators and interpreters in their daily work. Intelligent
terminology repositories offer challenging new opportunities for translators, interpreters
and knowledge workers. This article gives some background information and explains the
functioning of intelligent terminologies.

Keywords
artificial intelligence, machine translation, terminology, semantic frame, machine cognition

1. Introduction

Terminology has always been an integral part of the work of translators or interpreters. To a large extent, the challenge of translation amounts to understanding the meaning of special terms and finding their equivalents in the target language.
We have recently seen with Brexit how difficult it has been for the various parties involved to communicate effectively without sharing a common understanding of the term Brexit (or associated terms like backstop or hard border). This is representative of many situations experienced by language professionals.
As terms play a key role in translation and interpretation and in
communication, many organizations, companies or individuals have built
multilingual terminologies that they store in different formats, from simple
Excel files to complex relational databases. With the rise of AI, some of the
established ideas on terminology are now under review.

2.  The practice of terminology today

Terminology work has a long tradition. The Austrian engineer Eugen Wüster (1898-1977) laid the foundation of modern terminology work in the early 1930s. The industrialization of society and the technological revolutions had a profound impact on language, especially on language for special purposes (LSP), and Wüster took an active part in its shaping. His doctoral dissertation Internationale Sprachnormung in der Technik, besonders in der Elektrotechnik (“International language standardization in technology, particularly in electrical engineering”) became a standard work in applied linguistics (Picht and Schmitz 2001). Wüster later published numerous articles and books on how to organize terminology. His ideas laid the basis of practical prescriptive terminology work. His approach is closely in line with the need of industrial companies to standardize the use of terms, especially technical terms, in order to make communication more efficient.
Of course, prescriptive terminology work is linked to a special purpose
and therefore, Wüster’s views were bound to draw some criticism. Scholars
from different fields pointed at socio-cognitive or communication aspects to
underline the weakness of Wüster’s model in explaining how terminology
is actually perceived and used in society. Examples include the “theory of doors” of Teresa Cabré (Cabré 2000) and the socioterminology theory of François Gaudin (Gaudin 1990).

So far, the ideas of Wüster’s critics have had little impact on the
terminology repositories used by professional translators or interpreters.
Most terminology repositories in the industry are concept-based. This means that for each language all terms designating a concept are gathered and recorded together with attributes and definitions. In many cases these repositories contain usage recommendations, i.e. they display allowed and prohibited terms. The reason for this approach is easy to understand from a practical point of view: large industrial companies or organizations frequently record numerous terms for the same part (car phone, car telephone, cell phone, cellular phone, mobile phone, etc.) and want to standardize their corporate language.

3.  AI disruption

As in many other fields, new technologies bring along major disruptions and force all participants to review their own role as well as accepted ideas.
This is the case with the growing influence of artificial intelligence in all areas related to natural language. AI as such is not new. The term Artificial Intelligence (AI) was coined more than 60 years ago at the Dartmouth conference in 1956 and there have since been many AI projects related to knowledge and language. Just to name a few, there were early attempts to automatically translate documents, starting in 1946. There was the ELIZA program designed by Weizenbaum in 1965 that could communicate in natural language with humans and pass the Turing test; or the first knowledge-based system (Dendral) built in 1965 by Edward Feigenbaum, Bruce G. Buchanan, Joshua Lederberg and Carl Djerassi. This system used knowledge bases and reasoning.
Several AI pioneers tackled early on the task of modelling human thinking for computers, one prominent example being the work of Marvin Minsky (Minsky 1974) with his frame theory. He saw frames as the representation of a stereotyped situation that is called up by the brain when we encounter a new situation (Minsky 2000).

Fig. 1:  Hierarchy of representations proposed by M. Minsky (from Minsky 2000)

But all these developments took place in a parallel world, far away from the daily routine of interpreters and translators. As is too often the case, sciences operate in silos, and research is sometimes done independently and in parallel on similar subjects.
AI has now reached the radar screens of translators, interpreters and other language professionals. A turning point has been the widespread use of deep learning and convolutional networks since 2012 (Le Cun 2019), which boosted the performance of machine translation. Since then, most language professionals and scholars have been struggling to assess what effects AI will have on their profession. Artificial intelligence has had a disruptive effect in nearly all fields where it is applied, and terminology is no exception.

4.  The emergence of intelligent terminologies

Until now the major purpose of terminology work has been to model
and construct terminology repositories for use by humans. Whatever the
prevailing theory was, its purpose has been to serve humans. The task of
modelling knowledge for machines had been tackled separately by other disciplines like computer science, with research areas such as logic (building up on the early work of Gottlob Frege and Bertrand Russell¹) and ontologies (Willard Quine²).
AI led to the introduction of new types of terminology repositories into the traditional tool landscape. Intelligent terminologies, a.k.a. ontoterminologies, knowledge-rich terminologies, relational terminologies or termontologies, are now available to translators and interpreters. Intelligent terminologies belong to the family of augmented translation technologies that are inspired or driven by artificial intelligence. They model knowledge by creating multilingual conceptual networks using relationships.
Like other technologies and concepts such as AI or machine translation, the idea is not new. Ontoterminology is a term coined by Roche in 2007 and further developed since (Roche 2007; Roche et al. 2011), and termontology or termontography can be traced back to Temmerman and Kerremans in 2003 (Temmerman and Kerremans 2003; Kerremans and Temmerman 2004). Besides, the idea of connecting concepts was formulated long ago: it can be found in Ross Quillian’s Ph.D. thesis of 1966, which introduced the idea of semantic networks (Quillian 1968). The difference now is that these ideas have become part of practical applications that are available for use today on a wide scale. A look at WIPO Pearl³, BabelNet⁴ or tools like LOOKUP⁵ can confirm this.

5.  The traditional model of terminology under scrutiny

In order to better understand what these intelligent terminologies are, how they are structured and why the need for them has arisen, let’s first have a look at the weaknesses of the existing models.

1 Gottlob Frege published his Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens in 1879. As the title says, he developed a formula language of pure thinking, modelled on that of arithmetic. Bertrand Russell further developed Frege’s ideas with a critical eye. In his book Introduction to Mathematical Philosophy (1919), Russell develops his ideas on logic.
2 Willard Quine worked in several fields including logic, the philosophy of language and ontology.
3 See https://wipopearl.wipo.int/en/linguistic.
4 See https://babelnet.org/.
5 See https://www.dog-gmbh.de/.

5.1  The semiotic triangle

Traditional terminologies are concept-based, i.e. they start from an abstract concept and collect, for each language respectively, all the terms (words, abbreviations or phrases) that designate that concept. The theoretical foundation of this approach is the semiotic triangle of reference, as first published by Ogden and Richards in their book The Meaning of Meaning (Ogden and Richards 1923), a book that still deserves to be read today. The underlying assumption is that there is one exact way to define a concept and that we can use such defined concepts to excise the portion of reality that matches this definition, e.g. all mobile phones, all birds, etc. This is supposed to work in the same way for abstract concepts like mathematics or knowledge. The general approach of the semiotic triangle is shared by most terminologists and is well documented in multiple standards such as ISO 704:2009.
Over the years, several schools of thought have criticized the short-
comings of the semiotic triangle from different perspectives, pointing to
aspects of linguistics, cognition, intention and communication (Temmerman
and Kerremans 2003; Faber 2011; Cabré 2000). However, their ideas did
not result in a pragmatic terminology model used by practitioners in their
daily work.
As a matter of fact, all three corners of the semiotic triangle are subject to discussion:

1. First, the term used (the symbol). It is the entry point into the triangle. A term is supposed to perfectly link the abstract concept and the object of the concept. But language is ambiguous and imprecise. There is polysemy, and individuals use words that do not always cover the semantic features of the abstract definition. They often use a more general term. The meaning of a word comes from its association with other words or from extralinguistic aspects. Fillmore describes, for example, how in certain circumstances terms like magistrate and judge are used as synonyms. The synonymy is given by the semantic frame (Fillmore and Baker 2001).
2. Next, the object of the concept itself, that is, an individual instance of the concept. This object can be real or abstract and may have deviating individual properties due to the context, the situation, the individual interpretation of a person, etc.

3. Third, the definition itself. Definitions are not cast in stone. They are influenced by several factors and can vary substantially depending on the source of the definition. First of all, reality is not perceived the same way by everyone: culture, language and individual experience play an important role. The purpose of terminology work is also important. While in many cases terminologists who create entries de-contextualize concepts and formulate definitions that are intended to be valid for as many users as possible, in other cases they pursue specific legitimate goals when they select and define concepts. As an example, you can look at a bike as a means of transportation, as a tool to improve your physical condition or as an item in the ledger of a company that sells products to customers. Depending on the intention, the elements of the definition and the equivalents in other languages will vary.

Thus, the semiotic triangle does not offer the flexibility to describe “soft” and variable features of a concept, for instance the communication situation, the context, the objectives, the experience, the culture, etc. The semiotic triangle itself is static and represents a frozen, in vitro definition of a concept.
This is why terminology entries do not always reflect the actual situation in which a term is used. The connection between the definition of the concept, the term and the object or idea the translator is working on fails. Translators and interpreters regularly experience this and often need to consult several terminological sources before finding an acceptable translation or definition.
Most terminology databases in use today do not provide a mechanism to respond to different usage situations. They leave translators and interpreters alone with their translation issue. For example, what is the correct translation of the word container, knowing only the definition “an object for holding or transporting something”? Unless we see the container or have a detailed description of the situation, we have no chance of knowing exactly what it is. It can be a box to transport parcels, a container used to ship goods overseas or a receptacle for a liquid. Depending on this, the translation will be very different. But if we see it associated with other terms, as in the following sentence: “Place the container of dough on the table and pour a cup of water into the glass,” we will probably find the right translation.
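To illustrate how associated terms can resolve such ambiguity, here is a minimal sketch; the frames, their member terms and the German equivalents are illustrative assumptions, not entries from a real terminology database:

```python
# A minimal sketch of frame-based disambiguation, assuming a hand-built
# frame inventory (the frames and translations below are illustrative).
FRAMES = {
    "shipping": {"ship", "cargo", "overseas", "freight", "port"},
    "kitchen": {"dough", "pour", "cup", "water", "glass", "table"},
}

# Hypothetical German equivalents of "container", one per frame.
TRANSLATIONS = {"shipping": "Container", "kitchen": "Behälter"}

def best_frame(sentence: str) -> str:
    """Pick the frame whose terms overlap most with the sentence."""
    words = set(sentence.lower().split())
    return max(FRAMES, key=lambda f: len(FRAMES[f] & words))

sentence = "Place the container of dough on the table and pour a cup of water into the glass"
print(TRANSLATIONS[best_frame(sentence)])  # -> Behälter
```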

5.2  Importance of the context

In a nutshell, the main reason for the weakness of traditional terminologies is that, once concept definitions are consulted in practical situations, they do not always succeed in creating a link between a real or abstract object and a terminology entry. This is because in general humans do not think in terms of definitions but rather in terms of situations and related concepts.
This phenomenon has been studied for almost a century by scientists from the cognitive sciences, linguistics, computer science and the neurosciences. One of the most quoted sentences, “You shall know a word by the company it keeps”, comes from the English linguist John Rupert Firth (Firth 1957). More recently, neuroscience publications have explained how semantic networks are formed in the brain, shaped by the cognitive experience of individuals. There is even a “semantic atlas of the brain”, the result of research published in 2016 by a team of UC Berkeley researchers and available online⁶. In the field of cognitive linguistics, Charles Fillmore developed his theory of frame semantics in the 1970s by modelling frames as associations of terms used in a typical context (Fillmore 1976; 1982).

6.  Cognition and terminology

In order to build terminologies that can better help translators, interpreters or machines to understand concepts, the starting point is to first look at how humans understand things, because intuitively humans manage to do what terminology entries do not succeed at doing: they find a way to map a term to the physical or abstract reality they are working on. After that, we will look at how machines understand meaning and use our findings to create a terminology model that can be used by humans and machines alike.

6 See http://gallantlab.org/huth2016/.

6.1  Human cognition

Several sciences have contributed over the years to the study of human cognition: psychology, sociology and the neurosciences, to name a few. It is now an established fact that language is embodied, i.e. our senses (sight, hearing, taste, smell, touch, as well as the motor system) play an important role in the way we understand the meaning of words and sentences (Gallese and Lakoff 2005).
Progress in the neurosciences allows us to observe, via fMRI (functional magnetic resonance imaging), brain activity in the cortex during cognition and language tasks. Angela Friederici describes in her seminal book how the human brain processes language (Friederici and Chomsky 2017) and proposes a language comprehension model. She observes syntactic and semantic processes in the brain, whereby syntactic processing occurs first and semantic processing follows in a different network. This tells us that human cognition also uses basic rules.
Semantic processes have been studied, for example, by Friedemann Pulvermüller, who observed what he called thought circuits (Pulvermüller 2013; Pulvermüller, Garagnani, and Wennekers 2014). He could observe learned neuronal circuits in the brain, e.g. for number words, for action words and for abstract words. These circuits are developed “by way of associative correlation learning”. He calls them distributed neuronal assemblies (DNA) or thought circuits. DNAs can be combined into higher-order semantic circuits. He adds: “DNAs also provide a mechanism for combinatorial generalization and for semantically linking words and concepts (or thoughts)” (Pulvermüller 2013: 585-589).
From these and several associated findings we retain the following
facts that are of some importance when it comes to creating intelligent
terminologies:

1. We have been learning concepts throughout our lives, mostly via our senses and our individual experience. This is one reason why individuals do not understand things and language in a uniform way (Dehaene 2018; Frith 2013).
2. Certain concepts have been learned at a very early stage in our life and are strongly connected to our early sensory and motor experiences. They can be observed in many primitive metaphors like “a warm person” (Lakoff and Johnson 2008).

3. More complex meanings are built as a combination of basic elements. This implies that units of knowledge are stored in our brain and are combined using rules.
4. We do not understand concepts in isolation. Concepts are always related to other concepts.
5. Related concepts are understood and memorized as typical representative contexts, e.g. game + chess or game + football, not just game alone. We cannot understand a term without knowing one or more of its typical usage contexts.
6. Recurring and typical contexts are memorized referents for a meaning (e.g. university lecture) and can be adapted to individual situations by adding or deleting some features associated with the reference context (e.g. a chess game as a mobile phone application). In a way this is similar to the Difference-Engines imagined by Minsky (Minsky 2007).

There are several terms used to describe such typical contexts: semantic network, frame, scenario, mental space, etc. We will use the term frame in this sense here.

6.2  Machine cognition

Machines, of course, have no intelligence and no cognition; these are human properties. But machines are very good at crunching numbers and implementing algorithms designed by humans. Thus, when we use the term machine cognition, we mean here the ability of a system to process natural language data and extract information from it in order to deliver a result that emulates human cognition. This could be providing an answer in natural language to a question, the translation of a text or the summarization of an article.
Different methods are used by machines, from deep learning by neural
networks to reasoning and inferring using rules and information objects
from knowledge bases. We will leave the latter aside as a major part of the
cognition work is explicitly designed by humans who write the rules of
inference and validate the knowledge data.
More interesting for our research is the way deep learning algorithms learn and represent the meaning of words, sentences and documents. If we take for example a neural machine translation system, the learning process is divided into two main tasks: encoding and decoding. Encoding means that the neural network takes thousands of source sentences (from bilingual training material) and learns for each sentence a so-called “thought vector”, i.e. a series of numbers that reflects the words of the source sentence and their interdependencies.
After the encoding of the source sentence is completed, the decoding occurs, looking at the thought vector and trying to predict the associated words of the target language. The system has the choice between all the words it has learned and uses the values from the vectors learned for each individual word in order to calculate the most likely translation. In the learning phase, the learning algorithm counts how often words like translation and memory are used together or are used in a similar context. The result is a model used to predict sequences of words. Thus, when the decoder has the choice between memory and banana as the term that comes after the word translation, it will take the term memory. It does this using an output function that looks for a maximum value (softmax) for the combination of the memory vector and the other word vectors of the vocabulary.
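To make the softmax step concrete, here is a minimal sketch with made-up three-dimensional vectors; real systems learn vectors with hundreds of dimensions, and the values below are illustrative, not learned:

```python
# A minimal sketch of the decoder's final step: scoring every vocabulary
# word against a context vector and picking the most likely next word.
import numpy as np

vocab = ["memory", "banana", "system"]
# One (made-up) embedding per vocabulary word.
embeddings = np.array([
    [0.9, 0.8, 0.1],   # memory
    [0.1, 0.0, 0.9],   # banana
    [0.7, 0.6, 0.2],   # system
])
# Context vector encoding "... translation" (also made up).
context = np.array([1.0, 0.9, 0.0])

scores = embeddings @ context                   # one score per word
probs = np.exp(scores) / np.exp(scores).sum()   # softmax
print(vocab[int(np.argmax(probs))])             # -> memory
```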
Of course, the learning process is more complex than this simplified representation. Deep learning networks, which started to be widespread and successful from 2012 on, use several hidden layers between the input (source sentence) and the output (machine translation). Intermediate layers are hidden and take over specific tasks, like looking, as humans do, at the context of neighbouring words before making a decision on the most appropriate next term. This is how the so-called CNNs (convolutional neural networks) operate. The data is passed on from one layer to the next; each layer is responsible for a specific task. Before being passed on to the next layer, the numeric data is transformed in order to reflect the intermediary results (= decisions). In a way this emulates neural networks in the brain that use neurotransmitters to send information from one neuron to the next.
CNNs have been inspired by human visual perception in the visual cortex as described by David Hunter Hubel and Torsten Nils Wiesel in 1962, work for which they later received the Nobel Prize. They discovered two categories of cells in the visual cortex, simple and complex cells, that process information from regions of cortical fields. Complex cells summarize the stimuli of simple cells and specialize in recognizing certain patterns, e.g. vertical or horizontal lines. The final image is the result of a sequence of features captured by dedicated groups of cells and concatenated together.
What can we learn from machine cognition?

1. Words are understood in connection with other words. The meaning of a word is expressed by a word vector (a series of values associated with the word). A vector is learned.
2. The values in the word vectors reflect the strength of the relationship between words. Words that occur together or behave similarly have a stronger link.
3. Because word vectors are numerical values, mathematical operations can be performed on them. They can be added, subtracted or compared; comparison gives a degree of semantic similarity of concepts, and combinations are used to construct more complex meanings (see the sketch after this list).
4. The learned knowledge can be saved, exchanged and reused. Once a model exists it always delivers the same results.
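As an illustration of point 3, here is a minimal sketch of vector comparison using cosine similarity; the vectors are made up for the example:

```python
# A minimal sketch of vector comparison, assuming small illustrative
# vectors rather than real learned embeddings.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 = same direction, 0.0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

translation = np.array([0.8, 0.9, 0.1])
memory      = np.array([0.7, 0.8, 0.2])
banana      = np.array([0.1, 0.0, 0.9])

print(cosine(translation, memory))  # high -> semantically close
print(cosine(translation, banana))  # low  -> unrelated
# Vectors can also be added or subtracted to compose meanings,
# e.g. king - man + woman ≈ queen in classic word2vec demos.
```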

Currently, the state-of-the-art models used for neural machine translation are called transformers. Systems that have been trained via deep learning are predictive models and do not use rules to generate language or translations. Because of this they are not always consistent in their use of words, and they make mistakes when words have context-dependent meanings. Advanced algorithms like BERT, BERTRAM or ELMo are trying to make trained models more aware of the context, but there is still a long way to go, as no intelligence or reasoning is involved and the data to be computed is immense.
In a recent article titled “Extending Machine Language Models toward Human-Level Language Understanding” (McClelland et al. 2019), AI researchers suggest enhancing existing machine language models by building an artificial fast learning system that stores embeddings derived from past learning cycles and uses them in an integrated system of knowledge. In context, the system would retrieve relevant information by also looking at the similarity between the stored embeddings and the context. The authors write: “artificial systems with indefinite capacity could exceed human abilities in this regard” (McClelland et al. 2019: 7).

6.3  Multilingual aspects

Some 350 years ago, John Wilkins (Wilkins 1668) dreamed of a universal language that could describe reality in an unambiguous way to all humans independently of their own language. He writes:

So that if men should generally consent upon the same way or manner of Expression, as they agree in the same Notion, we should then be freed from that Curse in the Confusion of Tongues, with all the unhappy consequences of it. (Book I, Chap. V, 20)

We know that each language has different ways of expressing meanings and uses different numbers of words. This is a special challenge for terminologists, who cannot always clearly map terms and concepts across languages. The reasons can be multiple: a concept does not exist in the target language (take for example the Japanese capsule hotels, unknown in many other cultures), or one language structures reality differently, giving the translator several translation alternatives depending on the context. This is e.g. the case with the French verb télécharger, which can be translated either by upload or download. As a result, translators and interpreters spend much of their time researching specific terms and their equivalents.

6.4  Frames and intelligent terminologies

From the description of human and machine cognition we can now define a new type of terminology that can be used by humans and machines alike and can represent the meaning of terms in context and across languages. We want to complement the existing multilingual structure of concept-based terminologies with standard contexts that we call frames. Frames stand for typical use situations of concepts or terms.
This can be useful not only for homonyms, i.e. clearly different concepts like bridge (related to construction or to a tooth), but also for synonyms that are not interchangeable all the time, like boss and manager or gingivitis and gum disease.
A particular advantage of this approach concerns multilingual terminologies. On the one hand, we may have a concept with a definition shared equally by all languages, like taxi or breakfast, but with different contexts of use, e.g. a breakfast in France and in China. Relations between each language variant and concepts in other languages (e.g. rice vs. croissant in the case of breakfast) help translators and interpreters in their work.
Another useful application concerns concepts or terms that have more than one equivalent in the target language. For example, the English term fish has two translations in Spanish: pescado (fish as food) and pez (a live fish), depending on whether the fish is a dish on your plate or is alive. Each usage situation of the Spanish translation can be linked to a different concept frame, like restaurant, plate, waiter in the one case and sea, water, swarm in the other. This is not only very useful for translators, but also for quality assurance technologies that can automatically interpret these relations and report a wrong translation in a specific context.
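As a sketch of how such a quality assurance check could work, consider the following; the frame terms and the scoring rule are illustrative assumptions, not a real QA product:

```python
# A minimal sketch of a context-aware terminology check, assuming
# hand-built frames; the frame terms and the rule are illustrative.
FISH_FRAMES = {
    "pescado": {"restaurant", "plate", "waiter", "menu", "grilled"},
    "pez": {"sea", "water", "swarm", "swim", "aquarium"},
}

def check_translation(source_context: set[str], chosen: str) -> bool:
    """Accept the chosen Spanish term only if its frame fits the context best."""
    scores = {term: len(frame & source_context)
              for term, frame in FISH_FRAMES.items()}
    return chosen == max(scores, key=scores.get)

context = {"the", "waiter", "brought", "the", "plate"}
print(check_translation(context, "pez"))      # False -> report a warning
print(check_translation(context, "pescado"))  # True  -> accepted
```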

7.  Building an intelligent terminology in four steps

Intelligent terminologies are organized as collections of concepts that are interconnected through relations. Four successive steps are required to build an intelligent terminology:

(1) In the beginning there is a collection of terms extracted from a reference corpus.
(2) These terms are then merged into concepts, e.g. software and application are merged into the common concept of an object used to “(instruct) a computer to do specific tasks” (www.techopedia.com).
(3) The concepts are enriched with additional information and metadata, e.g. a definition, an illustration, a status or usage attributes, etc.
(4) The concepts or the terms are linked together according to predefined relation categories. Different types of relations are used, such as hierarchical relations, part-whole relations and associative relations. Hierarchical categories usually reflect some sort of classification or taxonomy. Associative relations depend on the subject matter: a medical scientist will need other relations than an automotive engineer.

In general, knowledge-based terminology systems will display a concept map with one concept in the core and related concepts around it. Translators or interpreters can use this information to visualize the context of the concept they are trying to understand and thus get valuable additional input.
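The following sketch shows how an entry built along the four steps above could be stored as a small graph; the concept, terms, metadata and relations are illustrative, not a real tool's data model:

```python
# A minimal sketch of an intelligent terminology entry as a graph,
# following the four steps above; all values are illustrative.
from dataclasses import dataclass, field

@dataclass
class Concept:
    definition: str
    terms: dict[str, list[str]]                                     # step 2
    metadata: dict[str, str] = field(default_factory=dict)          # step 3
    relations: dict[str, list[str]] = field(default_factory=dict)   # step 4

termbase: dict[str, Concept] = {}
termbase["software"] = Concept(
    definition="Object used to instruct a computer to do specific tasks",
    terms={"en": ["software", "application"], "de": ["Software", "Anwendung"]},
    metadata={"status": "approved"},
    relations={"part_of": ["computer system"], "related_to": ["operating system"]},
)

# Querying the neighbourhood of a concept yields its frame-like context.
print(termbase["software"].relations["related_to"])
```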

7.1  Identifying relations

It is a challenge to define relation categories and to apply them to concepts or terms. Related terms are the construction elements of frames, i.e. typical use contexts of concepts. But what is a typical context?
Building semantic relations between concepts requires time and in-depth domain expertise. One way is, of course, to have subject specialists use their personal knowledge to connect concepts one by one. This approach is the best in terms of quality, because the knowledge modelled in the terminology database has been hand-picked and validated by subject specialists. However, it can be very time-consuming and requires a high investment of time and money.
Translators or interpreters do something similar when they research
terminology for a project. Instead of making notes they could preserve their
work for later reuse by themselves or colleagues by modelling relations in
an intelligent terminology database.

Fig. 2:  Graph representation of an entry in an intelligent terminology database



Besides manual research work, tools and methods of natural language processing (NLP) and artificial intelligence can be used to semi-automatically identify terms that are used together in the same context. For example, co-occurrence matrices can be generated and processed to create lists of co-occurring words, as in the sketch below.
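As a minimal sketch, co-occurrence counting over a toy corpus could look like this; the corpus and the stop-word handling are deliberately simplistic, where real systems use large corpora, tokenizers and sparse matrices:

```python
# A minimal sketch of a co-occurrence count built from a toy corpus.
from collections import Counter
from itertools import combinations

corpus = [
    "the translation memory stores segments",
    "the translation memory returns matches",
    "the interpreter reads the reference material",
]

cooc: Counter = Counter()
for sentence in corpus:
    words = set(sentence.split()) - {"the"}   # crude stop-word removal
    for a, b in combinations(sorted(words), 2):
        cooc[(a, b)] += 1

# Most frequent pairs hint at typical contexts (frames).
print(cooc.most_common(3))  # ('memory', 'translation') ranks on top
```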
The idea of word embeddings was introduced in 2001 (Bengio, Ducharme, and Vincent 2001). Word embeddings are learned by machines and associate a keyword with other words that influence it, have a similar meaning or behave in the same way. For each word, these associations are vectors of mostly between 100 and 300 dimensions; they can be queried to get lists of semantically related words, e.g. words related to graduate (see figure 3 and the query sketch below). This information is useful for building semantic frames.

Fig. 3:  Similar words from a word embedding⁷

7 Word embedding extracted from article: Cai and Dong (2015).
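As a sketch of such a query, assuming a pretrained vector file in word2vec format (the file name embeddings.vec is hypothetical):

```python
# A minimal sketch of querying a pretrained embedding model with gensim;
# the model file path is hypothetical and must point to real vectors.
from gensim.models import KeyedVectors

# Load word vectors in word2vec text format (e.g. 300-dimensional).
vectors = KeyedVectors.load_word2vec_format("embeddings.vec")

# Retrieve the ten nearest neighbours of "graduate" in vector space.
for word, similarity in vectors.most_similar("graduate", topn=10):
    print(f"{word}\t{similarity:.3f}")
```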



7.2  Application examples

Because they have a strong knowledge component, intelligent terminologies can be used in multiple ways. Here are some examples. Intelligent terminologies can be used:

• To help translators or interpreters understand the context of a concept beyond a definition and make the right translation choice.
• To discover and visualize knowledge hidden in documents or translations.
• To structure and store new knowledge.
• To verify the correct use of terms in translations.
• To augment deep learning algorithms by disambiguating terms in context.

They are particularly useful in situations where information needs to be extracted, as is the case with a document to be translated. A document as such is only a collection of words. Before starting to interpret, the interpreter must analyse the reference material, i.e. identify the subject, spot the ambiguities, recognize the relations between words and understand the concepts conveyed by the text. This process can be time-consuming, especially when dealing with large documents. However, it can be accelerated with intelligent terminologies using techniques such as annotation, markup and highlighting (see the sketch after the following list). For example, it is possible to highlight/markup:

• a context for connected relevant terms, e.g. income > equity > taxes (as opposed to income > sales > goods, which may require a different translation in some languages),
• terms with a special usage attribute (e.g. a prohibited translation),
• categories of terms based on their properties (e.g. attributes about roles, named entities, text classification, part of speech, etc.).
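A minimal sketch of such highlighting/markup follows, assuming a small hand-built term base; the entries and the markup format are illustrative:

```python
# A minimal sketch of terminology-based annotation: terms found in a
# (hypothetical) term base are wrapped in markup for later processing.
import re

TERMBASE = {
    "equity": {"frame": "income>equity>taxes", "status": "allowed"},
    "cell phone": {"frame": "telephony", "status": "prohibited"},
}

def annotate(text: str) -> str:
    """Wrap each known term in a tag carrying its attributes."""
    for term, attrs in TERMBASE.items():
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        markup = f'<term frame="{attrs["frame"]}" status="{attrs["status"]}">\\g<0></term>'
        text = pattern.sub(markup, text)
    return text

print(annotate("The equity section lists every cell phone sold."))
```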

An illustration of this is provided by Wikipedia articles, which contain intelligent links to data associated with the terms highlighted and linked in the articles.⁸ There are several tools available on the market for text annotation. Some of them can directly tap intelligent terminology databases.
8 To experience this, you can open a Wikipedia page, e.g. https://en.wikipedia.org/wiki/Artificial_intelligence, and then click in the left panel under Tools on the menu item Wikidata item.

Fig. 4:  Example of annotation tool – Babelfy (www.babelfy.org)

Annotation can also add markup to content for further processing by diverse
applications. In this way, intelligent applications such as chatbots or smart
assistants can “understand” annotated content, recognize the elements with
relevant information and output the required results. This can already be
seen in areas such as technical support for products or in marketing and
sales, when connected products such as flight, hotel and rental car are offered
to the user as a package.
Quality assurance technologies connected to intelligent terminology
repositories can check context-dependent translation variants and identify
mistranslations in a particular context.

8. Conclusion

Intelligent terminologies are still relatively new, and it will take some time
before they are adopted by translators and interpreters or by academia on a
broad basis. Existing solutions differ in the variety of relations they model
and the methods they use to implement them. Intelligent terminologies are
the result of multidisciplinary research work and are changing the paradigms
of terminology work. Language professionals can benefit from them but will
also discover new service opportunities. With their unique skills, they can
help build up genuinely multicultural and multilingual knowledge bases.

Glossary

In order to avoid misunderstandings, especially in an article on terminology, we have used the terms below in the following senses:

Concept: abstract unit of thought (e.g. what is understood by ‘software’).
Term: word or group of words denoting a concept.
Relation: semantic relation between concepts or terms (e.g. meronymic relation: a steering wheel is part of a car).
Frame: typical recurrent context of use for a term or concept.

Bibliographical references

Bengio, Yoshua, Réjean Ducharme, and Pascal Vincent (2001) “A neural probabilistic language model”, Advances in Neural Information Processing Systems.
Cabré, M. Teresa (2000) “Elements for a theory of terminology: Towards an alternative paradigm”, Terminology, 6:1, pp. 35-57.
Cai, Rendong, and Yanping Dong (2015) “Interpreter training and students of interpreting in China”, The Journal of Translation Studies, 16:4, pp. 167-191.
Dehaene, Stanislas (2018): Apprendre!: Les talents du cerveau, le défi des machines, Paris, Odile Jacob.
Faber, Pamela (2011) “The dynamics of specialized knowledge representation: Simulational reconstruction or the perception–action interface”, Terminology: International Journal of Theoretical and Applied Issues in Specialized Communication, 17:1, pp. 9-29. <https://doi.org/10.1075/term.17.1.02fab>.
Fillmore, Charles J. (1976) “Frame semantics and the nature of language”, Annals of the New York Academy of Sciences. <https://doi.org/10.1111/j.1749-6632.1976.tb25467.x>.
Fillmore, Charles J. (1982) “Frame semantics”, Linguistics in the Morning Calm: Selected Papers from SICOL-1981.
Fillmore, Charles J., and Collin F. Baker (2001) “Frame semantics for text understanding”, Text, pp. 3-4. <http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Frame+Semantics+for+Text+Understanding#0>.
Firth, John Rupert (1957) “A synopsis of linguistic theory, 1930-1955”, Studies in Linguistic Analysis. <https://ci.nii.ac.jp/naid/10020680394/en/>.
Frege, Gottlob (1879): Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens, Halle, Verlag von Louis Nebert.
Friederici, Angela D., and Noam Chomsky (2017): Language in Our Brain: The Origins of a Uniquely Human Capacity, MIT Press.
Frith, Chris (2013): Making up the Mind: How the Brain Creates Our Mental World, Wiley. <https://books.google.de/books?id=qTkWO7hErD4C>.
Gallese, Vittorio, and George Lakoff (2005) “The brain’s concepts: The role of the sensory-motor system in conceptual knowledge”, Cognitive Neuropsychology, 22:3-4, pp. 455-479. <https://doi.org/10.1080/02643290442000310>.
Gaudin, François (1990) “Socioterminology and expert discourses”, TKE’90: Terminology and Knowledge Engineering, 2, pp. 631-641. <https://hal.archives-ouvertes.fr/hal-01090697>.
Kerremans, Koen, and Rita Temmerman (2004) “Towards multilingual, termontological support in ontology engineering”, Proceedings of Termino 2004, Université de Lyon.
Lakoff, George, and Mark Johnson (2008): Metaphors We Live By, University of Chicago Press.
Le Cun, Yann (2019): Quand la machine apprend, Paris, Odile Jacob.
McClelland, James L., Felix Hill, Maja Rudolph, Jason Baldridge, and Hinrich Schütze (2019) “Extending machine language models toward human-level language understanding”. <http://arxiv.org/abs/1912.05877>.
Minsky, Marvin (1974) “A framework for representing knowledge”, MIT-AI Laboratory Memo 306. Reprinted in The Psychology of Computer Vision. Ed. by Patrick Winston (1975), McGraw-Hill.
Minsky, Marvin (2000) “Commonsense-based interfaces”, Communications of the ACM, 43:8, pp. 66-73. <https://doi.org/10.1145/345124.345145>.
Minsky, Marvin (2007): The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind, Simon & Schuster. <https://books.google.de/books?id=agUgKCrLIMQC>.
Ogden, Charles Kay, and Ivor Armstrong Richards (1923): The Meaning of Meaning, London, Routledge & Kegan Paul. <https://books.google.de/books?id=fo84zQEACAAJ>.
Picht, Heribert, and Klaus-Dirk Schmitz (2001): Terminologie und Wissensordnung: Ausgewählte Schriften aus dem Gesamtwerk von Eugen Wüster, TermNet. <https://books.google.de/books?id=utaTAAAACAAJ>.
Pulvermüller, Friedemann (2013) “How neurons make meaning: Brain mechanisms for embodied and abstract-symbolic semantics”, Trends in Cognitive Sciences, 17:9, pp. 458-470. <https://doi.org/10.1016/j.tics.2013.06.004>.
Pulvermüller, Friedemann, Max Garagnani, and Thomas Wennekers (2014) “Thinking in circuits: Toward neurobiological explanation in cognitive neuroscience”, Biological Cybernetics, 108:5, pp. 573-593. <https://doi.org/10.1007/s00422-014-0603-9>.
Quillian, Ross M. (1968) “Semantic memory”, in Semantic Information Processing: Readings in Cognitive Science. Ed. by Marvin Minsky, MIT Press.
Quine, Willard V.O. (1969): Ontological Relativity and Other Essays, New York, Columbia University Press.
Roche, Christophe (2007) “Le terme et le concept: Fondements d’une ontoterminologie”, TOTh 2007: « Terminologie & Ontologie: Théories et Applications », Annecy, 1er juin 2007, pp. 1-13.
Roche, Christophe, Marie Calberg-Challot, Luc Damas, and Philippe Rouard (2011) “Ontoterminology: A new paradigm for terminology”, Proceedings of the International Conference on Knowledge Engineering and Ontology Development, Funchal-Madeira, Portugal.
Russell, Bertrand (1919): Introduction to Mathematical Philosophy, London, George Allen and Unwin.
Temmerman, Rita, and Koen Kerremans (2003) “Termontography: Ontology building and the sociocognitive approach to terminology description”, Applied Linguistics, 105, pp. 1-10. <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.95.3960>.
Wilkins, John (1668): An Essay Towards a Real Character and a Philosophical Language, S. Gellibrand.
