Sofia Helgegren
2005-05-24
LIU-KOGVET-D--05/09--SE
Master's thesis in cognitive science
Institutionen för datavetenskap
Linköpings universitet
Contents
1 Introduction
2 Background
2.1 A Brief Introduction to the Harry Potter Series
2.1.1 The Harry Potter Series and Culture
2.2 The HP Series from a Translation Studies Perspective
2.2.1 A Note on the Translator
2.2.2 The Harry Potter Books as Novels
2.3 Previous Studies of the Harry Potter Books
3 Translation Theory
3.1 Translation and Culture
3.2 Descriptive Translation Studies
3.3 The Effect of the Translator
3.4 Translation Universals
3.4.1 Explicitation
3.4.2 Simplification
3.4.3 Normalisation
3.5 Translation of Fiction
3.6 Children’s Literature in Translation
3.7 Constraints on Translation of Children’s Literature
4 Studying Translations
4.1 Parallel Corpora
4.2 Sentence Alignment
4.3 Part-of-speech Tagging
4.4 Word Alignment
4.4.1 Guidelines for Manual Word Alignment
4.5 Non-1-to-1-operations
4.6 Lexical Shifts
4.6.1 Strategy for Lexical Shifts
4.7 Paraphrasing and Lexical Choice
5 Methodology
5.1 The Sequence of Work
5.2 A Presentation of the Tools
5.2.1 I*Link
5.2.2 I*Trix
5.2.3 New Tools, New Possibilities
7 Results
7.1 The HP-Corpus
7.2 Translational Results
7.2.1 Additions and Deletions
7.2.2 Translation Universals
7.2.3 Investigating Translational Choices
7.3 Methodological Results
7.3.1 Evaluation of the Different Strategies
8 Discussion
8.1 Discussion on the Translational Results
8.1.1 FDG Imperfections
8.1.2 The Relationship between Additions, Deletions and Lexical Shifts
8.1.3 Translation Universals
8.1.4 The Development of the Translator
8.1.5 Sources of Error for the Translational Results
8.2 Discussion on Tools and Methodological Results
8.2.1 Using the Alignment Tools
8.2.2 Advantages and Disadvantages of Using I*Link
8.2.3 Specifics of I*Link as Sources of Error
8.2.4 Suggestions for Improvements of the Tools
8.3 Suggestions for Further Research
Bibliography
Chapter 1
Introduction
Translations and the original text they are supposed to be the equivalent of are
not exactly the same. They differ in many ways: some changes occur naturally
because the basic structure of different languages is not the same, and some are due
to choices made by the translator. However, the differences between the two
texts might not be caused just in the process of translation, but also by the
process of translation.
Studies have shown that there are structural differences not only between
a specific original and its translation, but also between translations and other
texts written in the same language in general (Baker 1996). Translated text
often has certain characteristics that set it apart from other texts written in
the same language. These characteristics are claimed to be the result of a
subconscious process in translators to ensure that the text is understandable to
the new readers in the new context.
Translation is all about context. It is about taking one text out of its cul-
tural context and making it available to a whole new readership that is not
a part of that cultural context, and therefore cannot have the same vantage
point as a reader from the source culture reading the source text. Because of
this, in translating the words of the text, the translator must also take the for-
eignness of the text into consideration, and decide whether that is something
worth preserving for the foreign feel, or if it should be adapted to the target
culture readers. Over the years, the translation studies community has shifted
from favouring the source-oriented approach and its very close rendering of the
original text, to the target-oriented approach that focuses on readability and
achieving an equivalent effect in the target culture (Tabbert 2002). It is a shift
from smaller segments to larger and from closeness to ease of understanding.
The objects chosen for this study are the first four books in the astoundingly
successful Harry Potter series written by J.K. Rowling, and their translations
into Swedish. There are many reasons behind this choice, but a fact that makes
them so interesting to study is that they belong to the genre of children’s litera-
ture but have not been treated exclusively as such. They have attracted readers
both older and younger than the intended one, and through their success, they
have gained a unique status in children’s literature. Moreover, the Harry Potter
books belong to a sub-genre, namely fantasy.
There are two additional reasons why the Harry Potter books were chosen
for this study. Firstly, they are all translated into Swedish by the same
translator, Lena Fries-Gedin. This fact makes it possible to study the books
contrastively, and see if there are any structural differences between the trans-
lations. The translations may have changed over time and therefore, it is inter-
esting to analyse the samples sequentially. With a formal description of actual
changes, it would be possible to ascertain if the translator’s work is consistent
over time or if changes can be detected.
Secondly, the four books were written, translated and published within a
relatively short period of time, which makes it more likely that any contrastive
differences between the samples are actually due to changes in the approach
of the translator, and not any of the other possible sources of change, such as
a change in the cultural climate due to long periods of time passing between
the publication of the original text and the translation. This makes it possible
to pose questions concerning whether the translator in some way develops over
time and whether that is traceable in the produced texts.
In order to study the translations, the samples of source and target texts
were aligned. Alignment is a method in which each sentence in the source text
is paired with the corresponding sentence in the target text. This method was
also used on a word-level, i.e. each word or cluster of words, depending on the
nature of the text segments, was paired with the corresponding units. This
allows for all translated segments to be linked together in units of source and
target words, and makes visible the words in the source texts that were omitted
in the translation process, as well as any words that were added to it, i.e. that
exist in the target text but not in the original. Consequently, changes in the text
that occur in translation can be studied, which is why alignment is the chosen
method for the study. This study is data-driven, and the hypotheses took a
preliminary shape during the manual sentence alignment.
The purpose of this study is threefold. In the field of translation studies,
one purpose is to investigate whether the so-called translation universals are
manifested in these texts and, if they are, what form they take. The second
purpose is to contrastively study the samples to discover whether there are
detectable differences between them that could indicate that the translator’s
approach has in some way changed from the first book to the fourth.
The methodological purpose is to investigate alignment and to evaluate the
different alignment strategies used. Aligning as a method for studying transla-
tions is also evaluated, especially in relation to the new kind of information the
new type of alignment tools used in this study can provide in comparison with
traditional tools.
The hypotheses that will be investigated are:
• The translator’s style in translating the texts will have changed over time,
Chapter 2
Background
It is difficult to find words to describe the success of the Harry Potter books, and
considering the number of copies sold in both English and various translations
world wide, perhaps an introduction seems superfluous. Nevertheless, the books
present a rather specific mix of two different worlds that presents difficulties
both to translators of the series and to readers of this thesis who are unfamiliar
with the books. Therefore, a brief summary of some important aspects of the
series is provided below. In addition, an explanation is given of why the Harry
Potter books were chosen for this study.
school things. This is the introduction to the magical world, for Harry as well
as for the reader.
Apart from this introduction to the magical world in the first book, the
books all follow more or less the same format. They start in Privet Drive, in
the summer holidays, with a bored and lonely Harry harassed by Dudley. As
the school year starts, Harry by some means, usually the chartered Hogwarts
Express leaving from platform 9¾ at King’s Cross station, goes to the magical
world of Hogwarts where great adventures of different sorts happen. The books
end with a crisis and a sometimes bitter-sweet triumph for Harry in a fight in
which he defeats the Dark Side, i.e. Lord Voldemort or some of his followers.
The Swedish translator has chosen to keep all names of the leading characters in
their original English versions, and has only translated the names of minor
characters and some animal names.
As is obvious to any reader of the series, the length of the books has increased
with every new instalment. The later books in particular, at 636 pages for the
fourth book (Rowling 2001a) and 766 pages for the fifth book, Harry Potter and
the Order of the Phoenix (which is not included in the HP-corpus), demand far
more of a young reader than ordinary children’s fiction does. The length alone
suggests that the books are meant to be read by fairly accomplished readers with
a certain amount of patience and stamina, all the more so for children since no
pictures or illustrations are used. Moreover, as Harry Potter grows older (as he
does with every book, because each book describes the events of one school
year), the plot becomes more complicated and the demands on the reader therefore
increase. Consequently, in my opinion, at least the later books in the series
merit discussion as novels from a purely literary perspective.
The conclusion of the discussion above is that first and foremost, the books
are fiction, as they portray fictitious events. Secondly, they contain many ele-
ments from the fantasy-genre. Thirdly, and naturally, they are children’s books.
In general, I would argue that they can be seen as novels targeted at both
adults and children. From a translational perspective, however, it is important
to consider the fact that at least one part of the targeted audience is children,
which will be explained in section 3.7.
Chapter 3
Translation Theory
In this chapter, relevant theory from the translation studies field is presented.
The particular research questions investigated in this study are explained in
connection with the corresponding background theories.
In the history of translation studies, much discussion has pivoted around the
concept of free and literal translation, and which one is to be preferred. Until the
beginning of the nineteenth century, a free style that emphasised the spirit and
sense of the text was favoured. After this, the study of cultural anthropology
dictated that language “was entirely the product of culture”, which brought
with it the idea that translation was nearly impossible, and that it at any rate
needed to be as literal as possible (ibid., p. 45). This rather extreme point
of view was gradually abandoned, however, and today, translations tend to
be more target oriented (Baker 1996). Moreover, in translation studies, the
prescriptive approach saying what a translation should be like has been replaced
by a descriptive approach, aiming instead to explain what a translation is really
like (Tabbert 2002).
produced text.
3.4.1 Explicitation
The theory of explicitation concerns the tendency in translations to “spell things
out rather than leave them implicit” (Baker 1996, p. 180). Explicitation can
be expressed syntactically or lexically. For example, translated texts tend to
have a higher frequency of conjunctions than original texts. Lexical explicitation
can be made through various means, but oftentimes it is made by adding nouns
in order to explain some piece of information that needs to be explained to a
target culture reader.
Another possible manifestation of explicitation is the fact that translations
tend to be longer than their original texts. When translations become longer,
the additions to the ST are often made to explain features in the ST that
might not be known to readers in a TT-culture. Thus the translation becomes
more understandable than a more faithful rendering. This manifestation has
the advantage of being relatively easy to examine.
In this study, explicitation is thought to manifest itself in two ways. Firstly,
it was evident at a very early stage that the TTs are longer than the STs.
Secondly, if more information has been added to the target texts than removed
from the source texts, this also indicates that they have been explicitated.
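As a rough illustration of how the two indicators above could be checked once a sample has been tokenised and aligned, the following sketch compares text lengths and the counts of added and deleted tokens. The function name, the dictionary keys and the toy counts are assumptions made for illustration; they are not results from the HP-corpus.

```python
def explicitation_indicators(st_tokens, tt_tokens, n_added, n_deleted):
    """Summarise the two explicitation indicators for one aligned sample.

    st_tokens / tt_tokens: number of tokens in the source and target text.
    n_added / n_deleted: number of tokens marked as additions and deletions.
    """
    if st_tokens <= 0:
        raise ValueError("the source text must contain at least one token")
    return {
        # A ratio above 1.0 means the translation is longer than the original.
        "length_ratio": tt_tokens / st_tokens,
        "tt_longer": tt_tokens > st_tokens,
        # A positive value means more tokens were added than removed.
        "net_additions": n_added - n_deleted,
        "more_added_than_deleted": n_added > n_deleted,
    }


# Toy counts, invented for illustration only.
result = explicitation_indicators(st_tokens=1000, tt_tokens=1050,
                                  n_added=80, n_deleted=30)
print(result)
```

Both indicators pointing in the same direction would be consistent with explicitation, although neither of them proves it on its own.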
3.4.2 Simplification
Simplification is the tendency of translated texts to contain simplified language
compared to the original text (ibid.). For example, long sentences are often
divided into several shorter ones.
One indicator of simplification is a relatively low lexical density, meaning
that the number of function words or grammatical words is high, in proportion
to the number of lexical words. Lexical words contain more information than
grammatical words, and using fewer lexical words means that the reader will
have to keep track of less information. Using a less varied vocabulary is also
one manifestation of simplification.
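The two measures in the paragraph above can be sketched in a few lines of code. The example assumes that the text has already been part-of-speech tagged and that the tagger uses the tag names shown (NOUN, VERB, ADJ, ADV, PUNCT); these names, and the choice of which classes count as lexical, are assumptions for illustration rather than the settings used in this study. Lexical density is computed here as the share of lexical words among all words, and vocabulary variation as a simple type-token ratio.

```python
# Assumed tag names; a real tagger may use a different tag set.
LEXICAL_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}
PUNCTUATION_TAG = "PUNCT"


def lexical_density(tagged_tokens):
    """Share of lexical words among all words (punctuation excluded)."""
    tags = [tag for _, tag in tagged_tokens if tag != PUNCTUATION_TAG]
    if not tags:
        return 0.0
    return sum(1 for tag in tags if tag in LEXICAL_TAGS) / len(tags)


def type_token_ratio(tagged_tokens):
    """Number of distinct word forms divided by the number of words."""
    words = [word.lower() for word, tag in tagged_tokens
             if tag != PUNCTUATION_TAG]
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# A hand-tagged toy sentence.
sentence = [("People", "NOUN"), ("in", "ADP"), ("cloaks", "NOUN"),
            (".", "PUNCT")]
density = lexical_density(sentence)   # 2 lexical words out of 3 words
variation = type_token_ratio(sentence)
```

Note that the type-token ratio is sensitive to text length, so it is only meaningful when the compared samples are of similar size.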
Another possible sign of simplification is that punctuation tends to change in
translations. According to Malmkjaer (1997), punctuation is rateable on a scale
from weak to strong in the order comma, semicolon and full stop. In translations,
punctuation usually becomes stronger, in that commas are often translated into
semicolons or full stops, and semicolons are translated into full stops. If the
punctuation is stronger, it is highly likely that there are more sentences in the
TT than in the ST, which indicates that long and complex sentences have been
divided into several shorter ones, and thereby the complexity of the text has
been decreased.
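The strength scale described above lends itself to a simple count of punctuation shifts. The sketch below assumes that pairs of corresponding punctuation marks (source mark, target mark) have already been extracted from aligned sentences; the numeric ranks, the function names and the toy pairs are assumptions for illustration.

```python
from collections import Counter

# Ranks on the weak-to-strong scale: comma < semicolon < full stop.
STRENGTH = {",": 1, ";": 2, ".": 3}


def classify_shift(source_mark, target_mark):
    """Label one source/target punctuation pair by its change in strength."""
    source_rank = STRENGTH[source_mark]
    target_rank = STRENGTH[target_mark]
    if target_rank > source_rank:
        return "stronger"
    if target_rank < source_rank:
        return "weaker"
    return "unchanged"


def count_shifts(pairs):
    """Count how many pairs became stronger, weaker or stayed unchanged."""
    return Counter(classify_shift(s, t) for s, t in pairs)


# Toy pairs, invented for illustration only.
pairs = [(",", "."), (";", "."), (",", ","), (".", ",")]
shifts = count_shifts(pairs)
```

A clear majority of "stronger" shifts in a sample would be in line with the simplification hypothesis.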
In the HP-corpus, simplification is assumed to be manifested in long sen-
tences being divided into several shorter ones, stronger punctuation and the
removal of the regional dialects that some characters speak in (see discussion
below).
3.4.3 Normalisation
Normalisation or conservatism is what Baker calls the “tendency to exaggerate
features of the target language and to conform to its typical patterns” (1996,
p. 183). This can take the shape of the translator over-using clichés or typical
grammatical structures of the TL, often grammaticising elements of texts that
are ungrammatical in the source.
Normalising also involves adapting the punctuation to the typical usage of
the TL. For example, commas are used much more in English than in Swedish.
Ingo states that a Swedish reader is much disturbed by an overuse of commas,
and strongly recommends that the number of commas is adapted to the usage
of the target language (1991). One of the ways in which normalisation will
be investigated in the HP-corpus is through the treatment of punctuation, and
whether or not any evidence can be found of it being adapted to fit Swedish
usage.
Another element of the Harry Potter books in which normalisation might be
manifested is in the treatment of the different dialects used for certain characters
in the source texts’ dialogues. Dialect “differs from person to person primarily
in the phonic medium” and “has to do with the user in a particular language
event: who (or what) the speaker/writer is” (Hatim & Mason 1990, p. 39). The
effect of changing a character’s dialect can be considerable, as in the French
version of the first Harry Potter book, where the dialect of Rubeus Hagrid has
been normalised and grammaticised (Davies 2003). In the English versions of
the books, Hagrid’s speech casts him as a “down-to-earth, simple, uneducated
and in some ways childlike character” but in the French version, his utterances
are “characterized by impeccable grammar and standard, even somewhat formal
vocabulary” (ibid., p. 82).
Dialect is a language variation that is dependent on the user, and Hatim
and Mason distinguish between idiolectal, geographical, temporal, social and
standard/non-standard variation (Hatim & Mason 1990). For the purpose of
this study, the main interest in dialect is the use of different geographical di-
alects, or accents. Accent is the variation in language that roughly corresponds
to the geographical origin of the speaker. Accents can carry ideological and
political implications that translators must be aware of, and because of this
translation of accent is problematic (ibid.).
In the Harry Potter series, accent is used actively in the depiction of dif-
ferent characters, not only for Rubeus Hagrid, but also for Stan Shunpike, the
conductor on the Knight Bus in Harry Potter and the Prisoner of Azkaban
(Rowling 2000a). Through alternative spelling in the utterances of Hagrid and
Stan Shunpike, that clearly deviates from standard English spelling, Rowling
represents the phonic qualities specific to two very different geographical di-
alects.
Both dialects are to certain extents ungrammatical, and it could prove in-
teresting to see if the translator has chosen to grammaticise the utterances, or
adapted them to Swedish in some other way. Significantly, the dialects are very
different, and should this difference not have been retained in the target texts,
this is not only an instance of normalisation, but also of simplification, since it
decreases the complexity of the texts.
it is set in such a British environment and contains so many concepts that are
completely foreign to Swedish children.
2. has ambivalent texts, with both literal meaning and a deeper, interpretable
meaning.
3. is written and purchased by others than the primary readership, i.e. adults.
4. has many functions and cultural constraints, in that they are intended to
both entertain and educate.
The fact that the genre has two audiences has some interesting implications.
In the relationship between adults and children, the power is with the former
group, which is very much reflected in the area of children’s literature. Adults
write, edit, publish, market and buy the books that are intended for children,
which means that the primary audience is more or less without say when it
comes to what they read. Parents decide what is suitable for their child, but
children and adults are not likely to have the same taste in literature (ibid.).
Number two above, although worth investigating, is not something that will
be pursued further in this study, as it is more interesting to do so from a literary
angle.
Because works of this genre are produced in a more or less exclusively adult
environment, it is important for the adults in that environment to be very
much in touch with current children’s culture. In all literary production, the
writers, publishers, editors and indeed, translators, have to be aware of the
current trends in the culture for which they produce, which is not a trivial
matter, and in children’s literature, it is complicated by the fact that adults
cannot be equal members of the child community. Still, they must know and
understand the culture, in terms of what children find interesting, how they
speak and think, current vocabulary, and so on. Otherwise, the style of the
language used in the translation risks being dated, and the readers will notice
this. As Eirlys E. Davies points out, “translating for children may present more
of a challenge than translating for adults; young readers are perhaps less likely
to be tolerant of the occasional obscurity, awkwardness or unnatural-sounding
phrasing which adults, conscious that they are dealing with a translation, may
be more accepting of” (2003, p. 66).
Due to the educational goal of children’s literature, studying explicitation,
simplification and normalisation might be of particular relevance, as there is an
even greater need to make texts understandable for the readership in order to
meet the goal of educating. One important part of the educational purpose
is, as Puurtinen (1998) points out, that adults expect children’s literature to
help in the development of the child’s linguistic skills. Therefore, there might
be a stronger tendency for translators of children’s literature to normalise the
texts by grammaticising them, in order to avoid the readership learning faulty
grammar from the books.
Chapter 4
Studying Translations
For small corpora like the HP-corpus, sentence alignment can be done quite
easily using basic word processing software such as Microsoft Word. For larger
collections of text, automatic tools are necessary.
1. Mark as many words as necessary on both the target and source side.
2. Mark as few words as necessary on both the target and source side.
Following the guidelines is supposed to ensure that all links have a two-way
equivalence between the source and target segments.
4.5 Non-1-to-1-operations
When aligning a corpus it becomes evident that some segments of the ST do
not have a one-to-one correspondence with a TT segment, and the annotator
is forced to link together segments in (usually) larger chunks. These non-1-to-
1-operations include additions, deletions, convergences and divergences (Merkel
1999).
The focus of this study is on the segments of both source and target texts that
do not have a corresponding segment in the other language, namely additions
and deletions. These are significant changes to the text made by the translator,
and in the aligning process, they lead to the annotator marking the segments
as NULL-links, i.e. segments without corresponding segments. This does not
apply to divergence and convergence, and they will only be mentioned briefly
below for completeness. All examples below are taken from the Harry Potter
corpus.
Example:
He rolled onto his back and tried to remember the dream he had been
having.
Han rullade över på rygg och försökte komma ihåg drömmen han hade
haft.
Example:
At last.
Äntligen.
Additions
Translators sometimes add information to the text, and those additions are
elements of the TT that are not present in the ST. The effect an addition
has on a text is to a great extent dependent on the linguistic nature of the
addition. It is reasonable to expect that added verbs, nouns and adjectives add
actual information, whereas added pronouns can indicate that the translator
has in fact grammaticised the text. In the ideal case, the translator only makes
additions when it is absolutely necessary. However, this is not always the case,
as can be seen in the example below, where Fries-Gedin has added the equivalent
of long, a piece of information that is not motivated by the meaning of the source
word cloaks.
Example:
People in cloaks.
Folk i långa mantlar.
Deletions
Deletions occur in the aligned material when the translator has chosen not
to include some piece of information from the ST. The effect of a deletion is
usually that the text has been simplified. In the example below, around has
been deleted.
Example:
He looked around at Harry and Hermione.
Han såg på Harry och Hermione.
Should the source sentence contain a deletion and the target sentence an
addition, it can be reasonable to suspect that there might be a relationship
between the two.
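One way to picture additions and deletions is as links in which one side is empty, i.e. NULL-links. The sketch below represents each link as a pair of word tuples and extracts the additions and deletions from two of the examples above. The Link class, the function names and the individual word pairings are assumptions made for illustration; they do not reproduce the internal format of the alignment tools.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Link:
    """One alignment link; an empty side makes it a NULL-link."""
    source: Tuple[str, ...]
    target: Tuple[str, ...]


def additions(links):
    """Target segments that have no counterpart in the source text."""
    return [link.target for link in links if link.target and not link.source]


def deletions(links):
    """Source segments that have no counterpart in the target text."""
    return [link.source for link in links if link.source and not link.target]


# "People in cloaks." / "Folk i långa mantlar."
cloaks = [
    Link(("People",), ("Folk",)),
    Link(("in",), ("i",)),
    Link((), ("långa",)),          # addition: NULL on the source side
    Link(("cloaks",), ("mantlar",)),
    Link((".",), (".",)),
]

# Opening of "He looked around at ..." / "Han såg på ..."
# (the remaining one-to-one links are omitted for brevity).
looked = [
    Link(("He",), ("Han",)),
    Link(("looked",), ("såg",)),
    Link(("around",), ()),         # deletion: NULL on the target side
    Link(("at",), ("på",)),
]

print(additions(cloaks), deletions(looked))
```

Keeping multi-word segments as tuples means that the same structure can also hold links where several source words correspond to several target words.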
is nothing wrong with it” (ibid., p. 36). Specifically, “mind particularly your
descriptive words: adjectives, adverbs, nouns and verbs of quality” (ibid., p.
36). Consequently, the use a translator makes of adding or deleting descriptive
words and segments to or from the text can be seen as a part of his or her style
of translating, and will be the focus of the investigation into how Fries-Gedin
uses addition and deletion in the samples.
3. neither less nor more specific and not equivalent, i.e. it has a different
meaning than the source item.
These definitions can also be termed a less specific shift, a more specific
shift, and an other lexical shift (ibid.). Examples of the different types of lexical
shifts are given below. The bold faced words are the source item and its cho-
sen translation. Gloss translations of the actual meaning of the chosen target
segments are given in the square brackets in the English sentences, illustrating
the lexical shifts (whelk has in Swedish been generalised into [seafood], it has
been specified as [The stench], and darkly has been changed into [quietly]).
Like additions and deletions, lexical shifts are significant changes made to
the text, and they are rarely necessary to make. Consequently, analysing trans-
lations in terms of lexical shifts can illustrate the influence of the translator on
the text.
Chapter 5
Methodology
This chapter outlines how the HP-project was carried out, and describes the
specialised software tools that were used in the process. In addition, some
advantages of using these new types of alignment tools are explained.
5.2.1 I*Link
The word alignment system used in this study, I*Link, is interactive in that it is
used in collaboration with a human annotator in order to increase the efficiency
and performance of the tool. In collaboration with a human annotator, the
precision figure of I*Link is more or less 100 percent, which is necessary in this
study. In order to study the entire samples and search for patterns, the entire
samples including complex structures that are sometimes very difficult to align
must be as fully aligned as possible.
I*Link is a semi-manual alignment tool that uses information from bilin-
gual resources and built-in heuristics to suggest correspondence candidates for
alignment, which the user accepts, revises or rejects (Merkel et al. 2003). For
any element the tool cannot suggest a match for, the user chooses one manually
by clicking on the matching word, should one exist, and then pressing the
“Match”-button. If no matching word exists, the user marks the element as
a NULL-link. I*Link uses machine learning techniques to store the choices of
the user in dynamic resources that are built during and used directly in the
linking process. Thus “the accuracy of the proposed word links is continuously
improved during and across word alignment sessions, which in turn means in-
creased efficiency” (ibid., p. 2). This is, however, dependent on the ability of the
user to be consistent in his or her chosen links. If the choices are inconsistent,
it will harm the learning effect and I*Link will not perform optimally.
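The general idea of storing the annotator's choices and reusing them as suggestions can be illustrated with a very small sketch. This is not I*Link's actual algorithm or data format, which are described in Merkel et al. (2003); the LinkMemory class, its methods and the frequency-based ranking are hypothetical and serve only to show why consistent choices matter.

```python
from collections import Counter, defaultdict


class LinkMemory:
    """Hypothetical store of links accepted by the annotator."""

    def __init__(self):
        # source word -> counts of target words it has been linked to
        self._counts = defaultdict(Counter)

    def record(self, source_word, target_word):
        """Remember one link that the annotator accepted or created."""
        self._counts[source_word.lower()][target_word.lower()] += 1

    def suggest(self, source_word, target_sentence):
        """Propose the most frequently chosen target word that occurs in
        the current target sentence, or None if nothing is known."""
        seen = self._counts.get(source_word.lower())
        if not seen:
            return None
        candidates = [w for w in target_sentence if w.lower() in seen]
        if not candidates:
            return None
        return max(candidates, key=lambda w: seen[w.lower()])


memory = LinkMemory()
memory.record("cloaks", "mantlar")
memory.record("looked", "såg")
memory.record("looked", "såg")
memory.record("looked", "tittade")

first = memory.suggest("cloaks", ["Folk", "i", "långa", "mantlar"])
second = memory.suggest("looked", ["Han", "såg", "på"])
unknown = memory.suggest("wand", ["Han", "såg", "på"])
```

In a scheme like this, inconsistent links spread the counts over several competing target words, which makes the ranking less reliable; this mirrors the point above that inconsistency harms the learning effect.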
In addition to the built-in resources, I*Link can be fed with user-specific dy-
namic resources. If the user has worked with the tool previously, the resources
collected from those sessions can be used as an additional knowledge base for
the system, which should enhance the performance of the system. I*Link auto-
matically collects statistical data on the performed translational actions.
The graphical interface of I*Link consists of four windows: the Link Panel, the
Link Table Panel, the Resource Panel and the Settings Panel. The Link Panel
in figure 5.1 is the window in which the current sentence pair is presented, the
source sentence in the upper half and the target sentence in the lower half. It is
in this window that the user can accept or reject the automatic proposals and
select links manually. Chosen links are marked using corresponding colours, and
are also shown in the Link Table Panel in figure 5.2. Additions and deletions
can be marked as NULL-links by right-clicking with the mouse on the word or
words, and choosing NULL.
In the centre of the Link Panel, directly below the windows where the source
and target sentences are shown, some important pieces of information are dis-
played. The box in the middle contains the number of the current sentence
pair, in this case number 1258. The green pieces of text on both sides of this
box say “Source completed” and “Target completed” when both sentences are
fully aligned and the “Done”-button is pressed. This is significant since the
advantage of this system is that full and complete alignment can be achieved,
and it is thus important to be able to verify that all tokens in each sentence
have been aligned before moving on to the next sentence.
The eight fields in the lower left corner of the Link Panel window show
linguistic data on the current link on four levels: word form, base form, POS
and the function the word or words have in the sentence.
The Resource Panel and the Settings Panel were not used actively in this
project. Descriptions of these panels are available in Merkel et al. (2003).
5.2.2 I*Trix
Another word alignment tool that was used in the study is I*Trix, which differs
from I*Link by being a tool with which fully automatic alignment can be done.
The sample to be aligned is run through I*Trix, which links whatever it can
in the sample. The output can then be manually post-edited in I*Link by the
user, in order to correct mistakes and achieve a complete alignment where all
tokens in the sample are aligned. Like I*Link, I*Trix can be fed with user-
specific resources built up in previous sessions using I*Link in order to enhance
the performance of the tool.
that the old framework for analysis is less useful, which entails that this study
differs somewhat from traditional studies.
Traditionally, other measurements were used, such as type-token ratio and
lexical density (Baker 1996). The main purpose of these measures is to investi-
gate translation in a broader perspective and to describe general principles that
can be found in translations. In contrast, the tools used in this study make
it possible to systematically analyse particular translations in a more power-
ful way than was possible with traditional tools. Consequently, the methods
used in this study are not suitable for investigating translations in general, but
are very well suited for making a more thorough investigation of one or more
translations.
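The two traditional measures mentioned above can be sketched in a few lines. This is only a minimal illustration, not the procedure used by Baker (1996): the tokeniser handles English letters only, the function-word list is a deliberately tiny, hypothetical stand-in for a real one, and the sample sentence is an invented example.

```python
import re

# Hypothetical, deliberately tiny list of English function words.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "was", "he", "it"}


def tokenize(text):
    """Lower-case the text and return its word tokens (punctuation is dropped)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)


def lexical_density(tokens, function_words=FUNCTION_WORDS):
    """Share of tokens that are not in the function-word list."""
    content = [t for t in tokens if t not in function_words]
    return len(content) / len(tokens)


# Invented example sentence, used only to exercise the two functions.
sample = "He was wearing a very old pair of jeans and a thick leather belt"
tokens = tokenize(sample)
ttr = type_token_ratio(tokens)
density = lexical_density(tokens)
print(f"type-token ratio: {ttr:.3f}, lexical density: {density:.3f}")
```

Both measures summarise a whole text in a single figure, which is why they suit broad comparisons better than the close study of individual translational choices.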
Chapter 6
In this chapter, the building and aligning of the HP-corpus are described. The
four different alignment processes are described and discussed in some detail, so
as to explain how the different strategies affect the process.
Table 6.1: The names of the books the samples are taken from, and the names
of the corresponding samples.
Samples of the first 20000 words in each ST were chosen, rounded to the
nearest chapter. There were several reasons why only whole chapters were
used in the samples, perhaps the most important one being that in order to
study the translations contrastively, the semantic integrity of the texts needed
to be preserved. This was also why the samples all contain the beginnings of
the four books, as it was deemed more difficult to track the translator’s development
unless the same part of the different books was being studied. In addition,
the beginning and ending of chapters have specific characteristics. The extent
of the resulting samples in the number of tokens and chapters included can be
seen in table 6.2. The total token count in the corpus is 189116 tokens.
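The rounding to the nearest chapter can be sketched as follows. This is a minimal illustration of the selection principle, assuming that "nearest" means the chapter boundary whose cumulative word count lies closest to the 20000-word target; the chapter lengths are hypothetical and are not taken from the books.

```python
def chapters_for_sample(chapter_lengths, target=20000):
    """Return (number of leading chapters, total words) closest to the target."""
    best_count, best_total = 0, 0
    running = 0
    for count, length in enumerate(chapter_lengths, start=1):
        running += length
        # Keep the chapter boundary whose cumulative size is nearest the target.
        if abs(running - target) < abs(best_total - target):
            best_count, best_total = count, running
    return best_count, best_total


# Hypothetical chapter lengths in words, for illustration only.
lengths = [4600, 3500, 3900, 3700, 6600, 6400]
count, total = chapters_for_sample(lengths)
print(count, total)
```

With these invented lengths, stopping after the fifth chapter overshoots the target by less than stopping after the fourth undershoots it, so five whole chapters are selected.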
Table 6.2: The respective sizes (in number of tokens) of the samples in the
HP-corpus, and the number of chapters in each sample.
Table 6.4: The different strategies used in the alignment of the samples.
it. In addition, trying to learn the system before starting to align the samples
was, of course, positive both regarding the attempt to be consistent and the
over-all quality of the chosen links.
to the marginal effect it would produce. Such imperfections also exist in the
other samples, and were treated in the same way.
HP2 was automatically aligned using the built-in resources of I*Trix, and the
data from this second session was post-edited using I*Link. The alignment of
HP2 required 16.5 hours.
The difference in the strategies used for HP1 and HP2 had some interesting
implications. When only using the buttons Match, Accept, Reject and Done in
I*Link, as in HP1, the links are presented in turn, and with enough pushing
of the buttons, either a match can be found or the word is treated as a NULL
link. This means that the links get coloured one by one, and it is therefore easy
for the annotator to keep track of the segments that are linked together. HP2,
however, was aligned automatically using I*Trix, which means that in the post-
editing session using I*Link, the links already matched by I*Trix were already
coloured. Sometimes, words that stand next to each other in the sentences but
are not a part of the same link can have very similar colours. This means that
there is a risk of accepting links made by I*Trix that should not be accepted,
because the eye does not pick up on the slight difference between the colour of
the matched links. To me, this meant that in aligning HP2, I had to be very
careful, and quite a few sentence pairs were aligned before I discovered this, and
so the sentences that I aligned while still ignorant of this had to be post-edited.
This was done in the same session, as soon as it was discovered, and the required
time is included in the 16.5 hours it took to align the whole sample.
Another consequence of the colour scheme was that in linking HP2, I learned
to use the Link Table Panel. In this window, all linked pairs appear after the
Accept-button is pressed, and it is possible to check the links as you go along. In
HP1, I did not use this as I did not need it, but for the samples pre-aligned with
I*Trix, it became indispensable as it diminished the problem with neighbouring
links of almost the same colour.
One idea that occurred to me after aligning the first 350 sentence pairs
of HP2 was that if one wanted to use the same guidelines that the heuristics
of I*Link and I*Trix are based on, it might have been better to start with
the strategy that was used for HP2, i.e. using I*Trix to pre-align HP1. The
matches in the system’s output quite often differed from my own choices for
matches, and this made me start to doubt my reasons for not using the exact
same heuristics as the system. My theory is that using the same guidelines
and starting with a pre-aligned text might have ensured the consistency of the
chosen links, as it is much easier to just accept what the system suggests and
simply correct the mistakes and link what the system has not been able to link.
For the inexperienced annotator, this could be used as a way to simplify the
process and ensure a greater consistency.
the effect of the PC-capacity, a simple time-test was done on the PCs, and the
summary of the time required for aligning the different samples in table 6.5 has
been adjusted to account for that difference.
Table 6.5: The different strategies and the time required for aligning each sam-
ple.
the beginning of HP1, probably due to the fact that a heuristic used in I*Link
always suggests that such constructions should be aligned as one link, not two.
At the very start of the aligning I was apparently too preoccupied with handling
the system to notice that I was not following my own guidelines. Because of
this, HP1 was post-edited in order to make the links conform to the patterns
used in HP2, 3 and 4. The post-editing session, in which all 1768 sentence pairs
were checked and mistakes corrected, required 3 hours.
Chapter 7
Results
The first part of this chapter briefly states that the HP-corpus is a result in
itself. The second part describes the translational results of the analysis of the
corpus and some complications that occurred during the analysis. I have chosen
to give a more detailed description of the analysis because there is no ready-to-
use framework for studies like this, as mentioned in the introduction. The third
part of the chapter presents the methodological results that concern the tools
and strategies used in the project.
Finally, a way of organising and presenting the data obtained in this study using
semantic mirroring is presented.
EN, ING, AD and NDE marked tokens. In table 7.1, V2 thus contains nulled
tokens in the TTs marked V, AD and NDE, because all additions are, naturally,
made in Swedish and cannot be ING or EN. Consequently, V2 in table 7.2
includes nulled tokens from the STs tagged V, EN or ING.
Table 7.1: The percentage of additions in the samples. For each investigated
word class, the number shown is in relation to the total amount of words of that
word class in the source texts.
Table 7.2: The percentage of deletions in the samples. For each investigated
word class, the number shown is in relation to the total amount of words of that
word class in the source texts.
The results are not homogeneous, however, as there are differences in the
distributions of additions and deletions. Figure 7.1 and figure 7.2 are included
to show the results in a directly perceptible way.[1] In studying these, it is
obvious that there are few deletions for all the word classes in HP1, but for
additions, the results have a much greater range for the different word classes.
[1] The lines in the figures are the straight lines that minimise the sum of the squared distances
between each line and its four corresponding dots. The purpose of the lines is to visualise
trends in the data, mainly to illustrate the sequential differences between the samples. The
underlying data is displayed in tables 7.1 and 7.2.
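The trend lines described in the footnote are ordinary least-squares fits through four points per word class. A minimal sketch of such a fit is given below, assuming that the distances are measured vertically and that the samples are placed at x = 1 to 4; the percentages are hypothetical and are not the values in tables 7.1 and 7.2.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimising squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# HP1..HP4 on the x-axis; hypothetical percentages for one word class.
samples = [1, 2, 3, 4]
percentages = [5.0, 8.0, 9.0, 12.0]
slope, intercept = least_squares_line(samples, percentages)
print(f"slope {slope:.2f} percentage points per book, intercept {intercept:.2f}")
```

A positive slope corresponds to an increase from HP1 to HP4, which is the kind of sequential difference the lines are meant to make visible.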
[Figure 7.1: plot not recoverable from the source. Vertical axis: percentage, 0% to 30%. Horizontal axis: HP1, HP2, HP3, HP4. Series: Pronoun, Noun, Adverb, Adjective, Verb 2, Verb.]
[Figure 7.2: plot not recoverable from the source. Vertical axis: percentage, 0% to 30%. Horizontal axis: HP1, HP2, HP3, HP4. Series: Pronoun, Noun, Adverb, Adjective, Verb 2, Verb.]
Table 7.3: A representation of additions for the different word classes. The
added information is shown in bold face in the target sentence column. A gloss
translation of the information that has been added is given in the square brackets
in the source sentence column.
Table 7.4: A representation of deletions for the different word classes. The
deleted information is shown in bold face in the source sentence column.
source sentence reads: “He was wearing what appeared to be a golfing jumper
and a very old pair of jeans, slightly too big for him and held up with a thick
leather belt”. The target is: “Han var iförd något som såg ut som en golftröja
och ett par slitna jeans, som var lite för stora för honom och därför hölls uppe
av ett tjockt läderbälte”. Here, very old, which should be mycket gamla in
Swedish, has instead become slitna, the equivalent of worn. In this example, it
is evident that the Swedish and English constructions are not equivalent as closer
translations are possible. However, there is still some degree of correspondence
between the deleted and added elements, as the meaning of the Swedish word
at least has some kind of semantic relationship with the meaning of the source
words. Consequently, this combination of addition and deletion is in fact a
lexical shift.
Only add.    Only del.    Both add. and del.    Neither add. nor del.
25           18           68                    39
Table 7.5: Results of the close investigation of the last 150 sentence pairs of
HP4.
However, this surface relationship does not say anything about whether or
not there is a relationship between the additions and deletions in the sentence
pairs that contain both. It would be reasonable to expect that if there is a
relationship between an addition and a deletion, the words involved will often be
of the same word class. A detailed presentation of the distribution of additions
and deletions for each word class is given in table 7.6.
Word class    Add.    Del.    Both add. and del.    Neither add. nor del.
Verb          45      32      24                    97
Adjective     19      8       3                     126
Adverb        37      24      11                    100
Noun          26      8       3                     119
Pronoun       43      27      12                    92
Table 7.6: Distribution of additions and deletions for the investigated word
classes in the subsample.
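The figures in table 7.6 can be cross-checked against the size of the subsample. The sketch below assumes, as one plausible reading of the table, that the Add. and Del. columns also include the sentence pairs counted under "Both add. and del."; under that assumption every row should account for exactly 150 sentence pairs.

```python
SUBSAMPLE_SIZE = 150  # the last 150 sentence pairs of HP4

# (add, del, both, neither) per word class, copied from table 7.6.
table_7_6 = {
    "Verb": (45, 32, 24, 97),
    "Adjective": (19, 8, 3, 126),
    "Adverb": (37, 24, 11, 100),
    "Noun": (26, 8, 3, 119),
    "Pronoun": (43, 27, 12, 92),
}


def pairs_covered(add, delete, both, neither):
    """Inclusion-exclusion: pairs with both are counted once, not twice."""
    return add + delete - both + neither


totals = {wc: pairs_covered(*row) for wc, row in table_7_6.items()}
for word_class, total in totals.items():
    print(word_class, total, total == SUBSAMPLE_SIZE)
```

On this reading, 24 of the 45 sentence pairs with an added verb also contain a deleted verb, a larger share than for any of the other word classes.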
Explicitation
For the core purpose of this study, explicitation was expected to be manifested
primarily in two ways. The first was that if there were more additions than
deletions in the samples, this could be seen as an indication of explicitation.
Looking at the combined effect of table 7.1 and table 7.2, there is indeed more
added than deleted information in the samples. However, addition of nouns
is, as mentioned in section 3.4.1, considered to be a typical manifestation of
explicitation. In the HP-corpus, the number of additions is lower for the noun
category than for any of the other categories of words. This fact is an indication
that explicitation is not so strongly manifested in the HP-corpus, at least not
for the traditional type of lexical explicitation by the addition of nouns.
Notwithstanding this, the second expected manifestation of explicitation in
the corpus follows logically from the first, in that the translated texts were likely
to be longer than the original texts. As early as during the sentence alignment, it
became clear that the samples seemed to conform to the tendency of translations
to be longer than their originals. As can be seen in table 7.7, there are more
tokens in all the TT-samples, compared to their STs. This increase, although
consistent, struck me as being smaller than expected, as the impression during
the sentence alignment was that the Swedish texts were noticeably longer. A
possible explanation for this will be given in section 8.1.3 in the discussion
chapter.
Table 7.7: The respective sizes (in number of tokens) of the samples in the
HP-corpus, and the difference in number of tokens.
Simplification
One of the possible manifestations of simplification investigated for the HP-
corpus is the punctuation, and whether or not it has been strengthened. As can
be seen in table 7.8, there is indeed evidence of strengthened punctuation in the
samples. Commas and semicolons have been changed to full stops, and signifi-
cantly, there is a change over time concerning the strengthening of punctuation
markers. The implication of this is that the texts have become more simplified
from HP1 to HP4.
Normalisation
As mentioned above, normalisation is manifested in the way the translator has
chosen to treat the dialects of Rubeus Hagrid and Stan Shunpike. In addition
to this, two other kinds of manifestations of normalisation have been found in
the HP-corpus.
Firstly, there are sentences in the corpus that have been made more gram-
matically correct in the translations than they were in the source texts. One
example of this is sentence pair 1225 in HP4. The original sentence reads:
“’Long walk, Arthur?’ Cedric’s father asked”. The Swedish sentence has been
grammaticised by completing the sentence. The Swedish equivalent of did you
have a has been added before long walk, as in “”Hade ni långt att gå, Arthur?”
frågade Cedrics far.” Another example of normalisation through grammaticis-
ing an ungrammatical utterance is in the top row of table 7.3, where the verb
has been added, making the Swedish sentence complete.
Secondly, normalisation can also be manifested in the translation of punc-
tuation markers. Translators tend to adapt the usage of punctuation markers
to fit better with the target language usage, and evidence of this has been
found in the HP-corpus. Particularly interesting is the treatment of semicolons
in the translations. Semicolons are not very common in original Swedish texts,
especially not in children’s literature, but as can be seen in table 7.8, many
semicolons are kept in the target versions of the HP-samples. In comparing
the numbers for the respective samples, it is evident also that there has been a
change over time in the treatment of semicolons.
In HP1, 28 semicolons have been retained, and very few other changes have
been made to this particular punctuation marker. In HP4, only 13 semicolons
have been retained. Moreover, 31 semicolons have been changed into full stops
and 12 into commas. For HP1, the corresponding figures are much lower, as no
semicolon has been changed into a full stop, and only one semicolon has become
a comma.
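The change in the treatment of semicolons can be expressed as a share of retained semicolons. The sketch below uses only the three outcomes quoted above (retained, changed into a full stop, changed into a comma); any other treatments recorded in table 7.8 are left out, so the shares are an approximation.

```python
# Counts quoted in the text for HP1 and HP4.
semicolons = {
    "HP1": {"retained": 28, "to_full_stop": 0, "to_comma": 1},
    "HP4": {"retained": 13, "to_full_stop": 31, "to_comma": 12},
}


def retained_share(counts):
    """Share of semicolons kept, among the three outcomes listed."""
    return counts["retained"] / sum(counts.values())


shares = {sample: retained_share(counts) for sample, counts in semicolons.items()}
for sample, share in shares.items():
    print(f"{sample}: {share:.1%} of the listed semicolons retained")
```

Among these outcomes, nearly all semicolons are kept in HP1, whereas fewer than a quarter are kept in HP4.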
There are also indications of normalisation concerning commas. As is evident
in table 7.10, many commas have been omitted in the target texts, which is also
an indication of normalisation because commas are used much more frequently
in English, compared to Swedish. The conclusion I draw from this is that
in the usage of syntactic markers, the texts are normalised through adapted
punctuation, and the tendency for the translator to normalise the texts in this
way has increased over time, at least regarding semicolons.
choices and situations are presented below. These examples are also meant to
illustrate that the choice of translator does have a real effect on the produced
translation, an effect that is manifested in the particular translational choices
of that translator.
Lexical Choice
During the manual word alignment, I noticed that Fries-Gedin used two alterna-
tive translations for wand, namely trollstav and trollspö. Fries-Gedin has opted
for using trollstav when the carrier of the wand is male, and trollspö when the
carrier is female (see examples from HP2 in table 7.11).
parts of the series. The cupboard under the stairs is a well-known concept to
any Harry Potter reader, as it is what functions as Harry’s room in the Dursley
house in the beginning of the series. Later on, it is where Harry’s Hogwarts
things are kept when he is home for the holidays. The cupboard under the
stairs is not translated consistently throughout the corpus. The construction
can be found in all four samples, and in HP1 the full construction is translated
as skrymslet under trappan. In other instances, where the source consists of
only cupboard, the Swedish translation is krypin. Both skrymsle and krypin are
in some cases modified with the adjective trånga, denoting narrow in English.
In HP2, HP3 and HP4, the full construction is translated as skrubben under
trappan, and shorter versions as skrubben. Why this change has been made is,
naturally, impossible to say without asking the translator, but it is an indication
that she is not averse to change, if it is called for. In this case, I argue that it
is a change for the better, as skrymsle denotes a very small and narrow space,
generally impossible to close off with a door, corresponding more closely to the
English nook than to cupboard. Skrubb, on the other hand, is a more likely
description, since it denotes a rather small, closed-off space, but still giving the
impression of being large enough to hold an eleven-year-old boy.
equivalent of both Apparition and the infinitives of the associated verbs, which
has some interesting implications. The first half, spök- is derived from the
Swedish word for ghost, spöke, which is very close in meaning to the original
meaning of apparition. Transferens is a neologism that Fries-Gedin has probably
built on the word transferering, denoting a transfer of some resource, usually
money. Spöktransferens works as an equivalent of the infinitives of the verbs to
Apparate/Disapparate, but it does not work in the active sense, when somebody
Apparates, or Disapparates. The chosen translation for the active senses is
använda sig av spöktransferens, the equivalent of to use ghost transferal, which
is a cumbersome construction. In table 7.12, sentence pair 1188 shows the
relationship between Apparition and spöktransferens. In 1189 the difficulties in
translating constructions containing Apparition are illustrated. Fries-Gedin has
in this case chosen to paraphrase and simplify by changing the Apparition point
to the Swedish equivalent of for the purpose.
Lexical Patterns
From the resources built in the alignment, it is possible to create alphabetical
lists of the words and their translations. By investigating these lists, lexical
patterns of how words are translated can be discovered. In instances where
words have many and diverse translations, this is generally a sign that they have
been difficult to translate. Because the HP-books portray a complex, magical
environment with quite detailed vocabulary, it would be reasonable to expect
that words specific to this world might have been especially challenging for the
translator. Therefore, a closer investigation was made into the translation of
vocabulary typical of this domain.
Whether a person is wizard or Muggle is paramount in the Harry Potter
world. Muggle is consistently translated with the neologism mugglare. In the
cases where it is a part of a longer noun construction, such as Muggle clothes,
this, equally consistently, becomes a compound noun in Swedish, mugglarkläder,
with the stem mugglar-.
Similarly, wizard is translated as trollkarl in the vast majority of the
cases. When it is a part of a longer noun construction, such as in the wizarding
bank, this becomes a compound noun in Swedish with the stem trollkarls-, in this
particular case trollkarlsbanken. One rare exception is wizard gold, translated as
trollmynt, which changes the meaning of the word, since troll in Swedish means
exactly what troll does in English. The second part of the word is also changed,
as gold is translated into the equivalent of coin.
Patterns of consistency, as well as patterns of inconsistency, become apparent
when investigating the resources in this way. For example, noun constructions
with magic are oftentimes translated into a compound with trollkarls- as the
stem, i.e. the same stem as used for translating compounds containing forms
of wizard. The translation equivalent of magic is magi, but the word has eleven
different translations (see table 7.13). Magical, however, has fewer translations,
but they are built both around the magi and the troll stems. Judging from
the number of translation equivalents for words about wizards and magic, the
magical element of the Harry Potter world seems to have caused a problem in
the translation process.
An interesting lexical pattern of inconsistency apparent in the HP-corpus
is the translation of the Dursleys. Perhaps surprisingly, it has ten different
translations ranging from similar constructions equivalent to the Dursley couple
and the Dursley spouses to equivalents of his uncle or aunt, them and the others
(see table 7.13). This illustrates a difference between English and Swedish; in
Swedish a plural form of a family name is not used as consistently to describe
the unit of that family as it is in English, which could account for the many
different translation alternatives.
Table 7.13: The patterns of translations for certain typical Harry Potter related
words.
Semantic Mirroring
The resources built during the word alignment can also be used to create more
powerful resources than the alphabetical lists, such as semantic mirrors. With
semantic mirrors, it is possible to extract information about the translations
that is not available in I*Link in itself.
Semantic mirroring of the resources built up in the HP-project was carried out by
Helge Dyvik at the University of Bergen, Norway, and this resulted in two addi-
tional means for studying the material. One is a thesaurus-like file that shows
all the words in the corpus that have many different translation alternatives,
and does not contain words with few translation alternatives. As mentioned
above, a large number of translation alternatives for a certain word indicates
that it has been difficult to translate, as there are more than a few possibilities for
equivalence. Thus the thesaurus shows words that have been translated incon-
sistently. The other is a search tool that makes it possible to search for specific
words in the thesaurus. For a full description of semantic mirrors, see Dyvik
(2003).
grinning
(Translation: log. )
Synonyms: grinned—1—
gripped
(Translation: tog. )
Synonyms: withdrew.
grow
(Translation: blev. )
Synonyms: became.
growled
(Translation: röt, brummade. )
Synonyms: roared—1—, snarled—1—
Related words: barked, bellowed.
grudgingly
(Translation: motvilligt. )
Synonyms: resentfully.
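Entries like those above rest on the idea of mirroring a word through its translations and back. The sketch below shows only that first step, in a much simplified form; it is not Dyvik's (2003) full method, which also separates senses and derives related words. The link list is hypothetical and merely modelled on the excerpt, since the excerpt does not show which translation each synonym shares with the headword.

```python
from collections import defaultdict

# Hypothetical word links (source word, target word), modelled on the excerpt.
links = [
    ("growled", "röt"),
    ("growled", "brummade"),
    ("roared", "röt"),
    ("snarled", "brummade"),
    ("grow", "blev"),
    ("became", "blev"),
]


def build_indexes(word_links):
    """Index the links in both directions."""
    src_to_tgt = defaultdict(set)
    tgt_to_src = defaultdict(set)
    for source, target in word_links:
        src_to_tgt[source].add(target)
        tgt_to_src[target].add(source)
    return src_to_tgt, tgt_to_src


def mirror(word, src_to_tgt, tgt_to_src):
    """Return the word's translations and the other source words sharing them."""
    translations = src_to_tgt.get(word, set())
    candidates = set()
    for target in translations:
        candidates |= tgt_to_src[target]
    candidates.discard(word)  # a word is not its own synonym
    return sorted(translations), sorted(candidates)


src_to_tgt, tgt_to_src = build_indexes(links)
print(mirror("growled", src_to_tgt, tgt_to_src))
print(mirror("grow", src_to_tgt, tgt_to_src))
```

Source words that share a translation come out as synonym candidates, which is why a fully aligned corpus is a useful starting point for a thesaurus of this kind.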
For the particular purpose of this study, the semantic thesaurus and search
tool were used to some extent in the investigation of lexical choice. However,
the main contribution of these two resources is perhaps to simplify searching
the material and browsing the thesaurus for those with a particular interest, be
it in semantic relationships of translations or in the Harry Potter books.
Table 7.14: The different strategies used in the alignment of the samples.
The basis for the evaluation of the strategies presented above is their effi-
ciency, measured in the time it took to align each sample. This is because manual
aligning is, as already mentioned, despite its advantages, very time-consuming.
Consequently, it is relevant to consider if any one strategy decreases the time
required by the aligning more than the others. As is evident in table 7.14,
HP3 required the least time to align, and HP1 the most. Running a sample
through I*Trix takes only a few minutes, so this time can be disregarded in the
comparison between the different strategies.
One of the two most obvious explanations for why HP1 required so much
time is that it was the largest sample in terms of number of tokens. The other
explanation is the fact that it was the first sample to be aligned, and a certain
amount of the time was spent dealing with insecurities in using I*Link and
trying to maintain consistency.
However, in investigating the efficiency of the strategies, it is of course pivotal
to take the exact sizes of the samples, i.e. the token-count, into consideration.
In table 7.15 below, such a comparison is made.
Table 7.15: The efficiency of the different strategies, in relation to sample size.
The words/sentence count is the mean number of words per sentence on the
source side.
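The two ratios behind a comparison of this kind can be sketched as follows. Only the 16.5 hours reported for HP2 comes from the text; the token, word and sentence counts are hypothetical placeholders and do not reproduce table 7.15.

```python
def tokens_per_hour(token_count, hours):
    """Alignment speed: tokens aligned per hour of work."""
    return token_count / hours


def words_per_sentence(word_count, sentence_count):
    """Mean number of source-side words per sentence."""
    return word_count / sentence_count


# 16.5 hours is the alignment time reported for HP2.
# The three counts below are hypothetical placeholders.
hp2_hours = 16.5
hp2_tokens = 23100
hp2_words = 20000
hp2_sentences = 1600

speed = tokens_per_hour(hp2_tokens, hp2_hours)
mean_length = words_per_sentence(hp2_words, hp2_sentences)
print(f"{speed:.0f} tokens per hour, {mean_length:.1f} words per sentence")
```

Relating time to token count in this way prevents a larger sample from making a strategy look slower than it is.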
The fact that there is such a great difference between the time required to
align HP3 in comparison with the other samples is of course the most relevant
finding of the strategy evaluation. The implication of the results is that it
appears to be most efficient to align a subsample or a part of the corpus, use the
dynamic resources from that session to automatically align the next subsample
in I*Trix, and then revise those results in I*Link.
Chapter 8
Discussion
First of all, I would like to point out that this thesis is in no way intended to
be a value-judgement of the translations or the translator. As Newmark (1988) points
out, translations must be discussed as they are always made subjectively,
and this thesis is a mere discussion of the translation of Harry Potter, also made
from a subjective viewpoint.
That being said, the results unanimously indicate that there are indeed
significant changes between the translations in relation to their respective orig-
inals, and that these changes increase sequentially. Additionally, the universals
of translation are manifested in the corpus. Moreover, the different strategies
used in the alignment process gave different results concerning their efficiency,
and in summary, the third strategy seems to be the most efficient, at least in
the case of this annotator. In other words, all the hypotheses stated in the
introduction have been verified by the results, which is highly encouraging.
Following the format of the last chapter, the discussion will be divided into
three main sections. The first section will discuss the translational results. The
second will deal with the methodological results, and will also include some
additional discussion about my experiences of I*Link and using this set of tools
for projects of this kind. The third and final section contains suggestions for
further research.
In the final hours of analysis carried out, another FDG-related problem was
discovered. In sentence pair 1125 in HP4, the Swedish råkade, a verb equivalent
to happened to, has not, as expected, been tagged V, but instead A, for adjective.
Due to this discovery, all word class tags for all tokens of the last 150 sentence
pairs of HP4 used in the close investigation of additions and deletions were
checked manually. It was found that for both the source and the target sides,
fewer than 2 percent of the tokens carried a faulty tag. If this is indicative of the
whole HP-corpus, the FDG precision rate is approximately 98 percent. Based
on this, I conclude that the few flaws concerning FDG-tags present in the HP-
corpus are not likely to have affected the reliability of the results to any great
extent.
Because the focus of this study is in part on the influence of the translator on
the translated text, I have tried to distinguish between small and significant
changes, which ties into the concepts of lexical shifts, additions and deletions.
In my definition of necessary lexical shifts, it is only natural that they have
no closer or more accurate translations. In other words, these can be treated as
regular translational equivalents, because to all intents and purposes, they are,
as long as the meaning of the source words could not have been more preserved
in any other target construction.
Concerning the nature of unnecessary, or voluntary, changes made to the
texts, the close investigation revealed that at least regarding verbs, many addi-
tions and deletions are used in combination, as pairs. They are in fact lexical
shifts or parts of paraphrases, not regular additions and deletions.
Because significant lexical shifts seem to be rather common,
I feel that it would be positive to be able to distinguish them from regular
additions and deletions, at least if the focus is on the degree of change. In
my opinion, this could be done by making it possible to mark lexical shifts in
alignment programs such as I*Link. In the analysis, all types of lexical shifts
could then be treated as indicators of change in the translation, and other lexical
shifts could be analysed as having a similar effect to additions and deletions.
Implementing lexical shifts in I*Link would enrich the analysis of the corpus
as an even more fine-grained analysis could be made into the nature of the
changes the translator has made to the target text. Especially, more specific
lexical shifts could be seen as clear indicators of explicitation. In addition, the
changes that are simply more free in relation to the source text than necessary
could be distinguished, and this could in turn be used to measure how free the
translation is in relation to the original text. One possible disadvantage could
be that it might make the aligning more time-consuming due to the time that
would be spent distinguishing the different kinds of lexical shifts.
is obvious in table 7.10). Each comma is counted as one token, just like each
word is counted as one token. This is of course true for all punctuation markers.
In some instances, there might be a relationship between the added and
deleted commas, like between added and deleted words. Some of the added
commas could be replacements for deleted ones. If commas were moved within
the same sentence or sentence pair, I tried to be consistent in linking them as
each other’s equivalent, notwithstanding the fact that the comma was moved.
If the number of added commas is subtracted from the number of deleted
commas, the difference equals the number of tokens that were commas in the
source text, but are not commas in the target text. This way of analysing the
use of commas using the data in table 7.10 indicates that even if there are only
521 more tokens in the TT of HP1, there might, in effect, be many more words,
since 404 commas have been removed and only 57 added. In other words, the
target texts might have been more explicitated by containing more words than
a simple comparison of token counts between the samples reveals.
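The reasoning about HP1 can be made explicit with the figures quoted above. This is a rough estimate only: it assumes that the net loss of commas is the only punctuation effect on the token counts, and that no removed comma was replaced by another punctuation token.

```python
# Figures for HP1 quoted in the text (data from table 7.10).
token_surplus_tt = 521   # extra tokens in the target text
commas_deleted = 404
commas_added = 57

# Commas in the source text with no comma counterpart in the target text.
net_commas_removed = commas_deleted - commas_added

# If those tokens disappeared, the surplus of other tokens is correspondingly larger.
estimated_word_surplus = token_surplus_tt + net_commas_removed

print(net_commas_removed, estimated_word_surplus)
```

Under these assumptions, the target text of HP1 contains several hundred more non-comma tokens than the raw difference of 521 tokens suggests.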
Above and beyond all, what a system such as I*Link provides to the field
of translation studies is the ability to fully align and investigate a large corpus
of texts in a structured way, using the tools integrated in the system. It makes
it possible to search the material and get structured output in mere seconds,
once the alignment is done. In my opinion, the challenge is to find a way to
analyse the material and the outputs so that scientifically interesting results can
be presented. A risk with all systems that provide a lot of results in the shape
of numbers and statistics is that it is tempting to over-use the possibilities for
making calculations. Consequently, the researcher must be very focused on the
scope of the study and avoid presenting all figures that can be calculated on the
aligned material.
In my personal experience, the built-in heuristics can also cause problems
in some cases, because if the user chooses a different strategy from the one
pre-programmed in I*Link, the system does not respond to this as quickly as might be
desirable, but continues to suggest links that are preferable according to the
heuristics. This can be very frustrating to the user and the risk is that the
system by continuously working against the user dominates the choice process
and convinces the user to adapt to I*Link, which means that the links will be
less consistent than necessary.
The specific situation in which this caused a problem for me was described
in section 6.2.3, and relates to the linking of proper names. Had I instead of
my own strategy chosen to make one link of a character’s name and surname,
as I*Link is built to do, the automatic heuristic would have been very helpful,
and could have aided me in keeping my links consistent.
much domain specific language use and many neologisms. Consequently, more
research into translation of fantasy and fiction is needed.
Bibliography
Bergius, H. (2003), ‘Hon tar sig friheter med Harry Potter’, Dagens Nyheter,
July 27 2003.
Borin, L. (2002), ...and never the twain shall meet?, in L. Borin, ed.,
‘Parallel Corpora, Parallel Worlds’, Rodopi B.V., Amsterdam.
Hatim, B. & I. Mason (1990), Discourse and the translator, Longman, London.
Holmes, J.S. (2000), The name and nature of translation studies, in L. Venuti &
M. Baker, eds, ‘The Translation Studies Reader’, Routledge, London.
Title (Swedish): Att spåra översättningsuniversalier och översättarutveckling genom att ordlänka en Harry Potter-korpus
Title (English): Tracing Translation Universals and Translator Development by Word Aligning a Harry Potter Corpus
Author: Sofia Helgegren
Abstract
For the purpose of this descriptive translation study, a translation corpus was built from roughly the first
20,000 words of each of the first four Harry Potter books by J.K. Rowling, and their respective translations
into Swedish. I*Link, a new type of word alignment tool, was used to align the samples on a word level and
to investigate and analyse the aligned corpus. The purpose of the study was threefold: to investigate
manifestations of translation universals, to search for evidence of translator development and to study the
efficiency of different strategies for using the alignment tools.
The results show that all three translation universals were manifested in the corpus, both on a general pattern
level and on a more specific lexical level. Additionally, a clear pattern of translator development was
discovered, showing that there are differences between the four different samples. The tendency is that the
translations become further removed from the original texts, and this difference occurs homogeneously and
sequentially. In the word alignment, four different ways of using the tools were tested, and one strategy was
found to be more efficient than the others. This strategy uses dynamic resources from previous alignment
sessions as input to I*Trix, an automatic alignment tool, and the output file is manually post-edited in
I*Link.
In conclusion, the study shows how new tools and methods can be used in descriptive translation studies to
extract information that is not readily obtainable with traditional tools and methods.
Keywords
word alignment, translation universals, translator development, corpus, additions, deletions