CSLI Studies in Computational Linguistics ONLINE, CSLI Publications, 2005
A paper pre-print of this electronic publication was presented to Professor Kimmo Koskenniemi at a special Colloquium on Friday, September 2nd, 2005, at the Auditorium of the Arppeanum building of the University of Helsinki, arranged in association with the Workshop on Finite State Methods for Natural Language Processing (FSMNLP) <http://www.ling.helsinki.fi/events/FSMNLP2005/>. This final electronic version of the publication contains a few minor revisions compared to the pre-printed paper version. ...
Abstract. This project report describes a multilingual wordnet initiative undertaken in the META-NORD project and concerned with the validation and pilot linking between Nordic and Baltic wordnets. The builders of these wordnets have applied very different compilation strategies: the Danish, Icelandic and Swedish wordnets are being developed via monolingual dictionaries and corpora and subsequently linked to Princeton WordNet.
Abstract. Corpus-based treebank annotation is known to result in incomplete coverage of mid- and low-frequency linguistic constructions: the linguistic representation and corpus annotation quality are sometimes suboptimal. Large descriptive grammars also cover many mid- and low-frequency constructions. We argue for the use of large descriptive grammars and their sample sentences as a basis for specifying higher-coverage grammatical representations.
In the theory of linguistic morphology, morphemes are considered to be the smallest meaning-bearing elements of language, and they can be defined in a language-independent manner. It seems that even approximate automated morphological analysis is beneficial for many natural language applications dealing with large vocabularies, such as speech recognition and machine translation. Many existing applications make use of words as vocabulary units.
Abstract. HFST–Helsinki Finite-State Technology (hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently collects some of the most important finite-state tools for creating morphologies and spellcheckers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications.
Abstract. This paper presents a simple method for finding new synonym candidates for a bilingual wordnet by using another bilingual resource. Our goal is to add new synonyms to the existing synsets of the Finnish WordNet, which has direct word sense translation correspondences to the Princeton WordNet. For this task, we use Wikipedia and its links between articles on the same topic in Finnish and English. One of the automatically extracted groups of synonyms yielded ca. 2,000 synonyms with 89% accuracy.
Abstract. This paper describes how translations are represented in the Finnish wordnet, FinnWordNet (FiWN), and how the FiWN database was constructed. FiWN was created by translating all the word senses of the Princeton WordNet (PWN) into Finnish and by joining the translations with the semantic and lexical relations of PWN extracted into a relational (database) format. The approach naturally resulted in a translation relation between PWN and FiWN.
The Department of General Linguistics has an ongoing project that aims to develop the collection of student feedback at our department. The attached electronic form is part of the project's pilot phase. The form collects feedback both on the courses you have taken and on the feedback form itself. The feedback goes to the teacher responsible for the course and to the course lecturer, who use it to develop the course.
This paper presents a simple method for finding new synonym candidates for a bilingual wordnet by using another bilingual resource. Our goal is to add new synonyms to the existing synsets of the Finnish WordNet, which has direct word sense translation correspondences to the Princeton WordNet. For this task, we use Wikipedia and its links between articles on the same topic in Finnish and English.
This project report describes a multilingual wordnet initiative undertaken in the META-NORD project and concerned with the validation and pilot linking between Nordic and Baltic wordnets. The builders of these wordnets have applied very different compilation strategies: the Danish, Icelandic and Swedish wordnets are being developed via monolingual dictionaries and corpora and subsequently linked to Princeton WordNet.
Abstract. We outline the design and creation of syntactically and morphologically annotated corpora of Finnish for use by the research community. We motivate a definitional, systematic “grammar definition corpus” as a basic step in a three-year annotation effort to help create systematically documented extensive parsebanks. The syntactic representation, consisting of a dependency structure and a basic set of dependency functions, is outlined with examples.
This paper introduces the META-NORD project, which develops the Nordic and Baltic part of the European open language resource infrastructure. META-NORD works on assembling, linking across languages, and making widely available the basic language resources used by developers, professionals and researchers to build specific products and applications. The goals of the project, its overall approach and its specific focus lines on wordnets, terminology resources and treebanks are described.
FinnWordNet is a WordNet for Finnish that conforms to the framework given in Fellbaum (1998) and Vossen (ed.) (1998). FinnWordNet is open source and currently contains 117,000 synsets. A classic WordNet consists of synsets, or sets of partial synonyms whose shared meaning is described and exemplified by a gloss, a common part of speech and a hyperonym. Synsets in a WordNet are arranged in hierarchical partial orderings according to semantic relations like hyponymy/hyperonymy.
The work is based on the assumption that words with similar syntactic usage have similar meaning, which was proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses). If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora.
Abstract. Lexical transducers form part of most language-aware applications, which means that less time spent on lexical lookup will have wide-ranging effects. The efficiency of a morphological analyzer stems mainly from the properties of the underlying transducer, but the way its transition sets are represented also plays a large role, since this determines how efficiently transitions can be accessed. We consider three principal ways to represent transition sets, leading to three different transducer formats.
Abstract. Morphological analysis of a wide range of languages can be implemented efficiently using finite-state transducer technologies. Over the last 30 years, a number of attempts have been made to create tools for computational morphologies. The two main competing approaches have been parallel vs. cascaded rule application. Parallel rule application was originally introduced by Koskenniemi [7] and implemented in tools like TwolC and LexC.
Abstract. Language software applications encounter new words, e.g., acronyms, technical terminology, loan words, names or compounds of such words. To add new words to a lexicon, we need to indicate their base form and inflectional paradigm. In this article, we evaluate a combination of corpus-based and lexicon-based methods for assigning the base form and inflectional paradigm to new words in Finnish, Swedish and English finite-state transducer lexicons.
Large digital datasets of cuneiform sources lend themselves to computational analysis that can complement and improve upon traditional philological work. The present article applies social network analysis to an electronic corpus of 1,532 texts to study the god Aššur and his position in divine networks in the Neo-Assyrian period. Our results show that the performance of social network analysis can be improved by using a small window size and calculating tie strengths with pointwise mutual information. This allows us to study the co-occurrences of gods in semantic contexts. From a network perspective, Aššur is not a very central god in our corpus despite his importance in Assyrian royal theology, but he rather joins the existing networks of gods without altering them.
Papers by Krister Lindén
Available at https://doi.org/10.1086/703859 and https://researchportal.helsinki.fi/en/publications/a%C5%A1%C5%A1ur-and-his-friends-a-statistical-analysis-of-neo-assyrian-text.