This paper shows the necessity of distinguishing different referential uses of NPs in Machine Translation. We propose a three-way distinction between the generic, referential and ascriptive uses of noun phrases (NPs), and argue that this is the minimum necessary to generate articles and number correctly when translating from Japanese to English. A detailed algorithm is proposed for determining the referentiality of Japanese NPs, based on a defeasible hierarchy of pragmatic rules that are applied top-down, from the clause to the NP. We also sketch the process of generating determiners and number using rules based on the different NP referentialities for a Japanese-English MT system. Using the proposed heuristics has raised the percentage of NPs generated with correct use of articles and number in the Japanese-English MT system ALT-J/E from 65% to 85%.
This is the proceedings of the second ACL workshop on multiword expressions (MWEs). MWEs are increasingly being singled out as a problem for NLP, particularly for the many applications that require some degree of semantic interpretation and for tasks such as parsing and word sense disambiguation. In the call for papers we solicited papers that laid particular emphasis on integrating the analysis, acquisition and treatment of various kinds of multiword expressions in NLP. Examples include research that combines a linguistic analysis with a method of automatically acquiring the classes described, work that combines the computational treatment of a class of MWEs with a solid linguistic analysis, and research that extracts MWEs and either classifies them or uses them in some task. We received 23 submissions (3 from Asia, 11 from Europe and 9 from the Americas), and accepted 11 of them for presentation, with two reserves. Each submission was reviewed by three members of the program committee, who not only judged each submission but also gave detailed comments to the authors. The overall quality of submissions was high, making the final selection very difficult. The papers in these proceedings are those which were finally selected for presentation. Many of the papers deal with MWEs in general, rather than aiming at specific subtypes, with examples from a wide range of languages (Basque, English, Japanese, Portuguese, Russian and Turkish). A variety of formalisms was also considered (dependency grammar, finite state machines, lexical conceptual structure, HPSG, ...), and there were more descriptive papers as well. The main applications targeted were machine translation and information retrieval.
This paper presents work in progress on the development of derivational links for the Japanese WordNet, with a focus on the retrieval, validation and elaboration of nouns and verbs linked by the agentive noun derivation. We generated 2,340 such links, of which we have validated 833 pairs. We briefly discuss some challenges in determining valid link pairs as well as their morphosemantic natures. We also consider the possibilities and challenges of automating the discovery of morphosemantic links, by linking our results with current theoretical issues in agentive nominals. In addition, we are currently corroborating these Japanese agentive derivations with English counterparts from the Princeton WordNet and intend to perform a more rigorous cross-lingual comparison.
It is common to discover an epigraph in the opening pages of a novel that highlights one or more of the major themes and denotes the influence of another author on the composition of the text. Yet, the inclusion of an epigraph also bestows prestige on the citing text – helping the author to select his or her place in the wider literary tradition – and situates the text in a particular genre or historical period. In order to trace the development of what Gérard Genette dubbed the ‘epigraph effect’, we collected 16,963 epigraphs and recorded their provenance (author, work, date, and country of origin). This collection enables us to trace intertextual connections between authors throughout literary history and national traditions.
Papers by Francis Bond