DOI: 10.3115/1118647.1118648

Unsupervised learning of morphology for building lexicon for a highly inflectional language

Published: 11 July 2002

    Abstract

    Words play a crucial role in natural language understanding tasks such as syntactic and semantic processing. A natural language understanding system usually either already knows the words that appear in a text or can automatically learn relevant information about a word upon encountering it. A capable system, human or machine, typically knows a subset of a language's vocabulary along with morphological rules for determining the attributes of words it has not seen before. Developing a knowledge base of legal words and morphological rules is therefore an important task in computational linguistics. In this paper, we describe initial experiments with an approach based on unsupervised learning of morphology from a text corpus developed specifically for this purpose. The method offers a convenient way to create a dictionary and a morphological rule base, and it is especially suitable for highly inflectional languages such as Assamese. Assamese is a major Indian language of the Indic branch of the Indo-European language family and is used by around 15 million people.
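
To make the general idea of learning morphology from raw text concrete, here is a minimal sketch of one simple unsupervised heuristic: propose a suffix whenever removing it from a corpus word leaves another attested word. This is not the method described in the paper, whose details the abstract does not give. The function name `candidate_suffixes`, its thresholds, and the English toy word list are all illustrative assumptions.

```python
from collections import defaultdict


def candidate_suffixes(words, min_stem_len=3, min_stems=2):
    """Map each candidate suffix to the set of stems it was seen on.

    Hypothetical helper for illustration only: a split word = stem + suffix
    is accepted when the bare stem is itself a word in the corpus, and a
    suffix is kept when it attaches to at least `min_stems` distinct stems.
    """
    vocab = set(words)
    stems_by_suffix = defaultdict(set)
    for word in vocab:
        # Try every split point that leaves a stem of at least
        # min_stem_len characters and a non-empty suffix.
        for i in range(min_stem_len, len(word)):
            stem, suffix = word[:i], word[i:]
            if stem in vocab:
                stems_by_suffix[suffix].add(stem)
    return {
        suffix: stems
        for suffix, stems in stems_by_suffix.items()
        if len(stems) >= min_stems
    }


# Toy English word list (an assumption; not data from the paper).
toy_corpus = ["walk", "walks", "walked", "talk", "talks", "talked", "jump", "jumps"]
for suffix, stems in sorted(candidate_suffixes(toy_corpus).items()):
    print(suffix, sorted(stems))
```

The attested stems form a rough dictionary and the surviving suffixes a rough rule base, which is the pair of resources the abstract mentions. A heuristic this simple misses stems that never occur bare and ignores stem changes, so it should be read only as a starting point.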

    Published In

    MPL '02: Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
    July 2002
    82 pages
    Program Chair: Mike Maxwell

    Publisher

    Association for Computational Linguistics

    United States

    Cited By

    • (2014) AMRITA_CEN@FIRE-2014. Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation, pp. 112-120. DOI: 10.1145/2824864.2824883. Online publication date: 5-Dec-2014
    • (2014) Stemming resource-poor Indian languages. ACM Transactions on Asian Language Information Processing 13:3, pp. 1-26. DOI: 10.1145/2629670. Online publication date: 3-Oct-2014
    • (2013) An improved stemming approach using HMM for a highly inflectional language. Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I, pp. 164-173. DOI: 10.1007/978-3-642-37247-6_14. Online publication date: 24-Mar-2013
    • (2008) ParaMor and Morpho challenge 2008. Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access, pp. 967-974. DOI: 10.5555/1813809.1813958. Online publication date: 17-Sep-2008
    • (2008) Acquisition of Morphology of an Indic Language from Text Corpus. ACM Transactions on Asian Language Information Processing 7:3, pp. 1-33. DOI: 10.1145/1386869.1386871. Online publication date: 1-Jun-2008
    • (2007) Development of prototype morphological analyzer for the South Indian language of Kannada. Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers, pp. 109-116. DOI: 10.5555/1780653.1780677. Online publication date: 10-Dec-2007
    • (2006) Poor man’s stemming. Proceedings of the Third Asia conference on Information Retrieval Technology, pp. 323-337. DOI: 10.5555/2111235.2111269. Online publication date: 16-Oct-2006
    • (2006) A naive theory of affixation and an algorithm for extraction. Proceedings of the Eighth Meeting of the ACL Special Interest Group on Computational Phonology and Morphology, pp. 79-88. DOI: 10.5555/1622165.1622175. Online publication date: 8-Jun-2006
