DOI: 10.5555/2856151.2856163

User-relevant access to textual information through flexible identification of terms: a semi-automatic method and software based on a combination of n-grams and surface linguistic filters

Published: 12 April 2000

    Abstract

    We present a semi-automatic method and software tool for multi-word term identification. Our approach is hybrid in that it combines numeric computations (n-grams) with linguistic filters. The software tool differs from most other term identification tools in that it is semi-automatic by design, i.e. it is interactive and constantly under the user's control. The software supports the work of the knowledge engineer, the (corpus) domain expert, or the linguist by helping them do their job more efficiently. We justify this semi-automatic approach by the need for a more flexible and customisable tool to perform certain term identification tasks. More specifically, in some applications we want to allow the user's perspective, knowledge and subjectivity to influence the results, within certain limits, of course. An example of such an application, on which we are currently working, is Web personalisation: to allow individuals to develop their own vision of the information universes of interest to them, we need flexible and customisable tools that can support them in such a challenging task, not tools that impose on them a pseudo-standardised vision of the world.
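
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea it names: count n-grams, discard candidates with a surface filter, then let the user decide. Everything in it is an assumption made for illustration, not the authors' implementation: the n-grams are taken over words (the abstract does not say whether words or characters are used), the filter is a simple stop-word boundary test, and the names `STOPWORDS`, `candidate_terms` and `review` are hypothetical.

```python
from collections import Counter
import re

# Hypothetical stop list standing in for a "surface linguistic filter".
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "or",
             "is", "are", "for", "with", "by"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(text, n=2, min_count=2, stopwords=STOPWORDS):
    """Count word n-grams, then keep those that pass the surface filter.

    Assumed filter: a candidate must occur at least `min_count` times and
    must not start or end with a stop word.
    """
    tokens = tokenize(text)
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    kept = {}
    for gram, count in grams.items():
        if count < min_count:
            continue
        if gram[0] in stopwords or gram[-1] in stopwords:
            continue
        kept[" ".join(gram)] = count
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(kept.items(), key=lambda kv: (-kv[1], kv[0]))


def review(candidates, accept):
    """Semi-automatic step: keep only the candidates the user accepts.

    `accept` is any callable returning True or False for a candidate term,
    for example a function that prompts the user.
    """
    return [term for term, _ in candidates if accept(term)]
```

In an interactive session, `accept` could wrap a prompt such as `lambda t: input(f"Keep '{t}'? [y/n] ") == "y"`, which keeps the final decision with the user, as the abstract's emphasis on user control suggests.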


      Published In

      RIAO '00: Content-Based Multimedia Information Access - Volume 2
      April 2000
      859 pages

      Publisher

      LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE

      Paris, France

      Qualifiers

      • Research-article
