Abstract
In this article we present a hybrid approach to the automatic summarization of Spanish medical texts. Many automatic summarization systems rely on either statistical or linguistic techniques, but only a few combine both. Our premise is that a good summary requires exploiting the linguistic aspects of texts while also benefiting from the advantages of statistical techniques. We have integrated the Cortex (Vector Space Model) and Enertex (statistical physics) systems, coupled with the Yate term extractor, and the Disicosum system (linguistics). We first compared these systems and then integrated them into a hybrid approach. Finally, we applied this hybrid system to a corpus of medical articles and evaluated its performance, obtaining good results.
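The abstract names the components but not how their scores are combined, so the following is only a minimal sketch of the general idea of a hybrid extractive scorer, under our own assumptions. It is not the authors' implementation, and it does not reproduce Cortex, Yate, or Disicosum. We assume that each sentence is already reduced to a list of terms (in the paper this role is played by the Yate term extractor) and that we can build a binary sentence-by-term matrix S. The statistical score is a textual-energy style score, taken as the row sums of E = (S S^T)^2. This is our reading of the Enertex idea, with the sign and the 1/2 factor dropped. A per-sentence linguistic weight is supplied by the caller as a stand-in for a linguistic analysis, and the two scores are linearly interpolated with a hypothetical weight `alpha`. All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def sentence_term_matrix(sentences: Sequence[Sequence[str]]) -> Matrix:
    """Binary sentence-by-term matrix S: S[i][t] = 1 if term t occurs in sentence i."""
    vocab = sorted({term for sent in sentences for term in sent})
    rows = []
    for sent in sentences:
        present = set(sent)
        rows.append([1 if term in present else 0 for term in vocab])
    return rows


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def energy_scores(s: Matrix) -> List[float]:
    """Textual-energy style score: row sums of E = (S S^T)^2 (sign and 1/2 factor dropped)."""
    g = matmul(s, transpose(s))  # g[i][j] = number of terms shared by sentences i and j
    e = matmul(g, g)             # e = (S S^T)^2
    return [sum(row) for row in e]


def normalize(xs: Sequence[float]) -> List[float]:
    """Scale scores to [0, 1] by dividing by the maximum (all zeros if the maximum is 0)."""
    top = max(xs)
    return [x / top if top > 0 else 0.0 for x in xs]


def hybrid_summary(sentences: Sequence[Sequence[str]], linguistic: Sequence[float],
                   k: int, alpha: float = 0.5) -> List[int]:
    """Return the indices of the k best-scoring sentences, in document order.

    linguistic: hypothetical per-sentence weights from some linguistic analysis.
    alpha: hypothetical interpolation weight (1 = statistical only, 0 = linguistic only).
    """
    stat = normalize(energy_scores(sentence_term_matrix(sentences)))
    ling = normalize(linguistic)
    scores = [alpha * s + (1 - alpha) * l for s, l in zip(stat, ling)]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy "sentences" already reduced to terms (illustrative data only).
    sents = [
        ["insulina", "diabetes", "tratamiento"],
        ["diabetes", "insulina", "dosis"],
        ["tiempo", "soleado"],
    ]
    print(hybrid_summary(sents, [1.0, 0.2, 0.0], k=2))  # prints [0, 1]
```

In the toy run, the first two sentences share two terms, so both get a high energy score. The off-topic third sentence scores low. The hypothetical linguistic weights then settle the ranking. Setting `alpha` to 1 or 0 recovers a purely statistical or purely linguistic ranking. The linear interpolation is an illustrative choice, not the combination scheme described in the paper.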
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
da Cunha, I., Fernández, S., Velázquez Morales, P., Vivaldi, J., SanJuan, E., Torres-Moreno, J.M. (2007). A New Hybrid Summarizer Based on Vector Space Model, Statistical Physics and Linguistics. In: Gelbukh, A., Kuri Morales, Á.F. (eds) MICAI 2007: Advances in Artificial Intelligence. MICAI 2007. Lecture Notes in Computer Science, vol 4827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76631-5_83
DOI: https://doi.org/10.1007/978-3-540-76631-5_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76630-8
Online ISBN: 978-3-540-76631-5
eBook Packages: Computer Science, Computer Science (R0)