Abstract
This paper investigates the summarisation of the free-text element of questionnaire data using hierarchical text classification. The approach assumes that text summarisation can be achieved through classification: several class labels are associated with each document, and together these labels constitute the summary. A hierarchical classification approach is proposed, which has the advantage that different levels of classification can be used and the summary can be customised according to the branch of the tree in which the current document is located. The approach is evaluated using free text from questionnaires used in the SAVSNET (Small Animal Veterinary Surveillance Network) project. The results demonstrate the viability of using hierarchical classification to generate free-text summaries.
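The idea that the labels collected along one path of a class hierarchy form the summary can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Node` type, the keyword-matching `score` function (a toy stand-in for whatever trained classifier sits at each node), and the two-level hierarchy with its labels are hypothetical and are not taken from the paper or from the SAVSNET data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical class hierarchy."""
    label: str
    keywords: frozenset = frozenset()  # toy stand-in for a trained classifier
    children: list = field(default_factory=list)


def score(node, tokens):
    """Count the tokens that match the node's keywords (illustrative only)."""
    return sum(1 for t in tokens if t in node.keywords)


def summarise(text, root):
    """Descend the hierarchy, collecting at most one label per level."""
    tokens = text.lower().split()
    labels = []
    node = root
    while node.children:
        best = max(node.children, key=lambda c: score(c, tokens))
        if score(best, tokens) == 0:
            break  # no child matches, so stop at the current level
        labels.append(best.label)
        node = best
    return labels


# Hypothetical two-level hierarchy; the labels are invented for illustration.
hierarchy = Node("root", children=[
    Node("gastrointestinal", frozenset({"vomiting", "diarrhoea"}), [
        Node("vomiting", frozenset({"vomiting"})),
        Node("diarrhoea", frozenset({"diarrhoea"})),
    ]),
    Node("respiratory", frozenset({"cough", "sneezing"}), [
        Node("cough", frozenset({"cough"})),
        Node("sneezing", frozenset({"sneezing"})),
    ]),
])

print(summarise("dog presented with vomiting since yesterday", hierarchy))
```

In this sketch the list returned by `summarise` plays the role of the summary: a coarse label from the first level followed by a finer label from the branch that was selected, so the second-level choices depend on where the document landed at the first level.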
Copyright information
© 2012 Springer-Verlag London
About this paper
Cite this paper
Garcia-Constantino, M., Coenen, F., Noble, PJ., Radford, A. (2012). Questionnaire Free Text Summarisation Using Hierarchical Classification. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_3
DOI: https://doi.org/10.1007/978-1-4471-4739-8_3
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4738-1
Online ISBN: 978-1-4471-4739-8
eBook Packages: Computer Science (R0)