
Improving ontology-based text classification

Published: 01 September 2016

Abstract

Information retrieval has been widely studied because of the growing amount of textual information available electronically. Organizations and industries now face the challenge of organizing, analyzing and extracting knowledge from masses of unstructured information to support decision making. Automatic methods that produce usable structured information from unstructured text sources are therefore extremely valuable to them. Unlike traditional text classification methods, which need a well-labeled training corpus to classify efficiently, an ontology-based classifier benefits from domain knowledge and provides higher accuracy. In previous work [28] we proposed and evaluated an ontology-based heuristic algorithm for the occupational health control process, in particular for the automatic detection of accidents from unstructured texts. The extended proposal presented here is more domain-dependent: it uses technical terms and contrasts their relevance within the text, which makes the heuristic more accurate. It divides the problem of resolving accidents into three subtasks: (i) text analysis, (ii) recognition and (iii) classification of failed occupational health control.
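The three subtasks can be illustrated with a minimal Python sketch. It is not the heuristic from [28]: the toy ontology `ONTOLOGY`, its terms and weights, the function names `analyze`, `recognize` and `classify`, and the `threshold` value are all hypothetical. Relevance is approximated here, as an assumption, by the weighted relative frequency of each class's technical terms in the text.

```python
import re
from collections import Counter

# Hypothetical toy ontology: each class maps technical terms to a domain
# weight. Terms, weights and class names are illustrative assumptions only.
ONTOLOGY = {
    "accident": {"injury": 2.0, "fracture": 2.0, "fall": 1.5, "burn": 1.5},
    "near_miss": {"almost": 1.0, "near miss": 2.0, "no injury": 2.0},
}


def analyze(text):
    """(i) Text analysis: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def recognize(tokens, ontology):
    """(ii) Recognition: count occurrences of every ontology term,
    matching multi-word terms against consecutive tokens."""
    all_terms = {term for terms in ontology.values() for term in terms}
    counts = Counter()
    for term in all_terms:
        parts = term.split()
        n = len(parts)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == parts:
                counts[term] += 1
    return counts


def classify(text, ontology=ONTOLOGY, threshold=0.05):
    """(iii) Classification: score each class by the weighted frequency of
    its terms, normalized by text length; return the best class, or None
    when no class reaches the (hypothetical) threshold."""
    tokens = analyze(text)
    if not tokens:
        return None
    counts = recognize(tokens, ontology)
    scores = {
        label: sum(weight * counts[term] for term, weight in terms.items()) / len(tokens)
        for label, terms in ontology.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

For instance, `classify("A near miss was reported, with no injury.")` returns `"near_miss"` even though "injury" also matches an accident term, because the weighted near-miss terms are more relevant in that text. Normalizing by text length keeps scores comparable between short and long reports.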

References

[1]
Apache OpenOffice.org, DicSin: Dicionário de sinônimos Português/Brasil [Portuguese/Brazil thesaurus]. http://extensions.openoffice.org/en/project/DicSin-Brasil
[2]
S. Bloehdorn, A. Hotho, Text classification by boosting weak learners based on terms and concepts, in: Fourth IEEE International Conference on Data Mining, IEEE, 2004, pp. 331-334.
[3]
R.C. Bodner, F. Song, Knowledge-Based Approaches to Query Expansion in Information Retrieval, Springer, 1996.
[4]
M.L. Borrajo, B. Baruque, E. Corchado, J. Bajo, J.M. Corchado, Hybrid neural intelligent system to predict business failure in small-to-medium-size enterprises, Int. J. Neural Syst., 21 (2011) 277-296.
[5]
F. Camous, S. Blott, A.F. Smeaton, Ontology-based MEDLINE document classification, in: Bioinformatics Research and Development, Springer, 2007, pp. 439-452.
[6]
K. Dave, S. Lawrence, D.M. Pennock, Mining the peanut gallery: opinion extraction and semantic classification of product reviews, in: Proceedings of the 12th International Conference on World Wide Web, ACM, 2003, pp. 519-528.
[7]
F. De la Prieta, A.B. Gil, S. Rodríguez, J.B. Pérez, J.A.G. Coria, J.M. Corchado, An enhanced approach to retrieve learning resources over the cloud, in: The 2nd International Workshop on Learning Technology for Education in Cloud, Springer, 2014, pp. 193-203.
[8]
J.F. De Paz, J. Bajo, V.F. López, J.M. Corchado, Biomedic organizations: an intelligent dynamic architecture for KDD, Inf. Sci., 224 (2013) 49-61.
[9]
S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, R. Harshman, Indexing by latent semantic analysis, J. Am. Soc. Inf. Sci., 41 (1990) 391-407.
[10]
J. Fang, L. Guo, X. Wang, N. Yang, Ontology-based automatic classification and ranking for Web documents, in: Fourth International Conference on Fuzzy Systems and Knowledge Discovery, vol. 3, IEEE, 2007, pp. 627-631.
[11]
C. Fellbaum, WordNet, Wiley Online Library, 1999.
[12]
E. Gabrilovich, S. Markovitch, Overcoming the brittleness bottleneck using Wikipedia: enhancing text categorization with encyclopedic knowledge, in: 21st National Conference on Artificial Intelligence, vol. 6, 2006, pp. 1301-1306.
[13]
S.J. Green, Building hypertext links by computing semantic similarity, IEEE Trans. Knowl. Data Eng., 11 (1999) 713-730.
[14]
T.R. Gruber, A translation approach to portable ontology specifications, Knowl. Acquis., 5 (1993) 199-220.
[15]
B. Hammond, A. Sheth, K. Kochut, A modular document enhancement platform for semantic applications over heterogeneous content, in: Real World Semantic Web Applications, vol. 92, 2002, p. 29.
[16]
E. Hatcher, O. Gospodnetic, M. McCandless, Lucene in Action, Manning Publications, Greenwich, 2004.
[17]
T. Hirsimäki, J. Pylkkönen, M. Kurimo, Importance of high-order n-gram models in morph-based speech recognition, IEEE Trans. Audio Speech Lang. Process., 17 (2009) 724-732.
[18]
A. Hotho, S. Staab, G. Stumme, Ontologies improve text document clustering, in: Third IEEE International Conference on Data Mining, IEEE, 2003, pp. 541-544.
[19]
Y. Huang, Support vector machines for text categorization based on latent semantic indexing, Electrical and Computer Engineering Department, the Johns Hopkins University, 2003.
[20]
T. Joachims, Text categorization with support vector machines: learning with many relevant features, in: Lecture Notes in Computer Science, vol. 1398, Springer, Berlin/Heidelberg, 1998, pp. 137-142.
[21]
T.K. Landauer, P.W. Foltz, D. Laham, An introduction to latent semantic analysis, Discourse Process., 25 (1998) 259-284.
[22]
D.D. Lewis, Naive (Bayes) at forty: the independence assumption in information retrieval, in: Lecture Notes in Computer Science, vol. 1398, Springer, 1998, pp. 4-15.
[23]
H. Masataki, Y. Sagisaka, Variable-order n-gram generation by word-class splitting and consecutive word grouping, in: 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, 1996, pp. 188-191.
[24]
M. Nagarajan, A. Sheth, M. Aguilera, K. Keeton, A. Merchant, M. Uysal, Altering document term vectors for classification: ontologies as expectations of co-occurrence, in: Proceedings of the 16th International Conference on World Wide Web, ACM, 2007, pp. 1225-1226.
[25]
P. Rosso, E. Ferretti, D. Jiménez, V. Vidal, Text categorization and information retrieval using WordNet senses, in: Proceedings of the Second International Conference of the Global WordNet Association, 2004, pp. 299-304.
[26]
G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley, 1989.
[27]
N. Sanchez-Pi, L. Martí, A.C. Bicharra Garcia, Information extraction techniques for health, safety and environment applications in oil industry, in: International Conference Intelligent Systems and Agents, 2013, pp. 115-117.
[28]
N. Sanchez-Pi, L. Martí, A.C. Bicharra Garcia, Text classification techniques in oil industry applications, in: Advances in Intelligent Systems and Computing, vol. 239, Springer International Publishing, 2014, pp. 211-220.
[29]
F. Sebastiani, Machine learning in automated text categorization, ACM Comput. Surv., 34 (2002) 1-47.
[30]
A. Sheth, C. Bertram, D. Avant, B. Hammond, K. Kochut, Y. Warke, Managing semantic content for the web, IEEE Internet Comput., 6 (2002) 80-87.
[31]
M. Sokolova, G. Lapalme, A systematic analysis of performance measures for classification tasks, Inf. Process. Manag., 45 (2009) 427-437. http://www.sciencedirect.com/science/article/pii/S0306457309000259
[32]
V. Vapnik, The Nature of Statistical Learning Theory, Springer, 2000.
[33]
M. Woźniak, M. Graña, E. Corchado, A survey of multiple classifier systems as hybrid systems, Inf. Fusion, 16 (2014) 3-17.
[34]
S.-H. Wu, T.-H. Tsai, W.-L. Hsu, Text categorization using automatically acquired domain ontology, in: Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages, vol. 11, Association for Computational Linguistics, 2003, pp. 138-145.



Information

Published In

Journal of Applied Logic, Volume 17, Issue C
September 2016
58 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 01 September 2016

Author Tags

  1. Oil and gas industry
  2. Ontology
  3. Text classification

Qualifiers

  • Research-article


Citations

Cited By

  • (2023) An integrated approach using rough set theory, ANFIS, and Z-number in occupational risk prediction. Engineering Applications of Artificial Intelligence, 117:PA. doi:10.1016/j.engappai.2022.105515. Online publication date: 1-Jan-2023
  • (2023) Empirical Exploration of Open-Source Issues for Predicting Privacy Compliance. Advances in Conceptual Modeling, pp. 63-73. doi:10.1007/978-3-031-47112-4_6. Online publication date: 6-Nov-2023
  • (2022) Knowledge-Infused Text Classification for the Biomedical Domain. International Journal of Information System Modeling and Design, 13:10, pp. 1-15. doi:10.4018/IJISMD.306635. Online publication date: 16-Sep-2022
  • (2022) Requirements for AI Support in Occupational Safety Risk Analysis. Proceedings of Mensch und Computer 2022, pp. 561-565. doi:10.1145/3543758.3547576. Online publication date: 4-Sep-2022
  • (2022) Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation. Neurocomputing, 484:C, pp. 223-237. doi:10.1016/j.neucom.2021.10.100. Online publication date: 1-May-2022
  • (2022) A feature selection method based on term frequency difference and positive weighting factor. Data & Knowledge Engineering, 141:C. doi:10.1016/j.datak.2022.102060. Online publication date: 1-Sep-2022
  • (2022) Classification and pattern extraction of incidents: a deep learning-based approach. Neural Computing and Applications, 34:17, pp. 14253-14274. doi:10.1007/s00521-021-06780-3. Online publication date: 1-Sep-2022
  • (2021) Semantic enrichment of documents: a classification perspective for ontology-based imbalanced semantic descriptions. Knowledge and Information Systems, 63:11, pp. 3001-3039. doi:10.1007/s10115-021-01615-y. Online publication date: 1-Nov-2021
  • (2021) Federating Scholarly Infrastructures with GraphQL. Towards Open and Trustworthy Digital Societies, pp. 308-324. doi:10.1007/978-3-030-91669-5_24. Online publication date: 1-Dec-2021
  • (2019) Ontology-Based Framework for the Automatic Recognition of Activities of Daily Living Using Class Expression Learning Techniques. Scientific Programming, 2019. doi:10.1155/2019/2917294. Online publication date: 1-Jan-2019
