Abstract
The development of automatic methods to produce usable structured information from unstructured text sources is extremely valuable to the oil and gas industry. A structured resource would allow researchers and industry professionals to write relatively simple queries to retrieve all the information regarding transcriptions of any accident. Instead of the thousands of abstracts returned by querying the unstructured corpus, queries on the structured corpus would return a few hundred well-formed results.
In this paper we propose and evaluate information extraction techniques for the occupational health control process, in particular for the automatic detection of accidents from unstructured texts. Our proposal divides the problem into subtasks such as text analysis, recognition and classification of failed occupational health control, and resolving accidents.
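As an illustration of the text analysis and classification subtasks only, the following is a minimal sketch of a bag-of-words classifier that labels short reports as describing an accident or not. It is not the method evaluated in the paper: the choice of multinomial Naive Bayes with add-one smoothing, the names `tokenize`, `NaiveBayes` and `TRAIN`, the two labels, and the six example reports are all assumptions made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Text analysis step: lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy reports, invented for this sketch (not data from the paper).
TRAIN = [
    ("worker injured by falling pipe on the drilling platform", "accident"),
    ("gas leak caused burn injury during valve maintenance", "accident"),
    ("operator fell from ladder and fractured arm", "accident"),
    ("routine inspection of pumps completed without incident", "no_accident"),
    ("monthly safety training held for platform crew", "no_accident"),
    ("maintenance schedule updated for next quarter", "no_accident"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("technician injured by falling valve"))
```

A real system would need a far larger labelled corpus, language-specific preprocessing, and a held-out evaluation; the sketch only shows how unstructured text can be mapped to a structured label that simple queries can then filter on.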
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sanchez-Pi, N., Martí, L., Garcia, A.C.B. (2014). Text Classification Techniques in Oil Industry Applications. In: Herrero, Á., et al. International Joint Conference SOCO’13-CISIS’13-ICEUTE’13. Advances in Intelligent Systems and Computing, vol 239. Springer, Cham. https://doi.org/10.1007/978-3-319-01854-6_22
DOI: https://doi.org/10.1007/978-3-319-01854-6_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01853-9
Online ISBN: 978-3-319-01854-6
eBook Packages: Engineering (R0)