Abstract
In this paper we outline the design considerations behind, and the application of, a methodology for authoring technical documents so that they can be retrieved more effectively. Our approach is aimed at large organizations, where variations in terminology at personal, national and international scales often impede the retrieval of relevant knowledge. We first present the difficulties of performing entity extraction in technical domains and the role that variation in terminology plays in the information extraction task. We then outline and evaluate a methodology that allows for effective retrieval.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Butters, J., Ciravegna, F. (2010). Authoring Technical Documents for Effective Retrieval. In: Cimiano, P., Pinto, H.S. (eds) Knowledge Engineering and Management by the Masses. EKAW 2010. Lecture Notes in Computer Science, vol 6317. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16438-5_20
DOI: https://doi.org/10.1007/978-3-642-16438-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16437-8
Online ISBN: 978-3-642-16438-5
eBook Packages: Computer Science, Computer Science (R0)