DOI: 10.1007/978-3-642-04930-9_17

Context and Domain Knowledge Enhanced Entity Spotting in Informal Text

Published: 06 November 2009

Abstract

This paper explores the application of restricted relationship graphs (RDF) and statistical NLP techniques to improve named entity annotation in challenging informal English domains. We validate our approach using online forums discussing popular music. Named entity annotation is particularly difficult in this domain because it is characterized by a large number of ambiguous entities, such as the Madonna album "Music" or Lily Allen's pop hit "Smile".
We evaluate the improvements in annotation accuracy that can be obtained by restricting the set of possible entities using real-world constraints. We find that constrained domain entity extraction raises annotation accuracy significantly, making an infeasible task practical. We then show that annotation accuracy can be improved further, by over 50%, by applying SVM-based NLP systems trained on word usage in this domain.
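
As a rough illustration of the constraint idea described in the abstract, the sketch below spots song titles in a forum post, first against a whole catalog and then against a catalog restricted to one artist. It is a minimal sketch under stated assumptions, not the authors' system: the `CATALOG` list is a hypothetical stand-in for the RDF graph, the names `Track`, `restrict`, and `spot` are invented for this example, and the sample post is made up. The SVM stage is not shown.

```python
import re
from typing import NamedTuple


class Track(NamedTuple):
    artist: str
    title: str


# Hypothetical toy catalog standing in for a music-domain RDF graph.
CATALOG = [
    Track("Madonna", "Music"),
    Track("Lily Allen", "Smile"),
    Track("Lily Allen", "The Fear"),
    Track("Rihanna", "Umbrella"),
]


def restrict(catalog, artist):
    """Keep only the entities related to one artist (a real-world constraint)."""
    return [t for t in catalog if t.artist == artist]


def spot(text, candidates):
    """Naive spotter: titles that occur in the text as whole words, ignoring case."""
    found = []
    for track in candidates:
        pattern = r"\b" + re.escape(track.title) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(track.title)
    return found


# Made-up post of the kind that might appear on a Lily Allen forum page.
post = "Her smile in the new video is great, and the music is catchy. Loved The Fear!"

unrestricted = spot(post, CATALOG)
restricted = spot(post, restrict(CATALOG, "Lily Allen"))

print(unrestricted)  # ['Music', 'Smile', 'The Fear']
print(restricted)    # ['Smile', 'The Fear']
```

Restricting the candidates removes the spurious "Music" match, but "Smile" is still spotted even though the post uses the word in its everyday sense. Residual ambiguity of this kind is what a classifier trained on word usage, as described in the abstract, would be expected to address.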


Published In

ISWC '09: Proceedings of the 8th International Semantic Web Conference
November 2009
1004 pages
ISBN:9783642049293
Editors: Abraham Bernstein, David R. Karger, Tom Heath, Lee Feigenbaum, Diana Maynard, Enrico Motta, Krishnaprasad Thirunarayan

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

  • Article

Cited By

  • (2015) A Rule-Based Approach to Extracting Relations from Music Tidbits. Proceedings of the 24th International Conference on World Wide Web, pp. 661-666. DOI: 10.1145/2740908.2741709. Online publication date: 18-May-2015
  • (2013) Automatic dominant character identification in fables based on verb analysis - Empirical study on the impact of anaphora resolution. Knowledge-Based Systems 54:C, pp. 147-162. DOI: 10.5555/2770961.2771109. Online publication date: 1-Dec-2013
  • (2012) Context Aware Named Entity Disambiguation. Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01, pp. 402-408. DOI: 10.5555/2457524.2457621. Online publication date: 4-Dec-2012
  • (2012) Building a lightweight semantic model for unsupervised information extraction on short listings. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1081-1092. DOI: 10.5555/2390948.2391069. Online publication date: 12-Jul-2012
  • (2012) Various approaches to text representation for named entity disambiguation. Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services, pp. 256-262. DOI: 10.1145/2428736.2428776. Online publication date: 3-Dec-2012
  • (2011) Bootstrapped named entity recognition for product attribute extraction. Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1557-1567. DOI: 10.5555/2145432.2145598. Online publication date: 27-Jul-2011
  • (2011) DBpedia spotlight. Proceedings of the 7th International Conference on Semantic Systems, pp. 1-8. DOI: 10.1145/2063518.2063519. Online publication date: 7-Sep-2011
  • (2011) Disambiguating entity references within an ontological model. Proceedings of the International Conference on Web Intelligence, Mining and Semantics, pp. 1-11. DOI: 10.1145/1988688.1988714. Online publication date: 25-May-2011
  • (2011) Identifying potential adverse effects using the web. Journal of Biomedical Informatics 44:6, pp. 989-996. DOI: 10.1016/j.jbi.2011.07.005. Online publication date: 1-Dec-2011
  • (2010) Domain adaptation of rule-based annotators for named-entity recognition tasks. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 1002-1012. DOI: 10.5555/1870658.1870756. Online publication date: 9-Oct-2010
