DOI:10.1145/2661829.2661954
Research article

Searching Locally-Defined Entities

Published: 03 November 2014

Abstract

When consuming content, users often encounter entities that they are not familiar with. A common scenario is when users want to find information about entities directly within the content they are consuming. For example, when reading the book "Adventures of Huckleberry Finn", a user may lose track of the character Mary Jane and want to find a paragraph in the book that gives relevant information about her. Today, this is typically done with the ubiquitous Find function ("Ctrl-F"). However, Find only returns exact matches, without any relevance ranking, which leads to a suboptimal user experience.
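As a rough illustration of the gap described above, the sketch below contrasts a Ctrl-F-style lookup, which returns every exact match in document order, with a ranked list scored by Okapi BM25, one standard lexical retrieval model. This is not the authors' semantic entity retrieval model; the function names, the BM25 parameters (k1 = 1.2, b = 0.75), and the toy passages about a hypothetical character "Ada" are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def ctrl_f(passages, query):
    """Mimic a basic Find: case-insensitive substring match, document order, no scoring."""
    q = query.lower()
    return [i for i, p in enumerate(passages) if q in p.lower()]


def tokenize(text):
    """Lowercase word tokenizer (a simplifying assumption, no stemming)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bm25_rank(passages, query, k1=1.2, b=0.75):
    """Rank passage indices by Okapi BM25 score, highest first; drop zero scores."""
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in set(d))
    scored = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average passages are penalized.
            norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scored.append((score, i))
    # Sort by descending score; break ties by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0]
```

With the toy passages `["The boat drifted past the town at night.", "Ada waved from the porch.", "Ada is the eldest sister. Ada runs the house, and everyone trusts Ada."]`, `ctrl_f(passages, "Ada")` returns `[1, 2]` in document order, while `bm25_rank(passages, "Ada")` returns `[2, 1]`, putting the passage that says more about Ada first. Both approaches still depend on the literal name appearing in the passage.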
How can we go beyond the Ctrl-F function? To tackle this problem, we present algorithms for semantic matching and relevance ranking that enable users to effectively search for and understand entities that have been defined in the content they are consuming, which we call locally-defined entities. We first analyze the limitations of standard information retrieval models when applied to searching locally-defined entities, and then we propose a novel semantic entity retrieval model that addresses these limitations. We also present a ranking model that leverages multiple novel signals to model the relevance of a passage. A thorough experimental evaluation of the approach in the real-world application of searching for characters within e-books shows that it outperforms the baselines by more than 60% in terms of NDCG.
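The evaluation above is reported in NDCG. For reference, here is a minimal NDCG@k sketch using the common exponential gain (2^rel - 1) and log2 rank discount. The abstract does not state which gain, discount, or cutoff the authors used, so these choices, the function names, and the graded relevance labels in the example are assumptions. The sketch also includes the relative-improvement arithmetic, assuming "more than 60%" refers to a relative gain over the baseline's NDCG.

```python
import math


def dcg_at_k(relevances, k):
    """DCG with exponential gain: sum over ranks i = 1..k of (2^rel - 1) / log2(i + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """Normalize DCG by the DCG of the ideal (sorted) ordering.

    Assumes `relevances` holds the graded labels of all judged passages,
    listed in the order the system ranked them.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def relative_improvement(system, baseline):
    """Relative gain of a system score over a baseline score (0.6 means +60%)."""
    return (system - baseline) / baseline
```

For example, a ranking whose graded labels are `[0, 2, 3]` scores about 0.606 at k = 3 because the most relevant passage appears last, while the ideal order `[3, 2, 0]` scores 1.0. A hypothetical baseline at 0.5 and system at 0.8 would be a 60% relative improvement; these numbers are illustrative, not results from the paper.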


Cited By

  • (2023) On the current state of query formulation for book search. Artificial Intelligence Review, 56(10):12085-12130. DOI:10.1007/s10462-023-10483-7. Online publication date: 19 April 2023.


Published In

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
November 2014
2152 pages
ISBN:9781450325981
DOI:10.1145/2661829
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. descriptiveness
  2. locally-defined entities
  3. within-document search

Conference

CIKM '14

Acceptance Rates

CIKM '14 Paper Acceptance Rate 175 of 838 submissions, 21%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
